Update deploying_with_k8s.rst (#10922)

AlexHe99 2024-12-16 08:33:58 +08:00 committed by GitHub
parent 25ebed2f8c
commit da6f409246

@@ -162,7 +162,7 @@ To test the deployment, run the following ``curl`` command:
 curl http://mistral-7b.default.svc.cluster.local/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
-        "model": "facebook/opt-125m",
+        "model": "mistralai/Mistral-7B-Instruct-v0.3",
         "prompt": "San Francisco is a",
         "max_tokens": 7,
         "temperature": 0
@@ -172,4 +172,4 @@ If the service is correctly deployed, you should receive a response from the vLL
Conclusion
----------
-Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
+Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
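
Review note: the test request from the first hunk can also be built from Python, which makes it easier to confirm that the ``model`` field matches the deployed model. The following is a minimal sketch, not part of this commit. It assumes the service name and path shown in the ``curl`` command, and the helper names ``build_completion_request`` and ``send_request`` are hypothetical.

```python
import json
import urllib.request

# Values copied from the curl command in the diff. The host name is
# cluster-local, so it only resolves from inside the Kubernetes cluster.
BASE_URL = "http://mistral-7b.default.svc.cluster.local"
MODEL = "mistralai/Mistral-7B-Instruct-v0.3"


def build_completion_request(base_url: str, model: str) -> urllib.request.Request:
    """Build the same POST request as the curl command (hypothetical helper)."""
    payload = {
        "model": model,
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed JSON body (hypothetical helper)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling ``send_request(build_completion_request(BASE_URL, MODEL))`` from a pod in the same cluster should exercise the same endpoint as the ``curl`` command. Outside the cluster the host name will not resolve.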