diff --git a/docs/source/serving/deploying_with_k8s.rst b/docs/source/serving/deploying_with_k8s.rst
index 7dc076dc709df..cc3606f0df851 100644
--- a/docs/source/serving/deploying_with_k8s.rst
+++ b/docs/source/serving/deploying_with_k8s.rst
@@ -162,7 +162,7 @@ To test the deployment, run the following ``curl`` command:
     curl http://mistral-7b.default.svc.cluster.local/v1/completions \
       -H "Content-Type: application/json" \
       -d '{
-        "model": "facebook/opt-125m",
+        "model": "mistralai/Mistral-7B-Instruct-v0.3",
         "prompt": "San Francisco is a",
         "max_tokens": 7,
         "temperature": 0
@@ -172,4 +172,4 @@ If the service is correctly deployed, you should receive a response from the vLL
 Conclusion
 ----------
 
-Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
\ No newline at end of file
+Deploying vLLM with Kubernetes allows for efficient scaling and management of ML models leveraging GPU resources. By following the steps outlined above, you should be able to set up and test a vLLM deployment within your Kubernetes cluster. If you encounter any issues or have suggestions, please feel free to contribute to the documentation.
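
For context on the fix: the ``model`` field in the request must match the model the server was launched with, which for this deployment is ``mistralai/Mistral-7B-Instruct-v0.3``; the old value ``facebook/opt-125m`` would typically be rejected with a "model does not exist" error. With the corrected field, the endpoint returns an OpenAI-compatible completion payload. A sketch of the expected response shape, with illustrative values (the actual ``id``, ``created`` timestamp, token counts, and generated ``text`` will vary per deployment)::

    {
      "id": "cmpl-d7f9a8e2c4b14c6a",
      "object": "text_completion",
      "created": 1716500000,
      "model": "mistralai/Mistral-7B-Instruct-v0.3",
      "choices": [
        {
          "index": 0,
          "text": " city in the state of California",
          "logprobs": null,
          "finish_reason": "length"
        }
      ],
      "usage": {
        "prompt_tokens": 5,
        "completion_tokens": 7,
        "total_tokens": 12
      }
    }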