diff --git a/docs/source/serving/offline_inference.md b/docs/source/serving/offline_inference.md
index e46361955c73..433d2e894dd8 100644
--- a/docs/source/serving/offline_inference.md
+++ b/docs/source/serving/offline_inference.md
@@ -74,6 +74,8 @@ Tensor parallelism (`tensor_parallel_size` option) can be used to split the mode
 The following code splits the model across 2 GPUs.
 
 ```python
+from vllm import LLM
+
 llm = LLM(model="ibm-granite/granite-3.1-8b-instruct",
           tensor_parallel_size=2)
 ```
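
For context, a minimal sketch of how the patched snippet would typically be exercised end to end, assuming a host with at least 2 GPUs; the prompt and sampling settings here are illustrative and not part of the patch:

```python
# Usage sketch for the patched example (assumes >= 2 GPUs are available).
from vllm import LLM, SamplingParams

# tensor_parallel_size=2 shards the model's weights across 2 GPUs.
llm = LLM(model="ibm-granite/granite-3.1-8b-instruct",
          tensor_parallel_size=2)

# Illustrative prompt and sampling settings, chosen for this sketch only.
sampling_params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is tensor parallelism?"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```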