From 9ccc6ded425139b386edb6a268af66ede2082beb Mon Sep 17 00:00:00 2001
From: Reid <61492567+reidliu41@users.noreply.github.com>
Date: Wed, 14 May 2025 18:57:34 +0800
Subject: [PATCH] [doc] add missing import (#18133)

Signed-off-by: reidliu41
Co-authored-by: reidliu41
---
 docs/source/serving/offline_inference.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/serving/offline_inference.md b/docs/source/serving/offline_inference.md
index e46361955c73..433d2e894dd8 100644
--- a/docs/source/serving/offline_inference.md
+++ b/docs/source/serving/offline_inference.md
@@ -74,6 +74,8 @@ Tensor parallelism (`tensor_parallel_size` option) can be used to split the mode
 The following code splits the model across 2 GPUs.
 
 ```python
+from vllm import LLM
+
 llm = LLM(model="ibm-granite/granite-3.1-8b-instruct", tensor_parallel_size=2)
 ```
 