From 3ed767ec064fbebbf5d8de829d390fa4a1bf0a0b Mon Sep 17 00:00:00 2001
From: Michael Act
Date: Sun, 23 Nov 2025 09:58:28 +0700
Subject: [PATCH] docs: fixes distributed executor backend config for
 multi-node vllm (#29173)

Signed-off-by: Michael Act
Co-authored-by: Michael Goin
---
 docs/serving/parallelism_scaling.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/serving/parallelism_scaling.md b/docs/serving/parallelism_scaling.md
index 14cd3b057791c..a32840ea73b9a 100644
--- a/docs/serving/parallelism_scaling.md
+++ b/docs/serving/parallelism_scaling.md
@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
-    --pipeline-parallel-size 2
+    --pipeline-parallel-size 2 \
+    --distributed-executor-backend ray
 ```
 
 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:
 
 ```bash
 vllm serve /path/to/the/model/in/the/container \
-    --tensor-parallel-size 16
+    --tensor-parallel-size 16 \
+    --distributed-executor-backend ray
 ```
 
 ## Optimizing network communication for tensor parallelism
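
Note (not part of the commit): a minimal sketch of the Ray cluster setup that the `ray` backend expects, assuming two 8-GPU nodes; `HEAD_NODE_IP` is a placeholder and `6379` is Ray's default port, neither taken from the patch itself:

```bash
# On the head node: start the Ray head process (6379 is Ray's default port).
ray start --head --port=6379

# On every worker node: join the cluster; HEAD_NODE_IP is a placeholder.
ray start --address=HEAD_NODE_IP:6379

# From the head node, launch vLLM with the Ray backend as in the patch.
vllm serve /path/to/the/model/in/the/container \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```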