From 3ed767ec064fbebbf5d8de829d390fa4a1bf0a0b Mon Sep 17 00:00:00 2001
From: Michael Act
Date: Sun, 23 Nov 2025 09:58:28 +0700
Subject: [PATCH] docs: fixes distributed executor backend config for
 multi-node vllm (#29173)

Signed-off-by: Michael Act
Co-authored-by: Michael Goin
---
 docs/serving/parallelism_scaling.md | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/serving/parallelism_scaling.md b/docs/serving/parallelism_scaling.md
index 14cd3b057791c..a32840ea73b9a 100644
--- a/docs/serving/parallelism_scaling.md
+++ b/docs/serving/parallelism_scaling.md
@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
-    --pipeline-parallel-size 2
+    --pipeline-parallel-size 2 \
+    --distributed-executor-backend ray
 ```
 
 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:
 
 ```bash
 vllm serve /path/to/the/model/in/the/container \
-    --tensor-parallel-size 16
+    --tensor-parallel-size 16 \
+    --distributed-executor-backend ray
 ```
 
 ## Optimizing network communication for tensor parallelism
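
Note (not part of the commit): a minimal sketch of the Ray cluster setup that the `ray` backend expects, assuming two 8-GPU nodes; `HEAD_NODE_IP` is a placeholder and `6379` is Ray's default port, neither taken from the patch itself:

```bash
# On the head node: start the Ray head process (6379 is Ray's default port).
ray start --head --port=6379

# On every worker node: join the cluster; HEAD_NODE_IP is a placeholder.
ray start --address=HEAD_NODE_IP:6379

# From the head node, launch vLLM with the Ray backend as in the patch.
vllm serve /path/to/the/model/in/the/container \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```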