docs: fixes distributed executor backend config for multi-node vllm (#29173)

Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Author: Michael Act
Date: 2025-11-23 09:58:28 +07:00 (committed by GitHub)
parent 5f96c00c55
commit 3ed767ec06


@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
 --tensor-parallel-size 8 \
---pipeline-parallel-size 2
+--pipeline-parallel-size 2 \
+--distributed-executor-backend ray
 ```

 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:

 ```bash
 vllm serve /path/to/the/model/in/the/container \
---tensor-parallel-size 16
+--tensor-parallel-size 16 \
+--distributed-executor-backend ray
 ```

 ## Optimizing network communication for tensor parallelism
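Note (context beyond this diff): the `ray` executor backend expects a Ray cluster that already spans all the nodes before `vllm serve` is launched. The sketch below shows one way such a cluster might be brought up by hand; `HEAD_NODE_IP` is a placeholder and the port value is only an example, neither is part of this change.

```bash
# On the head node: start Ray and let it accept workers (port is an example).
ray start --head --port=6379

# On each worker node: join the cluster formed by the head node.
# HEAD_NODE_IP is a placeholder for the head node's reachable address.
ray start --address="${HEAD_NODE_IP}:6379"

# Optional: confirm that all nodes and GPUs are visible to Ray.
ray status

# Then, on the head node, launch vLLM with the Ray backend as in the diff above.
vllm serve /path/to/the/model/in/the/container \
    --tensor-parallel-size 16 \
    --distributed-executor-backend ray
```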