docs: fixes distributed executor backend config for multi-node vllm (#29173)
Signed-off-by: Michael Act <michael.a.c.tulenan@gdplabs.id>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
commit 3ed767ec06
parent 5f96c00c55
@@ -118,14 +118,16 @@ The common practice is to set the tensor parallel size to the number of GPUs in
 ```bash
 vllm serve /path/to/the/model/in/the/container \
     --tensor-parallel-size 8 \
-    --pipeline-parallel-size 2
+    --pipeline-parallel-size 2 \
+    --distributed-executor-backend ray
 ```
 
 Alternatively, you can set `tensor_parallel_size` to the total number of GPUs in the cluster:
 
 ```bash
 vllm serve /path/to/the/model/in/the/container \
-    --tensor-parallel-size 16
+    --tensor-parallel-size 16 \
+    --distributed-executor-backend ray
 ```
 
 ## Optimizing network communication for tensor parallelism
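For reference, a minimal sketch (not part of this commit) of the same multi-node configuration expressed through vLLM's offline `LLM` API instead of `vllm serve`. The model path and parallel sizes simply mirror the placeholder values used in the diff above.

```python
# Hedged sketch, not part of this commit: the settings from the first example
# above, passed to vLLM's offline LLM API rather than the `vllm serve` CLI.
from vllm import LLM

llm = LLM(
    model="/path/to/the/model/in/the/container",  # same placeholder path as the docs
    tensor_parallel_size=8,               # GPUs per node, as in the first example
    pipeline_parallel_size=2,             # number of nodes, as in the first example
    distributed_executor_backend="ray",   # the backend this commit adds to the docs
)

outputs = llm.generate(["Hello, my name is"])
print(outputs[0].outputs[0].text)
```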