Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-11 09:06:01 +08:00
[Docs] Update PP docs (#6598)
This commit is contained in: parent 4cc24f01b1, commit 45ceb85a0c
@@ -44,11 +44,10 @@ You can also additionally specify :code:`--pipeline-parallel-size` to enable pip
     $ vllm serve gpt2 \
     $ --tensor-parallel-size 4 \
-    $ --pipeline-parallel-size 2 \
-    $ --distributed-executor-backend ray
+    $ --pipeline-parallel-size 2
 
 .. note::
-    Pipeline parallel is a beta feature. It is only supported for online serving and the ray backend for now, as well as LLaMa and GPT2 style models.
+    Pipeline parallel is a beta feature. It is only supported for online serving as well as LLaMa, GPT2, and Mixtral style models.
 
 To scale vLLM beyond a single machine, install and start a `Ray runtime <https://docs.ray.io/en/latest/ray-core/starting-ray.html>`_ via CLI before running vLLM:
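For context, the resulting invocation after this change can be sketched as below. This is a non-authoritative sketch: the `vllm serve` flags are taken from the diff itself, and the `ray start` commands come from the standard Ray CLI; the head-node address is a placeholder, not a value from this PR.

```shell
# Serve gpt2 with tensor parallelism across 4 GPUs and pipeline
# parallelism across 2 stages. Per this diff, the explicit
# --distributed-executor-backend ray flag is no longer passed.
vllm serve gpt2 \
    --tensor-parallel-size 4 \
    --pipeline-parallel-size 2

# To scale beyond a single machine, start a Ray runtime first,
# as the docs text describes (standard Ray CLI commands):
#
# On the head node:
#   ray start --head
#
# On each worker node (<head-node-ip> is a placeholder):
#   ray start --address=<head-node-ip>:6379
```

Note that `--tensor-parallel-size` here is the number of GPUs per pipeline stage, so this example assumes 8 GPUs in total (4 x 2).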