[Docs] Expand introduction to Ray in Multi-node deployment section (#21584)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
commit 4cd7fe6cea
parent 16f3250527
```diff
@@ -58,7 +58,17 @@ vllm serve gpt2 \
 
 ## Multi-node deployment
 
-If a single node lacks sufficient GPUs to hold the model, deploy vLLM across multiple nodes. Multi-node deployments require Ray as the runtime engine. Ensure that every node provides an identical execution environment, including the model path and Python packages. Using container images is recommended because they provide a convenient way to keep environments consistent and to hide host heterogeneity.
+If a single node lacks sufficient GPUs to hold the model, deploy vLLM across multiple nodes. Ensure that every node provides an identical execution environment, including the model path and Python packages. Using container images is recommended because they provide a convenient way to keep environments consistent and to hide host heterogeneity.
+
+### What is Ray?
+
+Ray is a distributed computing framework for scaling Python programs. Multi-node vLLM deployments require Ray as the runtime engine.
+
+vLLM uses Ray to manage the distributed execution of tasks across multiple nodes and control where execution happens.
+
+Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm/serving-llms.html) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
+
+For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
 
 ### Ray cluster setup with containers
 
```
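As a concrete sketch of the container-based setup the new text recommends, the commands below start a Ray head node and join a worker to it from the same image. The image tag, port, and `HEAD_IP` placeholder are illustrative assumptions, not part of this commit:

```bash
# Hypothetical sketch: run every node from the same container image so the
# model path and Python packages are identical cluster-wide.

# On the head node:
docker run -d --gpus all --network host --entrypoint /bin/bash \
    vllm/vllm-openai:latest \
    -c "ray start --head --port=6379 --block"

# On each worker node (HEAD_IP is the head node's reachable address):
docker run -d --gpus all --network host --entrypoint /bin/bash \
    vllm/vllm-openai:latest \
    -c "ray start --address=HEAD_IP:6379 --block"

# From any node, confirm that all nodes and GPUs have joined the cluster:
docker exec -it <container-id> ray status
```

Mounting the same model cache into every container (e.g. `-v ~/.cache/huggingface:/root/.cache/huggingface`) also helps keep the model path identical across nodes.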
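To make "scaling Python programs" concrete, here is a minimal Ray program, assumed to run on any node of the cluster above; it is generic Ray, nothing vLLM-specific:

```bash
# Hypothetical sketch of Ray's core task API: functions decorated with
# @ray.remote become tasks that Ray schedules onto nodes in the cluster.
python -c '
import ray

ray.init(address="auto")  # attach to the already-running cluster

@ray.remote
def square(x):
    return x * x

# Each call becomes a task that Ray may place on a different node.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
'
```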
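Once the cluster is formed, vLLM can shard a model across it. The sketch below assumes 2 nodes with 8 GPUs each; the model name and parallelism sizes are illustrative:

```bash
# Hypothetical sketch: launch from the head node. Tensor parallelism spans the
# 8 GPUs within a node, pipeline parallelism spans the 2 nodes, and Ray is the
# distributed executor that places workers on the remote node.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```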
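For the higher-level APIs the new text links to, offline batch inference with Ray Data looks roughly like the following. The `ray.data.llm` names are taken from the linked Ray documentation and may differ across Ray versions; the model and parameters are illustrative assumptions:

```bash
# Hypothetical sketch of offline batch inference with Ray Data using vLLM as
# the engine; API names follow the linked Ray docs and are version-dependent.
python - <<'EOF'
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    concurrency=1,  # number of vLLM engine replicas
    batch_size=64,  # rows sent to each replica at a time
)
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
ds = processor(ds)
print(ds.take_all())
EOF
```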