[Docs] Expand introduction to Ray in Multi-node deployment section (#21584)
Signed-off-by: Ricardo Decal <rdecal@anyscale.com>
commit 4cd7fe6cea
parent 16f3250527
```diff
@@ -58,7 +58,17 @@ vllm serve gpt2 \
 
 ## Multi-node deployment
 
-If a single node lacks sufficient GPUs to hold the model, deploy vLLM across multiple nodes. Multi-node deployments require Ray as the runtime engine. Ensure that every node provides an identical execution environment, including the model path and Python packages. Using container images is recommended because they provide a convenient way to keep environments consistent and to hide host heterogeneity.
+If a single node lacks sufficient GPUs to hold the model, deploy vLLM across multiple nodes. Ensure that every node provides an identical execution environment, including the model path and Python packages. Using container images is recommended because they provide a convenient way to keep environments consistent and to hide host heterogeneity.
+
+### What is Ray?
+
+Ray is a distributed computing framework for scaling Python programs. Multi-node vLLM deployments require Ray as the runtime engine.
+
+vLLM uses Ray to manage the distributed execution of tasks across multiple nodes and control where execution happens.
+
+Ray also offers high-level APIs for large-scale [offline batch inference](https://docs.ray.io/en/latest/data/working-with-llms.html) and [online serving](https://docs.ray.io/en/latest/serve/llm/serving-llms.html) that can leverage vLLM as the engine. These APIs add production-grade fault tolerance, scaling, and distributed observability to vLLM workloads.
+
+For details, see the [Ray documentation](https://docs.ray.io/en/latest/index.html).
 
 ### Ray cluster setup with containers
 
```
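As a concrete sketch of the container-based setup the new text recommends, the commands below start a Ray head node and join a worker to it from the same image. The image tag, port, and `HEAD_IP` placeholder are illustrative assumptions, not part of this commit:

```bash
# Hypothetical sketch: run every node from the same container image so the
# model path and Python packages are identical cluster-wide.

# On the head node:
docker run -d --gpus all --network host --entrypoint /bin/bash \
    vllm/vllm-openai:latest \
    -c "ray start --head --port=6379 --block"

# On each worker node (HEAD_IP is the head node's reachable address):
docker run -d --gpus all --network host --entrypoint /bin/bash \
    vllm/vllm-openai:latest \
    -c "ray start --address=HEAD_IP:6379 --block"

# From any node, confirm that all nodes and GPUs have joined the cluster:
docker exec -it <container-id> ray status
```

Mounting the same model cache into every container (e.g. `-v ~/.cache/huggingface:/root/.cache/huggingface`) also helps keep the model path identical across nodes.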
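To make "scaling Python programs" concrete, here is a minimal Ray program, assumed to run on any node of the cluster above; it is generic Ray, nothing vLLM-specific:

```bash
# Hypothetical sketch of Ray's core task API: functions decorated with
# @ray.remote become tasks that Ray schedules onto nodes in the cluster.
python -c '
import ray

ray.init(address="auto")  # attach to the already-running cluster

@ray.remote
def square(x):
    return x * x

# Each call becomes a task that Ray may place on a different node.
print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]
'
```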
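Once the cluster is formed, vLLM can shard a model across it. The sketch below assumes 2 nodes with 8 GPUs each; the model name and parallelism sizes are illustrative:

```bash
# Hypothetical sketch: launch from the head node. Tensor parallelism spans the
# 8 GPUs within a node, pipeline parallelism spans the 2 nodes, and Ray is the
# distributed executor that places workers on the remote node.
vllm serve meta-llama/Llama-3.1-70B-Instruct \
    --tensor-parallel-size 8 \
    --pipeline-parallel-size 2 \
    --distributed-executor-backend ray
```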
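For the higher-level APIs the new text links to, offline batch inference with Ray Data looks roughly like the following. The `ray.data.llm` names are taken from the linked Ray documentation and may differ across Ray versions; the model and parameters are illustrative assumptions:

```bash
# Hypothetical sketch of offline batch inference with Ray Data using vLLM as
# the engine; API names follow the linked Ray docs and are version-dependent.
python - <<'EOF'
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

config = vLLMEngineProcessorConfig(
    model_source="meta-llama/Llama-3.1-8B-Instruct",  # illustrative model
    concurrency=1,  # number of vLLM engine replicas
    batch_size=64,  # rows sent to each replica at a time
)
processor = build_llm_processor(
    config,
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=128),
    ),
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "What is Ray?"}])
ds = processor(ds)
print(ds.take_all())
EOF
```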