From 90a2769f2030bd11299f83871d7bc1c06db88cfb Mon Sep 17 00:00:00 2001
From: Ricardo Decal
Date: Mon, 7 Jul 2025 20:08:05 -0700
Subject: [PATCH] [Docs] Add Ray Serve LLM section to openai compatible server
 guide (#20595)

Signed-off-by: Ricardo Decal
---
 docs/serving/openai_compatible_server.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/serving/openai_compatible_server.md b/docs/serving/openai_compatible_server.md
index ffb58d9f60009..82195ae82f153 100644
--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -775,3 +775,17 @@ The following extra parameters are supported:
 ```python
 --8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
 ```
+
+## Ray Serve LLM
+
+Ray Serve LLM enables scalable, production-grade serving of the vLLM engine. It integrates tightly with vLLM and extends it with features such as auto-scaling, load balancing, and back-pressure.
+
+Key capabilities:
+
+- Exposes an OpenAI-compatible HTTP API as well as a Pythonic API.
+- Scales from a single GPU to a multi-node cluster without code changes.
+- Provides observability and autoscaling policies through Ray dashboards and metrics.
+
+The following example shows how to deploy a large model like DeepSeek R1 with Ray Serve LLM:
+
+Learn more about Ray Serve LLM with the official [Ray Serve LLM documentation](https://docs.ray.io/en/latest/serve/llm/serving-llms.html).
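
The deployment example referenced in the patch was elided from this copy. As a rough guide, a Ray Serve LLM deployment of this kind can be sketched with Ray's `ray.serve.llm` config-building API; the model ID, replica counts, and engine settings below are illustrative assumptions, not values taken from the patch:

```python
# Sketch of a Ray Serve LLM deployment (requires a Ray cluster with GPUs).
# All specific values here are assumptions for illustration.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek",                      # name exposed via the OpenAI API
        model_source="deepseek-ai/DeepSeek-R1",   # assumed Hugging Face model ID
    ),
    deployment_config=dict(
        # Ray Serve autoscaling: scale replicas with request load.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Keyword arguments forwarded to the underlying vLLM engine.
    engine_kwargs=dict(tensor_parallel_size=8),
)

# Build an OpenAI-compatible Serve application and run it.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, the deployment can be queried like any OpenAI-compatible endpoint, e.g. by pointing an OpenAI client at the Serve HTTP address with the `model_id` configured above.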