From 90a2769f2030bd11299f83871d7bc1c06db88cfb Mon Sep 17 00:00:00 2001
From: Ricardo Decal
Date: Mon, 7 Jul 2025 20:08:05 -0700
Subject: [PATCH] [Docs] Add Ray Serve LLM section to openai compatible server
 guide (#20595)

Signed-off-by: Ricardo Decal
---
 docs/serving/openai_compatible_server.md | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/docs/serving/openai_compatible_server.md b/docs/serving/openai_compatible_server.md
index ffb58d9f60009..82195ae82f153 100644
--- a/docs/serving/openai_compatible_server.md
+++ b/docs/serving/openai_compatible_server.md
@@ -775,3 +775,17 @@ The following extra parameters are supported:
 ```python
 --8<-- "vllm/entrypoints/openai/protocol.py:rerank-extra-params"
 ```
+
+## Ray Serve LLM
+
+Ray Serve LLM enables scalable, production-grade serving of the vLLM engine. It integrates tightly with vLLM and extends it with features such as auto-scaling, load balancing, and back-pressure.
+
+Key capabilities:
+
+- Exposes an OpenAI-compatible HTTP API as well as a Pythonic API.
+- Scales from a single GPU to a multi-node cluster without code changes.
+- Provides observability and autoscaling policies through Ray dashboards and metrics.
+
+The following example shows how to deploy a large model like DeepSeek R1 with Ray Serve LLM:
+
+Learn more about Ray Serve LLM with the official [Ray Serve LLM documentation](https://docs.ray.io/en/latest/serve/llm/serving-llms.html).
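
The deployment example referenced in the patch was elided from this copy. As a rough guide, a Ray Serve LLM deployment of this kind can be sketched with Ray's `ray.serve.llm` config-building API; the model ID, replica counts, and engine settings below are illustrative assumptions, not values taken from the patch:

```python
# Sketch of a Ray Serve LLM deployment (requires a Ray cluster with GPUs).
# All specific values here are assumptions for illustration.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="deepseek",                      # name exposed via the OpenAI API
        model_source="deepseek-ai/DeepSeek-R1",   # assumed Hugging Face model ID
    ),
    deployment_config=dict(
        # Ray Serve autoscaling: scale replicas with request load.
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    # Keyword arguments forwarded to the underlying vLLM engine.
    engine_kwargs=dict(tensor_parallel_size=8),
)

# Build an OpenAI-compatible Serve application and run it.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Once running, the deployment can be queried like any OpenAI-compatible endpoint, e.g. by pointing an OpenAI client at the Serve HTTP address with the `model_id` configured above.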