From 054c8657e30518af0aab10366f66f03287e45eff Mon Sep 17 00:00:00 2001
From: Ricardo Decal
Date: Mon, 14 Jul 2025 23:13:55 -0400
Subject: [PATCH] [Docs] Add KubeRay to deployment integrations (#20592)

Signed-off-by: Ricardo Decal
---
 docs/deployment/integrations/kuberay.md | 20 ++++++++++++++++++++
 docs/deployment/k8s.md                  |  1 +
 2 files changed, 21 insertions(+)
 create mode 100644 docs/deployment/integrations/kuberay.md

diff --git a/docs/deployment/integrations/kuberay.md b/docs/deployment/integrations/kuberay.md
new file mode 100644
index 000000000000..1dcc98024e8d
--- /dev/null
+++ b/docs/deployment/integrations/kuberay.md
@@ -0,0 +1,20 @@
+# KubeRay
+
+[KubeRay](https://github.com/ray-project/kuberay) provides a Kubernetes-native way to run vLLM workloads on Ray clusters.
+A Ray cluster can be declared in YAML, and the KubeRay operator then handles pod scheduling, networking configuration, restarts, and blue/green deployments, all while preserving the familiar Kubernetes experience.
+
+## Why KubeRay instead of manual scripts?
+
+| Feature | Manual scripts | KubeRay |
+|---------|----------------|---------|
+| Cluster bootstrap | Manually SSH into every node and run a script | One command to create or update the whole cluster: `kubectl apply -f cluster.yaml` |
+| Autoscaling | Manual | Automatically adjusts cluster size by patching the CRD |
+| Upgrades | Tear down & re-create manually | Supports blue/green deployment updates |
+| Declarative config | Bash flags & environment variables | GitOps-friendly YAML CRDs (RayCluster/RayService) |
+
+Using KubeRay reduces the operational burden and simplifies integration of Ray and vLLM with existing Kubernetes workflows (CI/CD, secrets, storage classes, etc.).
+
+## Learn more
+
+* ["Serve a Large Language Model using Ray Serve LLM on Kubernetes"](https://docs.ray.io/en/master/cluster/kubernetes/examples/rayserve-llm-example.html) - An end-to-end example of how to serve a model using vLLM, KubeRay, and Ray Serve.
+* [KubeRay documentation](https://docs.ray.io/en/latest/cluster/kubernetes/index.html)
diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index 8eb2270ab7c8..f244b0858eb6
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -13,6 +13,7 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
 - [Helm](frameworks/helm.md)
 - [InftyAI/llmaz](integrations/llmaz.md)
 - [KServe](integrations/kserve.md)
+- [KubeRay](integrations/kuberay.md)
 - [kubernetes-sigs/lws](frameworks/lws.md)
 - [meta-llama/llama-stack](integrations/llamastack.md)
 - [substratusai/kubeai](integrations/kubeai.md)
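
To make the declarative workflow described in the new page concrete, here is a minimal sketch of the kind of `RayCluster` manifest it refers to. This is illustrative only and not part of the patch itself: the resource name, Ray image tag, replica counts, and resource limits are placeholder assumptions, and the authoritative schema is in the KubeRay documentation linked above.

```yaml
# Minimal RayCluster sketch. All names, image tags, and sizes below are
# illustrative placeholders; consult the KubeRay docs for the full schema.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: vllm-cluster              # placeholder name
spec:
  rayVersion: "2.33.0"            # should match the Ray version in the container image
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"   # expose the Ray dashboard beyond the head pod
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.33.0   # placeholder image/tag
            resources:
              limits:
                cpu: "4"
                memory: 16Gi
  workerGroupSpecs:
    - groupName: gpu-workers
      replicas: 2                 # the operator reconciles pods to match this count
      minReplicas: 0              # bounds used when autoscaling is enabled
      maxReplicas: 4
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.33.0
              resources:
                limits:
                  nvidia.com/gpu: 1   # one GPU per worker for vLLM inference
```

With a manifest like this saved as `cluster.yaml`, running `kubectl apply -f cluster.yaml` creates the cluster, and re-applying the edited file updates it in place, which is the one-command workflow the comparison table contrasts with per-node scripts.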