From 0736f901e7e39d477c8ee00c177f2d67dae48078 Mon Sep 17 00:00:00 2001
From: Yuan Tang
Date: Tue, 23 Dec 2025 15:27:22 -0500
Subject: [PATCH] docs: Add llm-d integration to the website (#31234)

Signed-off-by: Yuan Tang
---
 docs/deployment/integrations/kserve.md | 2 +-
 docs/deployment/integrations/llm-d.md  | 5 +++++
 docs/deployment/k8s.md                 | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 docs/deployment/integrations/llm-d.md

diff --git a/docs/deployment/integrations/kserve.md b/docs/deployment/integrations/kserve.md
index 37b29aa1a4876..06ad5f29a1a65 100644
--- a/docs/deployment/integrations/kserve.md
+++ b/docs/deployment/integrations/kserve.md
@@ -2,4 +2,4 @@
 
 vLLM can be deployed with [KServe](https://github.com/kserve/kserve) on Kubernetes for highly scalable distributed model serving.
 
-Please see [this guide](https://kserve.github.io/website/docs/model-serving/generative-inference/overview) for more details on using vLLM with KServe.
+You can use vLLM with KServe's [Hugging Face serving runtime](https://kserve.github.io/website/docs/model-serving/generative-inference/overview) or with [`LLMInferenceService`](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview), which uses llm-d.
diff --git a/docs/deployment/integrations/llm-d.md b/docs/deployment/integrations/llm-d.md
new file mode 100644
index 0000000000000..cccf1773c6be6
--- /dev/null
+++ b/docs/deployment/integrations/llm-d.md
@@ -0,0 +1,5 @@
+# llm-d
+
+vLLM can be deployed with [llm-d](https://github.com/llm-d/llm-d), a Kubernetes-native distributed inference serving stack that provides well-lit paths for serving large generative AI models at scale. It aims to deliver the fastest time to state-of-the-art (SOTA) performance for key open-source models across most hardware accelerators and infrastructure providers.
+
+You can use vLLM with llm-d directly by following [this guide](https://llm-d.ai/docs/guide), or indirectly via [KServe's `LLMInferenceService`](https://kserve.github.io/website/docs/model-serving/generative-inference/llmisvc/llmisvc-overview).
diff --git a/docs/deployment/k8s.md b/docs/deployment/k8s.md
index 05814cbad9bfc..77a159009aa8d 100644
--- a/docs/deployment/k8s.md
+++ b/docs/deployment/k8s.md
@@ -12,6 +12,7 @@ Alternatively, you can deploy vLLM to Kubernetes using any of the following:
 
 - [Helm](frameworks/helm.md)
 - [InftyAI/llmaz](integrations/llmaz.md)
+- [llm-d](integrations/llm-d.md)
 - [KAITO](integrations/kaito.md)
 - [KServe](integrations/kserve.md)
 - [Kthena](integrations/kthena.md)
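For context, the `LLMInferenceService` path referenced above reduces deployment to applying a single custom resource. Below is a minimal sketch, assuming the `serving.kserve.io/v1alpha1` schema described in the linked KServe guide; the resource name, model URI, replica count, and router defaults are illustrative placeholders, so verify field names against that guide before use.

```yaml
# Illustrative sketch only: field names assume the LLMInferenceService
# schema from the KServe docs linked above; check that guide for the
# authoritative spec.
apiVersion: serving.kserve.io/v1alpha1
kind: LLMInferenceService
metadata:
  name: vllm-opt-125m            # placeholder name
spec:
  model:
    uri: hf://facebook/opt-125m  # placeholder model pulled from Hugging Face
    name: facebook/opt-125m
  replicas: 1
  router:
    route: {}                    # empty objects request KServe-managed defaults
    gateway: {}
    scheduler: {}
```

Applying a manifest like this (for example, `kubectl apply -f llmisvc.yaml`) should bring up a vLLM-backed endpoint routed through llm-d, per the linked overview.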