llm-d
vLLM can be deployed with llm-d, a Kubernetes-native distributed inference serving stack that provides well-lit paths for serving large generative AI models at scale. It aims to deliver the fastest "time to state-of-the-art (SOTA) performance" for key open-source models across most hardware accelerators and infrastructure providers.
You can use vLLM with llm-d either directly, by following this guide, or via KServe's LLMInferenceService.
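Once vLLM is deployed behind llm-d, clients typically reach it through vLLM's OpenAI-compatible API. The following is a minimal sketch of such a client call; the gateway address, model name, and API key below are illustrative assumptions, not values provided by this guide. Substitute the endpoint and model that your cluster actually exposes.

```python
# Minimal sketch: calling a vLLM deployment through an llm-d / KServe gateway
# using the OpenAI-compatible API that vLLM serves.
from openai import OpenAI

client = OpenAI(
    # Hypothetical gateway address; use the service/route your cluster exposes.
    base_url="http://llm-d-gateway.example.local/v1",
    # vLLM's OpenAI-compatible server accepts any key unless auth is configured.
    api_key="EMPTY",
)

response = client.chat.completions.create(
    # Example model name; use whichever model you deployed.
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "user", "content": "Summarize what llm-d does in one sentence."}
    ],
    max_tokens=128,
)

print(response.choices[0].message.content)
```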