From 95863540538804a48e31ca5806cefbac9d4c5326 Mon Sep 17 00:00:00 2001
From: Michael Goin
Date: Mon, 22 Dec 2025 15:06:29 -0500
Subject: [PATCH] [Doc] Add vllm-metal to hardware plugin documentation (#31174)

Signed-off-by: mgoin
---
 docs/getting_started/installation/README.md        | 1 +
 docs/getting_started/installation/cpu.apple.inc.md | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/docs/getting_started/installation/README.md b/docs/getting_started/installation/README.md
index 9b93a6b9ac12c..cdbe601ca801a 100644
--- a/docs/getting_started/installation/README.md
+++ b/docs/getting_started/installation/README.md
@@ -28,3 +28,4 @@ The backends below live **outside** the main `vllm` repository and follow the
 | Cambricon MLU | `vllm-mlu` | |
 | Baidu Kunlun XPU | N/A, install from source | |
 | Sophgo TPU | N/A, install from source | |
+| Apple Silicon (Metal) | N/A, install from source | |
diff --git a/docs/getting_started/installation/cpu.apple.inc.md b/docs/getting_started/installation/cpu.apple.inc.md
index 9f1f6e3821397..c5a4d00ddcf4c 100644
--- a/docs/getting_started/installation/cpu.apple.inc.md
+++ b/docs/getting_started/installation/cpu.apple.inc.md
@@ -4,6 +4,9 @@ vLLM has experimental support for macOS with Apple Silicon. For now, users must

 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.

+!!! tip "GPU-Accelerated Inference with vLLM-Metal"
+    For GPU-accelerated inference on Apple Silicon using Metal, check out [vllm-metal](https://github.com/vllm-project/vllm-metal), a community-maintained hardware plugin that uses MLX as the compute backend.
+
 # --8<-- [end:installation]
 # --8<-- [start:requirements]