[Doc] Improve installation signposting (#12575)

- Make device tab names more explicit
- Add a comprehensive list of devices to
  https://docs.vllm.ai/en/latest/getting_started/installation/index.html
- Add `attention` blocks to the intro of all devices that don't have
  pre-built wheels/images (see the snippet after this list)
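For reference, the added blocks use MyST's fenced admonition syntax; the snippet below reproduces the block exactly as it appears in the hunks for source-only devices (devices that ship a pre-built Docker image but no wheels get a variant pointing to the image instead):

```markdown
:::{attention}
There are no pre-built wheels or images for this device, so you must build vLLM from source.
:::
```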

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Harry Mellor, 2025-01-31 23:38:35 +00:00 (committed by GitHub)
parent fc542144c4
commit 60808bd4c7
13 changed files with 111 additions and 59 deletions

View File

@@ -2,6 +2,10 @@
 This tab provides instructions on running vLLM with Intel Gaudi devices.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: Ubuntu 22.04 LTS

View File

@@ -5,7 +5,8 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
+:selected:
 :sync: tpu
 :::{include} tpu.inc.md
@@ -25,7 +26,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -52,7 +53,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -72,7 +73,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -99,7 +100,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -119,7 +120,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -146,7 +147,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -166,7 +167,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -193,7 +194,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -213,7 +214,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -242,7 +243,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -262,7 +263,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -289,7 +290,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -309,7 +310,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
@@ -336,7 +337,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} TPU
+::::{tab-item} Google TPU
 :sync: tpu
 :::{include} tpu.inc.md
@@ -354,7 +355,7 @@ vLLM is a Python library that supports the following AI accelerators. Select you
 ::::
-::::{tab-item} Neuron
+::::{tab-item} AWS Neuron
 :sync: neuron
 :::{include} neuron.inc.md
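To make the repetitive hunks above easier to follow: each installation page repeats a MyST `tab-set` once per section, and this commit renames the tab label in every repetition. A trimmed sketch of one such tab-set, assuming the structure implied by the hunks (the real files pass extra options to `{include}` and may place blank lines differently):

```markdown
:::::{tab-set}
:sync-group: device

::::{tab-item} Google TPU
:selected:
:sync: tpu

:::{include} tpu.inc.md
:::

::::

::::{tab-item} AWS Neuron
:sync: neuron

:::{include} neuron.inc.md
:::

::::
:::::
```

The label after `{tab-item}` is display text only; the stable `:sync:` key is what `:sync-group: device` matches to keep the same tab selected across every tab-set on the page, which is why the labels can be renamed without touching the sync keys. `:selected:` marks the tab shown by default, so only the first tab-set in each file needs it.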

View File

@@ -4,6 +4,10 @@ vLLM 0.3.3 onwards supports model inferencing and serving on AWS Trainium/Infere
 Paged Attention and Chunked Prefill are currently in development and will be available soon.
 Data types currently supported in Neuron SDK are FP16 and BF16.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: Linux

View File

@@ -2,6 +2,10 @@
 vLLM powered by OpenVINO supports all LLM models from [vLLM supported models list](#supported-models) and can perform optimal model serving on all x86-64 CPUs with, at least, AVX2 support, as well as on both integrated and discrete Intel® GPUs ([the list of supported GPUs](https://docs.openvino.ai/2024/about-openvino/release-notes-openvino/system-requirements.html#gpu)).
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: Linux

View File

@@ -30,6 +30,10 @@ For TPU pricing information, see [Cloud TPU pricing](https://cloud.google.com/tp
 You may need additional persistent storage for your TPU VMs. For more
 information, see [Storage options for Cloud TPU data](https://cloud.devsite.corp.google.com/tpu/docs/storage-options).
+:::{attention}
+There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
+:::
 ## Requirements
 - Google Cloud TPU VM

View File

@@ -4,6 +4,10 @@ vLLM has experimental support for macOS with Apple silicon. For now, users shall
 Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: `macOS Sonoma` or later

View File

@@ -4,6 +4,10 @@ vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CP
 ARM CPU backend currently supports Float32, FP16 and BFloat16 datatypes.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: Linux

View File

@@ -5,7 +5,8 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
+:selected:
 :sync: x86
 :::{include} x86.inc.md
@@ -15,7 +16,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 ::::
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 :::{include} arm.inc.md
@@ -44,7 +45,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
 :sync: x86
 :::{include} x86.inc.md
@@ -54,7 +55,7 @@ vLLM is a Python library that supports the following CPU variants. Select your C
 ::::
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 :::{include} arm.inc.md
@@ -92,7 +93,7 @@ Currently, there are no pre-built CPU wheels.
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} x86
+::::{tab-item} Intel/AMD x86
 :sync: x86
 :::{include} x86.inc.md
@@ -102,7 +103,7 @@ Currently, there are no pre-built CPU wheels.
 ::::
-::::{tab-item} ARM
+::::{tab-item} ARM AArch64
 :sync: arm
 :::{include} arm.inc.md

View File

@@ -2,12 +2,20 @@
 vLLM initially supports basic model inferencing and serving on x86 CPU platform, with data types FP32, FP16 and BF16.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - OS: Linux
 - Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
 - Instruction Set Architecture (ISA): AVX512 (optional, recommended)
+:::{tip}
+[Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware.
+:::
 ## Set up using Python
 ### Pre-built wheels
@@ -29,7 +37,3 @@ vLLM initially supports basic model inferencing and serving on x86 CPU platform,
 ### Build image from source
 ## Extra information
-## Intel Extension for PyTorch
-- [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch) extends PyTorch with up-to-date features optimizations for an extra performance boost on Intel hardware.
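Note the refactor in the hunk above: the IPEX note moves from a standalone trailing section into an inline `{tip}` next to the requirements it relates to. The `{tip}` directive uses the same fenced-admonition syntax as `{attention}`; MyST exposes the whole docutils admonition family this way. A generic sketch (the body text here is purely illustrative):

```markdown
:::{tip}
Inline hints like this render as a highlighted call-out instead of a standalone section.
:::
```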

View File

@@ -5,7 +5,8 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
+:selected:
 :sync: cuda
 :::{include} cuda.inc.md
@@ -15,7 +16,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -25,7 +26,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -45,7 +46,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -55,7 +56,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -65,7 +66,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -87,7 +88,7 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -97,14 +98,14 @@ vLLM is a Python library that supports the following GPU variants. Select your G
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 There is no extra information on creating a new Python environment for this device.
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 There is no extra information on creating a new Python environment for this device.
@@ -118,7 +119,7 @@ There is no extra information on creating a new Python environment for this devi
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -128,7 +129,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -138,7 +139,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -157,7 +158,7 @@ There is no extra information on creating a new Python environment for this devi
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -167,7 +168,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -177,7 +178,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -196,7 +197,7 @@ There is no extra information on creating a new Python environment for this devi
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -206,7 +207,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -216,7 +217,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -233,7 +234,7 @@ There is no extra information on creating a new Python environment for this devi
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -243,7 +244,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -253,7 +254,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md
@@ -270,7 +271,7 @@ There is no extra information on creating a new Python environment for this devi
 :::::{tab-set}
 :sync-group: device
-::::{tab-item} CUDA
+::::{tab-item} NVIDIA CUDA
 :sync: cuda
 :::{include} cuda.inc.md
@@ -279,7 +280,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} ROCm
+::::{tab-item} AMD ROCm
 :sync: rocm
 :::{include} rocm.inc.md
@@ -288,7 +289,7 @@ There is no extra information on creating a new Python environment for this devi
 ::::
-::::{tab-item} XPU
+::::{tab-item} Intel XPU
 :sync: xpu
 :::{include} xpu.inc.md

View File

@@ -2,6 +2,10 @@
 vLLM supports AMD GPUs with ROCm 6.2.
+:::{attention}
+There are no pre-built wheels for this device, so you must either use the pre-built Docker image or build vLLM from source.
+:::
 ## Requirements
 - GPU: MI200s (gfx90a), MI300 (gfx942), Radeon RX 7900 series (gfx1100)
@@ -13,14 +17,6 @@ vLLM supports AMD GPUs with ROCm 6.2.
 Currently, there are no pre-built ROCm wheels.
-However, the [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
-docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
-:::{tip}
-Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html)
-for instructions on how to use this prebuilt docker image.
-:::
 ### Build wheel from source
 0. Install prerequisites (skip if you are already in an environment/docker with the following installed):
@@ -112,7 +108,13 @@ for instructions on how to use this prebuilt docker image.
 ### Pre-built images
-Currently, there are no pre-built ROCm images.
+The [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
+docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
+:::{tip}
+Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html)
+for instructions on how to use this prebuilt docker image.
+:::
 ### Build image from source

View File

@@ -2,6 +2,10 @@
 vLLM initially supports basic model inferencing and serving on Intel GPU platform.
+:::{attention}
+There are no pre-built wheels or images for this device, so you must build vLLM from source.
+:::
 ## Requirements
 - Supported Hardware: Intel Data Center GPU, Intel ARC GPU

View File

@@ -6,8 +6,23 @@ vLLM supports the following hardware platforms:
 :::{toctree}
 :maxdepth: 1
+:hidden:
 gpu/index
 cpu/index
 ai_accelerator/index
 :::
+- <project:gpu/index.md>
+  - NVIDIA CUDA
+  - AMD ROCm
+  - Intel XPU
+- <project:cpu/index.md>
+  - Intel/AMD x86
+  - ARM AArch64
+  - Apple silicon
+- <project:ai_accelerator/index.md>
+  - Google TPU
+  - Intel Gaudi
+  - AWS Neuron
+  - OpenVINO
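The hunk above follows a common MyST/Sphinx pattern: `:hidden:` keeps the `toctree` registered for navigation while suppressing its rendered output, so the page can present a hand-written, more informative list in its place, with `<project:...>` cross-reference links to in-project documents. A minimal sketch of the resulting source, assuming the blank-line placement implied by the hunk (remaining entries follow the same shape):

```markdown
:::{toctree}
:maxdepth: 1
:hidden:

gpu/index
cpu/index
ai_accelerator/index
:::

- <project:gpu/index.md>
  - NVIDIA CUDA
  - AMD ROCm
  - Intel XPU
```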