[ROCm] [Doc] Update ROCm installation docs (#27327)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
@@ -1,6 +1,6 @@

# --8<-- [start:installation]
vLLM supports AMD GPUs with ROCm 6.3 or above, and torch 2.8.0 and above.
!!! tip
    [Docker](#set-up-using-docker) is the recommended way to use vLLM on ROCm.

@@ -28,57 +28,63 @@ Currently, there are no pre-built ROCm wheels.

# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]

!!! tip
    - If the following installation steps do not work for you, please refer to [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base); the Dockerfile captures the same installation steps in a known-working form.

0. Install prerequisites (skip if you are already in an environment/docker with the following installed):
    - [ROCm](https://rocm.docs.amd.com/en/latest/deploy/linux/index.html)
    - [PyTorch](https://pytorch.org/)

    For installing PyTorch, you can start from a fresh docker image, e.g. `rocm/pytorch:rocm7.0_ubuntu22.04_py3.10_pytorch_release_2.8.0` or `rocm/pytorch-nightly`. If you are using a docker image, you can skip to Step 3.

    Alternatively, you can install PyTorch using PyTorch wheels. See the PyTorch [Getting Started](https://pytorch.org/get-started/locally/) guide for details. Example:
    ```bash
    # Install PyTorch
    pip uninstall torch -y
    pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm7.0
    ```
1. Install [Triton for ROCm](https://github.com/ROCm/triton.git)

    Install ROCm's Triton following the instructions from [ROCm/triton](https://github.com/ROCm/triton.git)
    ```bash
    python3 -m pip install ninja cmake wheel pybind11
    pip uninstall -y triton
    git clone https://github.com/ROCm/triton.git
    cd triton
    # git checkout $TRITON_BRANCH
    git checkout f9e5bf54
    if [ ! -f setup.py ]; then cd python; fi
    python3 setup.py install
    cd ../..
    ```
    !!! note
        - The validated `$TRITON_BRANCH` can be found in [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base).
        - If you see an HTTP error while downloading packages during the Triton build, simply retry; the error is intermittent.
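    As a convenience, here is a small sketch (not from the docs) for looking up the validated pins from a vLLM checkout. It assumes the pins are declared as `ARG` lines in the Dockerfile with these exact names, which is an assumption:

    ```bash
    # Print the pinned branches/commits used by the ROCm base image; ARG names are assumptions
    grep -E 'ARG (TRITON_BRANCH|FA_BRANCH|AITER_BRANCH_OR_COMMIT)=' docker/Dockerfile.rocm_base
    ```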
2. Optionally, if you choose to use CK flash attention, you can install [flash attention for ROCm](https://github.com/Dao-AILab/flash-attention.git)

    Install ROCm's flash attention (v2.8.0) following the instructions from [ROCm/flash-attention](https://github.com/Dao-AILab/flash-attention#amd-rocm-support)

    For example, for ROCm 7.0, suppose your gfx arch is `gfx942`. To find your gfx architecture, run `rocminfo | grep gfx`.
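    A hedged helper for that lookup (not from the docs): capture the first `gfx` target reported by `rocminfo` into the `GPU_ARCHS` variable used by the build below. It assumes a system with a single GPU type:

    ```bash
    # Take the first gfx token rocminfo prints (assumes one GPU type installed)
    export GPU_ARCHS=$(rocminfo | grep -o -m1 'gfx[0-9a-f]*' | head -n1)
    echo "$GPU_ARCHS"  # e.g. gfx942
    ```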
    ```bash
    git clone https://github.com/Dao-AILab/flash-attention.git
    cd flash-attention
    # git checkout $FA_BRANCH
    git checkout 0e60e394
    git submodule update --init
    GPU_ARCHS="gfx942" python3 setup.py install
    cd ..
    ```
    !!! note
        - The validated `$FA_BRANCH` can be found in [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base).
3. If you choose to build AITER yourself to use a certain branch or commit, you can build AITER using the following steps:

@@ -92,11 +98,13 @@ Currently, there are no pre-built ROCm wheels.

    ```
    !!! note
        - You will need to configure `$AITER_BRANCH_OR_COMMIT` for your purpose.
        - The validated `$AITER_BRANCH_OR_COMMIT` can be found in [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base).
4. Build vLLM. For example, vLLM on ROCm 7.0 can be built with the following steps:

    ???+ console "Commands"
        ```bash
        pip install --upgrade pip

        # ... (unchanged lines elided in this diff: @@ -109,31 +117,48 @@)
            scipy \
            huggingface-hub[cli,hf_transfer] \
            setuptools_scm
        pip install "numpy<2"
        pip install -r requirements/rocm.txt

        # To build for a single architecture (e.g., MI300) for faster installation (recommended):
        export PYTORCH_ROCM_ARCH="gfx942"

        # To build vLLM for multiple archs (MI210/MI250/MI300), use this instead:
        # export PYTORCH_ROCM_ARCH="gfx90a;gfx942"

        python3 setup.py develop
        ```
    This may take 5-10 minutes. Currently, `pip install .` does not work for ROCm installation.
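    A quick sanity check after the build (an unofficial suggestion, assuming the build completed in the current environment):

    ```bash
    # The develop install should be importable and report a version
    python3 -c "import vllm; print(vllm.__version__)"
    ```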
    !!! tip
        - Triton flash attention is used by default. For benchmarking purposes, it is recommended to run a warm-up step before collecting perf numbers.
        - Triton flash attention does not currently support sliding window attention. If using half precision, please use CK flash-attention for sliding window support.
        - To use CK flash-attention or PyTorch naive attention, set `export VLLM_USE_TRITON_FLASH_ATTN=0` to turn off Triton flash attention (see the sketch after this tip).
        - Ideally, the ROCm version of PyTorch should match the ROCm driver version.
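    A minimal sketch of that flag in use; the model name and the `vllm serve` invocation are illustrative, not from this page:

    ```bash
    # Disable Triton flash attention so vLLM falls back to CK flash-attention / naive attention
    export VLLM_USE_TRITON_FLASH_ATTN=0
    vllm serve meta-llama/Llama-3.1-8B-Instruct
    ```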
    !!! tip
        - For MI300x (gfx942) users, to achieve optimal performance, please refer to the [MI300x tuning guide](https://rocm.docs.amd.com/en/latest/how-to/tuning-guides/mi300x/index.html) for performance optimization and tuning tips at the system and workflow level.
          For vLLM, please refer to [vLLM performance optimization](https://rocm.docs.amd.com/en/latest/how-to/rocm-for-ai/inference-optimization/vllm-optimization.html).
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]

The [AMD Infinity hub for vLLM](https://hub.docker.com/r/rocm/vllm/tags) offers a prebuilt, optimized
docker image designed for validating inference performance on the AMD Instinct™ MI300X accelerator.
AMD also offers a nightly prebuilt docker image on [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev), which has vLLM and all its dependencies installed.
???+ console "Commands"

    ```bash
    docker pull rocm/vllm-dev:nightly # to get the latest image
    docker run -it --rm \
        --network=host \
        --group-add=video \
        --ipc=host \
        --cap-add=SYS_PTRACE \
        --security-opt seccomp=unconfined \
        --device /dev/kfd \
        --device /dev/dri \
        -v <path/to/your/models>:/app/models \
        -e HF_HOME="/app/models" \
        rocm/vllm-dev:nightly
    ```
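Once inside the container, a quick way to confirm the GPUs are visible (standard ROCm tooling, not specific to this image):

```bash
# List the AMD GPUs the container can see
rocm-smi
```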
!!! tip
    Please check [LLM inference performance validation on AMD Instinct MI300X](https://rocm.docs.amd.com/en/latest/how-to/performance-validation/mi300x/vllm-benchmark.html).

@@ -144,7 +169,7 @@ docker image designed for validating inference performance on the AMD Instinct

Building the Docker image from source is the recommended way to use vLLM with ROCm.
??? info "(Optional) Build an image with ROCm software stack"

    Build a docker image from [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base), which sets up the ROCm software stack needed by vLLM.
    **This step is optional, as this rocm_base image is usually prebuilt and stored on [Docker Hub](https://hub.docker.com/r/rocm/vllm-dev) under the tag `rocm/vllm-dev:base` to speed up the user experience.**

@@ -160,7 +185,7 @@ It is important that the user kicks off the docker build using buildkit. Either
    }
    ```

    To build vLLM on ROCm 7.0 for the MI200 and MI300 series, you can use the default:
    ```bash
    DOCKER_BUILDKIT=1 docker build \

@@ -181,7 +206,7 @@ It is important that the user kicks off the docker build using buildkit. Either

}
```
[docker/Dockerfile.rocm](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm) uses ROCm 7.0 by default, but also supports ROCm 5.7, 6.0, 6.1, 6.2, 6.3, and 6.4 in older vLLM branches.
It provides the flexibility to customize the build of the docker image using the following arguments:
- `BASE_IMAGE`: specifies the base image used when running `docker build`. The default value `rocm/vllm-dev:base` is an image published and maintained by AMD. It is built using [docker/Dockerfile.rocm_base](https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile.rocm_base)

@@ -189,16 +214,16 @@ It provides flexibility to customize the build of docker image using the following
Their values can be passed in when running `docker build` with `--build-arg` options.
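For example, a sketch of overriding `BASE_IMAGE` at build time; the tag shown is the documented default, used here only for illustration:

```bash
# Pass a custom base image to the vLLM ROCm build
DOCKER_BUILDKIT=1 docker build \
    --build-arg BASE_IMAGE=rocm/vllm-dev:base \
    -f docker/Dockerfile.rocm -t vllm-rocm .
```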
To build vLLM on ROCm 7.0 for the MI200 and MI300 series, you can use the default:
???+ console "Commands"

    ```bash
    DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile.rocm -t vllm-rocm .
    ```
To run the above docker image `vllm-rocm`, use the command below:

???+ console "Commands"
    ```bash
    docker run -it \
        --network=host \

@@ -1,4 +1,4 @@
On NVIDIA CUDA only, it's recommended to use [uv](https://docs.astral.sh/uv/), a very fast Python environment manager, to create and manage Python environments. Please follow the [documentation](https://docs.astral.sh/uv/#getting-started) to install `uv`. After installing `uv`, you can create a new Python environment using the following commands:
```bash
uv venv --python 3.12 --seed
```
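A typical follow-up (standard `uv` virtual-environment usage, not shown in this excerpt) is to activate the environment before installing vLLM:

```bash
# Activate the environment created by `uv venv`
source .venv/bin/activate
```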