From efe73d0575951767180468dac8202739cb479074 Mon Sep 17 00:00:00 2001
From: Reid <61492567+reidliu41@users.noreply.github.com>
Date: Wed, 9 Jul 2025 23:08:19 +0800
Subject: [PATCH] [doc] update doc format (#20673)

Signed-off-by: reidliu41
---
 .../contributing/ci/update_pytorch_version.md | 78 ++++++++++++-------
 1 file changed, 51 insertions(+), 27 deletions(-)

diff --git a/docs/contributing/ci/update_pytorch_version.md b/docs/contributing/ci/update_pytorch_version.md
index 2327bc4b53ad2..1fe18d5d88565 100644
--- a/docs/contributing/ci/update_pytorch_version.md
+++ b/docs/contributing/ci/update_pytorch_version.md
@@ -16,11 +16,12 @@ by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM with PyTorch release candidates (RC) to ensure
compatibility before each release.

-PyTorch release candidates can be downloaded from PyTorch test index at https://download.pytorch.org/whl/test.
-For example, torch2.7.0+cu12.8 RC can be installed using the following command:
+PyTorch release candidates can be downloaded from [PyTorch test index](https://download.pytorch.org/whl/test).
+For example, `torch2.7.0+cu128` RC can be installed using the following command:

-```
-uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
+```bash
+uv pip install torch torchvision torchaudio \
+    --index-url https://download.pytorch.org/whl/test/cu128
```

When the final RC is ready for testing, it will be announced to the community
@@ -28,13 +29,28 @@ on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-ann
After this announcement, we can begin testing vLLM integration by drafting a pull
request following this 3-step process:

-1. Update requirements files in https://github.com/vllm-project/vllm/tree/main/requirements
-to point to the new releases for torch, torchvision, and torchaudio.
-2.
Use `--extra-index-url https://download.pytorch.org/whl/test/` to
-get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`,
-and `rocm6.2.4`.
-3. As vLLM uses uv, make sure that `unsafe-best-match` strategy is set either
-via `UV_INDEX_STRATEGY` env variable or via `--index-strategy unsafe-best-match`.
+1. Update [requirements files](https://github.com/vllm-project/vllm/tree/main/requirements)
+to point to the new releases for `torch`, `torchvision`, and `torchaudio`.
+
+2. Use the following option to get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`, and `rocm6.2.4`.
+
+    ```bash
+    --extra-index-url https://download.pytorch.org/whl/test/
+    ```
+
+3. Since vLLM uses `uv`, ensure the following index strategy is applied:
+
+    - Via environment variable:
+
+        ```bash
+        export UV_INDEX_STRATEGY=unsafe-best-match
+        ```
+
+    - Or via CLI flag:
+
+        ```bash
+        --index-strategy unsafe-best-match
+        ```

If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.
@@ -42,20 +58,25 @@ cc the PyTorch release team to initiate discussion on how to address them.
## Update CUDA version

The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
-torch2.7.0+cu12.6) is uploaded to PyPI. However, vLLM may require a different CUDA version,
+`torch2.7.0+cu126`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support. This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.

-1. Use `--extra-index-url https://download.pytorch.org/whl/cu128` to install torch+cu128.
-2.
Other important indexes at the moment include:
- 1. CPU ‒ https://download.pytorch.org/whl/cpu
- 2. ROCm ‒ https://download.pytorch.org/whl/rocm6.2.4 and https://download.pytorch.org/whl/rocm6.3
- 3. XPU ‒ https://download.pytorch.org/whl/xpu
-3. Update .buildkite/release-pipeline.yaml and .buildkite/scripts/upload-wheels.sh to
-match the CUDA version from step 1. This makes sure that the release vLLM wheel is tested
-on CI.
+- Important indexes at the moment include:
+
+| Platform | `--extra-index-url` |
+|----------|---------------------|
+| CUDA 12.8 | [https://download.pytorch.org/whl/cu128](https://download.pytorch.org/whl/cu128) |
+| CPU | [https://download.pytorch.org/whl/cpu](https://download.pytorch.org/whl/cpu) |
+| ROCm 6.2.4 | [https://download.pytorch.org/whl/rocm6.2.4](https://download.pytorch.org/whl/rocm6.2.4) |
+| ROCm 6.3 | [https://download.pytorch.org/whl/rocm6.3](https://download.pytorch.org/whl/rocm6.3) |
+| XPU | [https://download.pytorch.org/whl/xpu](https://download.pytorch.org/whl/xpu) |
+
+- Update the following files to match the CUDA version chosen above. This makes sure that the released vLLM wheel is tested on CI.
+  - `.buildkite/release-pipeline.yaml`
+  - `.buildkite/scripts/upload-wheels.sh`

## Address long vLLM build time

@@ -66,7 +87,7 @@ it doesn't populate the cache, so re-running it to warm up the cache is
ineffective.

While ongoing efforts like [#17419](gh-issue:17419)
-address the long build time at its source, the current workaround is to set VLLM_CI_BRANCH
+address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:

@@ -86,17 +107,18 @@ releases (which would take too much time), they can be built from source
to unblock the update process.
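The Buildkite workaround described above boils down to setting a single variable before triggering the build. As a sketch (the branch is maintained by @khluu and may change over time):

```shell
# Point a manually triggered Buildkite build at the custom CI branch so it
# reuses cached Docker layers from the post-merge queue (see the build-time
# section above).
export VLLM_CI_BRANCH=khluu/use_postmerge_q
echo "Trigger the Buildkite build with VLLM_CI_BRANCH=$VLLM_CI_BRANCH"
```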
### FlashInfer -Here is how to build and install it from source with torch2.7.0+cu128 in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271): +Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271): ```bash export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX' export FLASHINFER_ENABLE_SM90=1 -uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1" +uv pip install --system \ + --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1" ``` One caveat is that building FlashInfer from source adds approximately 30 minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a -public location for immediate installation, such as https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl. For future releases, contact the PyTorch release +public location for immediate installation, such as [this FlashInfer wheel link](https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl). For future releases, contact the PyTorch release team if you want to get the package published there. 
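To make the cached-wheel shortcut concrete: the wheel URL above follows a predictable pattern, so it can be assembled from the FlashInfer version, CUDA tag, and torch tag, then installed directly instead of rebuilt from source. This is a sketch inferred from the single URL shown above; the naming scheme may change in future releases.

```shell
# Assemble the cached FlashInfer wheel URL from its components
# (values taken from the wheel link in the text above).
FLASHINFER_VERSION=0.2.6.post1
CUDA_TAG=cu128
TORCH_TAG=torch2.7
WHEEL_URL="https://download.pytorch.org/whl/${CUDA_TAG}/flashinfer/flashinfer_python-${FLASHINFER_VERSION}%2B${CUDA_TAG}${TORCH_TAG}-cp39-abi3-linux_x86_64.whl"
# Installing the pre-built wheel avoids the ~30-minute source build:
#   uv pip install --system "$WHEEL_URL"
echo "$WHEEL_URL"
```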
### xFormers

Similar to FlashInfer, here is how to build and install xFormers from source:

```bash
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
-MAX_JOBS=16 uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
+MAX_JOBS=16 uv pip install --system \
+  --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```

### Mamba

```bash
-uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
+uv pip install --system \
+  --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
```

### causal-conv1d

@@ -125,6 +149,6 @@ Rather than attempting to update all vLLM platforms in a single pull request, it
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to selectively choose which platforms
to update. For instance, updating XPU requires the corresponding
-release from https://github.com/intel/intel-extension-for-pytorch by Intel.
+release from [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) by Intel.
vLLM was updated to PyTorch 2.7.0 on CPU, CUDA, and ROCm in one pull request, while a separate pull request completed the update for XPU.
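Putting the earlier steps together, testing an RC on a single platform combines the test index, the platform suffix, and uv's index strategy. A sketch for CPU (the command is printed rather than executed here; swap the suffix for `cu128`, `rocm6.3`, etc.):

```shell
# Index strategy is required because the torch RCs live on an extra index
# (see the 3-step process above).
export UV_INDEX_STRATEGY=unsafe-best-match
PLATFORM=cpu  # one of: cpu, cu128, rocm6.2.4, ...
CMD="uv pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/test/${PLATFORM}"
echo "$CMD"
```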