[doc] update doc format (#20673)

Signed-off-by: reidliu41 <reid201711@gmail.com>
This commit is contained in:
Reid 2025-07-09 23:08:19 +08:00 committed by GitHub
parent 853487bc1b
commit efe73d0575
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -16,11 +16,12 @@ by waiting for the next release or by implementing hacky workarounds in vLLM.
The better solution is to test vLLM with PyTorch release candidates (RC) to ensure
compatibility before each release.
PyTorch release candidates can be downloaded from PyTorch test index at https://download.pytorch.org/whl/test.
For example, torch2.7.0+cu12.8 RC can be installed using the following command:
PyTorch release candidates can be downloaded from [PyTorch test index](https://download.pytorch.org/whl/test).
For example, `torch2.7.0+cu12.8` RC can be installed using the following command:
```
uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/test/cu128
```bash
uv pip install torch torchvision torchaudio \
--index-url https://download.pytorch.org/whl/test/cu128
```
When the final RC is ready for testing, it will be announced to the community
@ -28,13 +29,28 @@ on the [PyTorch dev-discuss forum](https://dev-discuss.pytorch.org/c/release-ann
After this announcement, we can begin testing vLLM integration by drafting a pull request
following this 3-step process:
1. Update requirements files in https://github.com/vllm-project/vllm/tree/main/requirements
to point to the new releases for torch, torchvision, and torchaudio.
2. Use `--extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>` to
get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`,
and `rocm6.2.4`.
3. As vLLM uses uv, make sure that `unsafe-best-match` strategy is set either
via `UV_INDEX_STRATEGY` env variable or via `--index-strategy unsafe-best-match`.
1. Update [requirements files](https://github.com/vllm-project/vllm/tree/main/requirements)
to point to the new releases for `torch`, `torchvision`, and `torchaudio`.
2. Use the following option to get the final release candidates' wheels. Some common platforms are `cpu`, `cu128`, and `rocm6.2.4`.
```bash
--extra-index-url https://download.pytorch.org/whl/test/<PLATFORM>
```
3. Since vLLM uses `uv`, ensure the following index strategy is applied:
- Via environment variable:
```bash
export UV_INDEX_STRATEGY=unsafe-best-match
```
- Or via CLI flag:
```bash
--index-strategy unsafe-best-match
```
If failures are found in the pull request, raise them as issues on vLLM and
cc the PyTorch release team to initiate discussion on how to address them.
@ -42,20 +58,25 @@ cc the PyTorch release team to initiate discussion on how to address them.
## Update CUDA version
The PyTorch release matrix includes both stable and experimental [CUDA versions](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-compatibility-matrix). Due to limitations, only the latest stable CUDA version (for example,
torch2.7.0+cu12.6) is uploaded to PyPI. However, vLLM may require a different CUDA version,
`torch2.7.0+cu12.6`) is uploaded to PyPI. However, vLLM may require a different CUDA version,
such as 12.8 for Blackwell support.
This complicates the process as we cannot use the out-of-the-box
`pip install torch torchvision torchaudio` command. The solution is to use
`--extra-index-url` in vLLM's Dockerfiles.
1. Use `--extra-index-url https://download.pytorch.org/whl/cu128` to install torch+cu128.
2. Other important indexes at the moment include:
1. CPU https://download.pytorch.org/whl/cpu
2. ROCm https://download.pytorch.org/whl/rocm6.2.4 and https://download.pytorch.org/whl/rocm6.3
3. XPU https://download.pytorch.org/whl/xpu
3. Update .buildkite/release-pipeline.yaml and .buildkite/scripts/upload-wheels.sh to
match the CUDA version from step 1. This makes sure that the release vLLM wheel is tested
on CI.
- Important indexes at the moment include:
| Platform | `--extra-index-url` |
|----------|-----------------|
| CUDA 12.8| [https://download.pytorch.org/whl/cu128](https://download.pytorch.org/whl/cu128)|
| CPU | [https://download.pytorch.org/whl/cpu](https://download.pytorch.org/whl/cpu)|
| ROCm 6.2 | [https://download.pytorch.org/whl/rocm6.2.4](https://download.pytorch.org/whl/rocm6.2.4) |
| ROCm 6.3 | [https://download.pytorch.org/whl/rocm6.3](https://download.pytorch.org/whl/rocm6.3) |
| XPU | [https://download.pytorch.org/whl/xpu](https://download.pytorch.org/whl/xpu) |
- Update the below files to match the CUDA version from step 1. This makes sure that the release vLLM wheel is tested on CI.
- `.buildkite/release-pipeline.yaml`
- `.buildkite/scripts/upload-wheels.sh`
## Address long vLLM build time
@ -66,7 +87,7 @@ it doesn't populate the cache, so re-running it to warm up the cache
is ineffective.
While ongoing efforts like [#17419](gh-issue:17419)
address the long build time at its source, the current workaround is to set VLLM_CI_BRANCH
address the long build time at its source, the current workaround is to set `VLLM_CI_BRANCH`
to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
when manually triggering a build on Buildkite. This branch accomplishes two things:
@ -86,17 +107,18 @@ releases (which would take too much time), they can be built from
source to unblock the update process.
### FlashInfer
Here is how to build and install it from source with torch2.7.0+cu128 in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):
Here is how to build and install it from source with `torch2.7.0+cu128` in vLLM [Dockerfile](https://github.com/vllm-project/vllm/blob/27bebcd89792d5c4b08af7a65095759526f2f9e1/docker/Dockerfile#L259-L271):
```bash
export TORCH_CUDA_ARCH_LIST='7.5 8.0 8.9 9.0 10.0+PTX'
export FLASHINFER_ENABLE_SM90=1
uv pip install --system --no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
uv pip install --system \
--no-build-isolation "git+https://github.com/flashinfer-ai/flashinfer@v0.2.6.post1"
```
One caveat is that building FlashInfer from source adds approximately 30
minutes to the vLLM build time. Therefore, it's preferable to cache the wheel in a
public location for immediate installation, such as https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl. For future releases, contact the PyTorch release
public location for immediate installation, such as [this FlashInfer wheel link](https://download.pytorch.org/whl/cu128/flashinfer/flashinfer_python-0.2.6.post1%2Bcu128torch2.7-cp39-abi3-linux_x86_64.whl). For future releases, contact the PyTorch release
team if you want to get the package published there.
### xFormers
@ -104,13 +126,15 @@ Similar to FlashInfer, here is how to build and install xFormers from source:
```bash
export TORCH_CUDA_ARCH_LIST='7.0 7.5 8.0 8.9 9.0 10.0+PTX'
MAX_JOBS=16 uv pip install --system --no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
MAX_JOBS=16 uv pip install --system \
--no-build-isolation "git+https://github.com/facebookresearch/xformers@v0.0.30"
```
### Mamba
```bash
uv pip install --system --no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
uv pip install --system \
--no-build-isolation "git+https://github.com/state-spaces/mamba@v2.2.4"
```
### causal-conv1d
@ -125,6 +149,6 @@ Rather than attempting to update all vLLM platforms in a single pull request, it
to handle some platforms separately. The separation of requirements and Dockerfiles
for different platforms in vLLM CI/CD allows us to selectively choose
which platforms to update. For instance, updating XPU requires the corresponding
release from https://github.com/intel/intel-extension-for-pytorch by Intel.
release from [Intel Extension for PyTorch](https://github.com/intel/intel-extension-for-pytorch) by Intel.
While <gh-pr:16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
<gh-pr:17444> completed the update for XPU.