[Docs] use uv in CPU installation docs (#22089)

Signed-off-by: David Xia <david@davidxia.com>
David Xia 2025-08-01 10:55:55 -04:00 committed by GitHub
parent 3146519add
commit 97608dc276
3 changed files with 48 additions and 31 deletions


@@ -1,6 +1,6 @@
# --8<-- [start:installation]
vLLM has experimental support for macOS with Apple silicon. For now, users shall build from the source vLLM to natively run on macOS.
vLLM has experimental support for macOS with Apple silicon. For now, users must build from source to natively run on macOS.
Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
@@ -23,20 +23,20 @@ Currently the CPU implementation for macOS supports FP32 and FP16 datatypes.
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
After installation of XCode and the Command Line Tools, which include Apple Clang, execute the following commands to build and install vLLM from the source.
After installation of Xcode and the Command Line Tools, which include Apple Clang, execute the following commands to build and install vLLM from source.
```bash
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -r requirements/cpu.txt
pip install -e .
uv pip install -r requirements/cpu.txt
uv pip install -e .
```
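These commands assume `uv` is already available on your `PATH`. If it is not, one common way to install it (an assumption about the reader's environment, not part of this change) is the standalone installer or plain `pip`:
```bash
# Install uv via the standalone installer (alternatively: pip install uv)
curl -LsSf https://astral.sh/uv/install.sh | sh
```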
!!! note
On macOS the `VLLM_TARGET_DEVICE` is automatically set to `cpu`, which currently is the only supported device.
On macOS the `VLLM_TARGET_DEVICE` is automatically set to `cpu`, which is currently the only supported device.
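Being explicit about the target device is harmless; a minimal sketch, assuming the editable install from above:
```bash
# Equivalent to the default on macOS, where cpu is the only supported device
VLLM_TARGET_DEVICE=cpu uv pip install -e .
```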
!!! example "Troubleshooting"
If the build has error like the following snippet where standard C++ headers cannot be found, try to remove and reinstall your
If the build fails with errors like the following where standard C++ headers cannot be found, try to remove and reinstall your
[Command Line Tools for Xcode](https://developer.apple.com/download/all/).
```text


@@ -1,4 +1,4 @@
First, install recommended compiler. We recommend to use `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.4, you can run:
First, install the recommended compiler. We recommend using `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04, you can run:
```bash
sudo apt-get update -y
@@ -6,28 +6,34 @@ sudo apt-get install -y --no-install-recommends ccache git curl wget ca-certific
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
```
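To confirm that the expected toolchain is now the default (a quick sanity check, not part of the original instructions):
```bash
# Both should report version 12.x after the update-alternatives call above
gcc --version
g++ --version
```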
Second, clone vLLM project:
Second, clone the vLLM project:
```bash
git clone https://github.com/vllm-project/vllm.git vllm_source
cd vllm_source
```
Third, install Python packages for vLLM CPU backend building:
Third, install required dependencies:
```bash
pip install --upgrade pip
pip install -v -r requirements/cpu-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
uv pip install -r requirements/cpu-build.txt --torch-backend auto
uv pip install -r requirements/cpu.txt --torch-backend auto
```
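If `--torch-backend auto` does not pick the CPU index on your machine, the backend can be forced explicitly; this assumes uv's `--torch-backend` flag accepts a literal `cpu` value (see `uv pip install --help`):
```bash
# Force the CPU-only PyTorch wheel index instead of auto-detection
uv pip install -r requirements/cpu.txt --torch-backend cpu
```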
Finally, build and install vLLM CPU backend:
??? console "pip"
```bash
pip install --upgrade pip
pip install -v -r requirements/cpu-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
```
Finally, build and install vLLM:
```bash
VLLM_TARGET_DEVICE=cpu python setup.py install
```
If you want to develop vllm, install it in editable mode instead.
If you want to develop vLLM, install it in editable mode instead.
```bash
VLLM_TARGET_DEVICE=cpu python setup.py develop


@@ -1,6 +1,6 @@
# --8<-- [start:installation]
vLLM has experimental support for s390x architecture on IBM Z platform. For now, users shall build from the vLLM source to natively run on IBM Z platform.
vLLM has experimental support for s390x architecture on IBM Z platform. For now, users must build from source to natively run on IBM Z platform.
Currently the CPU implementation for s390x architecture supports FP32 datatype only.
@@ -40,21 +40,32 @@ curl https://sh.rustup.rs -sSf | sh -s -- -y && \
. "$HOME/.cargo/env"
```
Execute the following commands to build and install vLLM from the source.
Execute the following commands to build and install vLLM from source.
!!! tip
Please build the following dependencies, `torchvision`, `pyarrow` from the source before building vLLM.
Please build the following dependencies from source before building vLLM: `torchvision` and `pyarrow`.
```bash
sed -i '/^torch/d' requirements-build.txt # remove torch from requirements-build.txt since we use nightly builds
pip install -v \
--extra-index-url https://download.pytorch.org/whl/nightly/cpu \
uv pip install -v \
--torch-backend auto \
-r requirements-build.txt \
-r requirements-cpu.txt \
VLLM_TARGET_DEVICE=cpu python setup.py bdist_wheel && \
pip install dist/*.whl
uv pip install dist/*.whl
```
??? console "pip"
```bash
sed -i '/^torch/d' requirements-build.txt # remove torch from requirements-build.txt since we use nightly builds
pip install -v \
--extra-index-url https://download.pytorch.org/whl/nightly/cpu \
-r requirements-build.txt \
-r requirements-cpu.txt \
VLLM_TARGET_DEVICE=cpu python setup.py bdist_wheel && \
pip install dist/*.whl
```
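As a quick smoke test after either install path (a sketch, assuming the wheel landed in the active environment):
```bash
python -c "import vllm; print(vllm.__version__)"
```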
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
@@ -63,19 +74,19 @@ Execute the following commands to build and install vLLM from the source.
```bash
docker build -f docker/Dockerfile.s390x \
--tag vllm-cpu-env .
--tag vllm-cpu-env .
# Launching OpenAI server
# Launch OpenAI server
docker run --rm \
--privileged=true \
--shm-size=4g \
-p 8000:8000 \
-e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
-e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
vllm-cpu-env \
--model=meta-llama/Llama-3.2-1B-Instruct \
--dtype=float \
other vLLM OpenAI server arguments
--privileged true \
--shm-size 4g \
-p 8000:8000 \
-e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
-e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
vllm-cpu-env \
--model meta-llama/Llama-3.2-1B-Instruct \
--dtype float \
other vLLM OpenAI server arguments
```
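For reference, a filled-in invocation might look like the sketch below; the 40 GiB KV-cache size and the core range `0-3` are illustrative assumptions, not tuning recommendations:
```bash
docker run --rm --privileged=true --shm-size=4g -p 8000:8000 \
    -e VLLM_CPU_KVCACHE_SPACE=40 \
    -e VLLM_CPU_OMP_THREADS_BIND=0-3 \
    vllm-cpu-env \
    --model meta-llama/Llama-3.2-1B-Instruct \
    --dtype float
```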
# --8<-- [end:build-image-from-source]