# --8<-- [start:installation]
vLLM has been adapted to work on ARM64 CPUs with NEON support, leveraging the CPU backend initially developed for the x86 platform.
The ARM CPU backend currently supports Float32, FP16, and BFloat16 datatypes.
!!! warning
    There are no pre-built wheels or images for this device, so you must build vLLM from source.

# --8<-- [end:installation]
# --8<-- [start:requirements]
- OS: Linux
- Compiler: `gcc/g++ >= 12.3.0` (optional, recommended)
- Instruction Set Architecture (ISA): NEON support is required (see the quick check below)
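One quick way to confirm NEON support is to inspect the CPU feature flags; on AArch64 Linux, NEON is reported as `asimd` (Advanced SIMD):

```bash
# NEON (Advanced SIMD) appears as the "asimd" feature flag on AArch64
grep -m1 -o 'asimd' /proc/cpuinfo && echo "NEON supported" || echo "NEON not detected"
```
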
# --8<-- [end:requirements]
# --8<-- [start:set-up-using-python]
# --8<-- [end:set-up-using-python]
# --8<-- [start:pre-built-wheels]
# --8<-- [end:pre-built-wheels]
# --8<-- [start:build-wheel-from-source]
First, install the recommended compiler. We recommend `gcc/g++ >= 12.3.0` as the default compiler to avoid potential problems. For example, on Ubuntu 22.04, you can run:
```bash
sudo apt-get update -y
sudo apt-get install -y --no-install-recommends ccache git curl wget ca-certificates gcc-12 g++-12 libtcmalloc-minimal4 libnuma-dev ffmpeg libsm6 libxext6 libgl1 jq lsof
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 10 --slave /usr/bin/g++ g++ /usr/bin/g++-12
```
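
You can then verify that GCC 12 is the default compiler (a quick sanity check, not a required step):

```bash
# Both should report version 12.x after the update-alternatives step
gcc --version
g++ --version
```
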
Second, clone the vLLM project:
|
```bash
git clone https://github.com/vllm-project/vllm.git vllm_source
cd vllm_source
```
Third, install required dependencies:
|
```bash
uv pip install -r requirements/cpu-build.txt --torch-backend cpu
uv pip install -r requirements/cpu.txt --torch-backend cpu
```
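
These commands assume `uv` is already installed; if it is not, one option is its standalone installer (an assumption about your environment, not a vLLM requirement):

```bash
# Install uv via its official standalone installer (https://docs.astral.sh/uv/)
curl -LsSf https://astral.sh/uv/install.sh | sh
```
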
??? console "pip"

    ```bash
    pip install --upgrade pip
    pip install -v -r requirements/cpu-build.txt --extra-index-url https://download.pytorch.org/whl/cpu
    pip install -v -r requirements/cpu.txt --extra-index-url https://download.pytorch.org/whl/cpu
    ```
Finally, build and install vLLM:
```bash
VLLM_TARGET_DEVICE=cpu uv pip install . --no-build-isolation
```
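
Once the install finishes, a minimal import check confirms the build succeeded (a simple sanity test):

```bash
# Should print the installed vLLM version without errors
python -c "import vllm; print(vllm.__version__)"
```
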
If you want to develop vLLM, install it in editable mode instead.
```bash
VLLM_TARGET_DEVICE=cpu uv pip install -e . --no-build-isolation
```
Compatibility testing has been conducted on AWS Graviton3 instances.
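
To smoke-test the finished build, you can serve a small model on the OpenAI-compatible API (a sketch; the model and the 4 GiB KV cache size are illustrative choices, not requirements):

```bash
# Reserve 4 GiB for the KV cache and serve a small model
VLLM_CPU_KVCACHE_SPACE=4 vllm serve Qwen/Qwen2.5-0.5B-Instruct --dtype bfloat16
```
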
# --8<-- [end:build-wheel-from-source]
# --8<-- [start:pre-built-images]
# --8<-- [end:pre-built-images]
# --8<-- [start:build-image-from-source]
```bash
docker build -f docker/Dockerfile.cpu \
        --tag vllm-cpu-env .

# Launching OpenAI server
docker run --rm \
            --privileged=true \
            --shm-size=4g \
            -p 8000:8000 \
            -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
            -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
            vllm-cpu-env \
            --model=meta-llama/Llama-3.2-1B-Instruct \
            --dtype=bfloat16 \
            other vLLM OpenAI server arguments
```
!!! tip
    An alternative to `--privileged=true` is `--cap-add SYS_NICE --security-opt seccomp=unconfined`.

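For example, the same server could be launched without full privileges like this (a sketch based on the tip above; the remaining arguments are unchanged from the run command shown earlier):

```bash
docker run --rm \
            --cap-add SYS_NICE \
            --security-opt seccomp=unconfined \
            --shm-size=4g \
            -p 8000:8000 \
            vllm-cpu-env \
            --model=meta-llama/Llama-3.2-1B-Instruct
```
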
# --8<-- [end:build-image-from-source]
# --8<-- [start:extra-information]
# --8<-- [end:extra-information]