mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-16 10:45:45 +08:00
Update docker docs with ARM CUDA cross-compile (#19037)
Signed-off-by: mgoin <michael@neuralmagic.com>
parent f32fcd9444
commit 6d18ed2a2e
@@ -107,10 +107,21 @@ DOCKER_BUILDKIT=1 docker build . \
   -t vllm/vllm-gh200-openai:latest \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
-  --build-arg torch_cuda_arch_list="9.0+PTX" \
+  --build-arg torch_cuda_arch_list="9.0 10.0+PTX" \
   --build-arg vllm_fa_cmake_gpu_arches="90-real"
 ```
 
+!!! note
+    If you are building the `linux/arm64` image on a non-ARM host (e.g., an x86_64 machine), you need to ensure your system is set up for cross-compilation using QEMU. This allows your host machine to emulate ARM64 execution.
+
+    Run the following command on your host machine to register QEMU user static handlers:
+
+    ```console
+    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+    ```
+
+    After setting up QEMU, you can use the `--platform "linux/arm64"` flag in your `docker build` command.
+
 ## Use the custom-built vLLM Docker image
 
 To run vLLM with the custom-built Docker image:
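The note added in this commit only applies when the build host is not ARM64. As a minimal sketch of that decision (not part of the commit), the script below checks the host architecture and prints the QEMU registration command only when emulation would be needed. The helper names `needs_qemu`, `qemu_registered`, and `qemu_setup_hint` are hypothetical, and the `binfmt_misc` path is an assumption about where the `multiarch/qemu-user-static` image registers its aarch64 handler on a Linux host.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical and not part of vLLM or Docker.

# Succeeds (exit 0) when the given host architecture cannot run
# linux/arm64 binaries natively and therefore needs QEMU emulation.
needs_qemu() {
  case "$1" in
    aarch64|arm64) return 1 ;;  # native ARM64 host, no emulation needed
    *) return 0 ;;              # e.g. x86_64
  esac
}

# Succeeds when an aarch64 handler is registered with binfmt_misc.
# Assumption: the handler is named "qemu-aarch64" on this host.
qemu_registered() {
  [ -e /proc/sys/fs/binfmt_misc/qemu-aarch64 ]
}

# Prints the registration command from the note when emulation is needed;
# prints nothing for a native ARM64 host.
qemu_setup_hint() {
  if needs_qemu "$1"; then
    echo "docker run --rm --privileged multiarch/qemu-user-static --reset -p yes"
  fi
}

# Show the hint for the current machine without running Docker.
qemu_setup_hint "$(uname -m)"
```

On an x86_64 host this prints the registration command; on an ARM64 host it prints nothing. `qemu_registered` can be called afterwards to confirm that the handler is in place before running `docker build` with `--platform "linux/arm64"`.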