diff --git a/docs/deployment/docker.md b/docs/deployment/docker.md
index 516640f6fd3c..9e506d3d7ba3 100644
--- a/docs/deployment/docker.md
+++ b/docs/deployment/docker.md
@@ -107,10 +107,21 @@ DOCKER_BUILDKIT=1 docker build . \
   -t vllm/vllm-gh200-openai:latest \
   --build-arg max_jobs=66 \
   --build-arg nvcc_threads=2 \
-  --build-arg torch_cuda_arch_list="9.0+PTX" \
+  --build-arg torch_cuda_arch_list="9.0 10.0+PTX" \
   --build-arg vllm_fa_cmake_gpu_arches="90-real"
 ```
 
+!!! note
+    If you are building the `linux/arm64` image on a non-ARM host (e.g., an x86_64 machine), you need to ensure your system is set up for cross-compilation using QEMU. This allows your host machine to emulate ARM64 execution.
+
+    Run the following command on your host machine to register QEMU user static handlers:
+
+    ```console
+    docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
+    ```
+
+    After setting up QEMU, you can use the `--platform "linux/arm64"` flag in your `docker build` command.
+
 ## Use the custom-built vLLM Docker image
 
 To run vLLM with the custom-built Docker image:
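
A quick way to sanity-check the QEMU registration added in this patch (a hedged aside, not part of the diff above; the `alpine` image is only an example) is to run a minimal `linux/arm64` container on the x86_64 host and confirm it reports an ARM architecture:

```console
# Should print "aarch64" once the QEMU user-static handlers are registered
docker run --rm --platform "linux/arm64" alpine uname -m
```

If this prints `x86_64` or fails with an exec format error, the binfmt handlers are not active and the cross-platform `docker build` described in the note will fail the same way.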