--8<-- [start:installation]
vLLM supports basic model inference and serving on the x86 CPU platform, with FP32, FP16, and BF16 data types.
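For orientation, serving a model on CPU after installation can look like the sketch below. The model name and the `VLLM_CPU_KVCACHE_SPACE` / `VLLM_CPU_OMP_THREADS_BIND` environment variables are taken from the Docker example later on this page; the concrete values (8 GiB of KV cache, cores 0-7) are illustrative assumptions to adjust for your machine.

```bash
# Sketch: start the OpenAI-compatible server on CPU with BF16 weights.
# VLLM_CPU_KVCACHE_SPACE is the KV cache size in GiB (assumed value).
# VLLM_CPU_OMP_THREADS_BIND pins OpenMP threads to CPU cores (assumed range).
VLLM_CPU_KVCACHE_SPACE=8 \
VLLM_CPU_OMP_THREADS_BIND=0-7 \
vllm serve meta-llama/Llama-3.2-1B-Instruct --dtype bfloat16
```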
--8<-- [end:installation]
--8<-- [start:requirements]
- OS: Linux
- CPU flags: `avx512f` (recommended), `avx512_bf16` (optional), `avx512_vnni` (optional)

!!! tip
    Use `lscpu` to check the CPU flags.
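For example, a quick way to list the AVX-512 features reported by the kernel (the exact flag set varies by CPU):

```bash
# Extract the AVX-512-related flags from the lscpu output
lscpu | grep -o 'avx512[a-z0-9_]*' | sort -u
```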
--8<-- [end:requirements]
--8<-- [start:set-up-using-python]
--8<-- [end:set-up-using-python]
--8<-- [start:pre-built-wheels]
--8<-- [end:pre-built-wheels]
--8<-- [start:build-wheel-from-source]
--8<-- "docs/getting_started/installation/cpu/build.inc.md"
--8<-- [end:build-wheel-from-source]
--8<-- [start:pre-built-images]
Pre-built vLLM CPU images are published at <https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo>.

!!! warning
    If you deploy the pre-built images on machines without `avx512f`, `avx512_bf16`, or `avx512_vnni` support, an `Illegal instruction` error may be raised. It is recommended to build images for such machines with the appropriate build arguments (e.g. `--build-arg VLLM_CPU_DISABLE_AVX512=true`, `--build-arg VLLM_CPU_AVX512BF16=false`, or `--build-arg VLLM_CPU_AVX512VNNI=false`) to disable the unsupported features. Note that without `avx512f`, AVX2 is used instead; this version is not recommended because it has only basic feature support.
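To fetch one of these images, a pull command might look like the sketch below. It assumes the standard ECR Public naming scheme for the gallery URL above; replace `<tag>` with an actual release tag from the gallery.

```bash
# Pull a pre-built CPU image (replace <tag> with a release tag from the gallery)
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:<tag>
```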
--8<-- [end:pre-built-images]
--8<-- [start:build-image-from-source]
```bash
docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=false (default)|true \
        --build-arg VLLM_CPU_AVX512VNNI=false (default)|true \
        --build-arg VLLM_CPU_DISABLE_AVX512=false (default)|true \
        --tag vllm-cpu-env \
        --target vllm-openai .
```
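For instance, a concrete invocation that enables the optional AVX512-BF16 and AVX512-VNNI paths (only do this if the target machines support those CPU flags; the choice here is illustrative):

```bash
# Example build with the optional BF16 and VNNI paths enabled
docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=true \
        --build-arg VLLM_CPU_AVX512VNNI=true \
        --tag vllm-cpu-env \
        --target vllm-openai .
```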
```bash
# Launching the OpenAI-compatible server
docker run --rm \
           --security-opt seccomp=unconfined \
           --cap-add SYS_NICE \
           --shm-size=4g \
           -p 8000:8000 \
           -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
           -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
           vllm-cpu-env \
           --model=meta-llama/Llama-3.2-1B-Instruct \
           --dtype=bfloat16 \
           <other vLLM OpenAI server arguments>
```
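Once the container reports that the server is ready, you can exercise the OpenAI-compatible API. The prompt and token count below are arbitrary examples:

```bash
# Query the completions endpoint of the running server
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta-llama/Llama-3.2-1B-Instruct",
          "prompt": "San Francisco is a",
          "max_tokens": 16
        }'
```

--8<-- [end:build-image-from-source]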