--8<-- [start:installation]
vLLM supports basic model inference and serving on the x86 CPU platform, with data types FP32, FP16, and BF16.
--8<-- [end:installation]
--8<-- [start:requirements]
- OS: Linux
- CPU flags: `avx512f`, `avx512_bf16` (optional), `avx512_vnni` (optional)

!!! tip
    Use `lscpu` to check the CPU flags.
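For example, this one-liner lists which of the flags above are present (a minimal check; `lscpu` output formatting can vary slightly across distros):

```bash
# List the relevant AVX-512 flags reported by the CPU
lscpu | grep -o 'avx512f\|avx512_bf16\|avx512_vnni' | sort -u
```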
--8<-- [end:requirements]
--8<-- [start:set-up-using-python]
--8<-- [end:set-up-using-python]
--8<-- [start:pre-built-wheels]
--8<-- [end:pre-built-wheels]
--8<-- [start:build-wheel-from-source]
--8<-- "docs/getting_started/installation/cpu/build.inc.md"
--8<-- [end:build-wheel-from-source]
--8<-- [start:pre-built-images]
Pre-built vLLM CPU images are available at <https://gallery.ecr.aws/q9t5s3a7/vllm-cpu-release-repo>.
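Assuming the standard ECR Public naming scheme (a gallery page at `gallery.ecr.aws/<alias>/<repo>` corresponds to the pull path `public.ecr.aws/<alias>/<repo>`), an image can be pulled as sketched below, with `<tag>` replaced by a release tag listed in the gallery:

```bash
# Pull a pre-built CPU image; <tag> is a placeholder for a release tag
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:<tag>
```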
!!! warning
    If deploying the pre-built images on machines that only support `avx512f`, an `Illegal instruction` error may be raised. It is recommended to build images for such machines with `--build-arg VLLM_CPU_AVX512BF16=false` and `--build-arg VLLM_CPU_AVX512VNNI=false`.
--8<-- [end:pre-built-images]
--8<-- [start:build-image-from-source]
```bash
docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=false (default)|true \
        --build-arg VLLM_CPU_AVX512VNNI=false (default)|true \
        --tag vllm-cpu-env \
        --target vllm-openai .
```
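Here `false (default)|true` is shorthand for the allowed values, so the command is not runnable verbatim. As a concrete sketch, a build that enables both optional AVX-512 extensions on a machine that supports them might look like this (the image tag is arbitrary):

```bash
# Example build with both optional AVX-512 extensions enabled
docker build -f docker/Dockerfile.cpu \
        --build-arg VLLM_CPU_AVX512BF16=true \
        --build-arg VLLM_CPU_AVX512VNNI=true \
        --tag vllm-cpu-env \
        --target vllm-openai .
```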
```bash
# Launching OpenAI server
docker run --rm \
        --privileged=true \
        --shm-size=4g \
        -p 8000:8000 \
        -e VLLM_CPU_KVCACHE_SPACE=<KV cache space> \
        -e VLLM_CPU_OMP_THREADS_BIND=<CPU cores for inference> \
        vllm-cpu-env \
        --model=meta-llama/Llama-3.2-1B-Instruct \
        --dtype=bfloat16 \
        other vLLM OpenAI server arguments
```
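`VLLM_CPU_KVCACHE_SPACE` is given in GiB (for example, `VLLM_CPU_KVCACHE_SPACE=40` reserves 40 GiB for the KV cache), and `VLLM_CPU_OMP_THREADS_BIND` takes a core range such as `0-31`. Once the server is up, it can be exercised with a standard OpenAI-compatible request; a minimal sketch:

```bash
# Send a completion request to the server started above
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "meta-llama/Llama-3.2-1B-Instruct",
          "prompt": "San Francisco is a",
          "max_tokens": 16
        }'
```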