Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2025-12-10 00:45:01 +08:00
[OpenVINO] Updated documentation (#7687)
Parent: 5288c06aa0
Commit: 398521ad19
@@ -70,7 +70,7 @@ vLLM OpenVINO backend uses the following environment variables to control behavi
 - ``VLLM_OPENVINO_CPU_KV_CACHE_PRECISION=u8`` to control KV cache precision. By default, FP16 / BF16 is used depending on platform.
-- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off.
+- ``VLLM_OPENVINO_ENABLE_QUANTIZED_WEIGHTS=ON`` to enable U8 weights compression during model loading stage. By default, compression is turned off. You can also export model with different compression techniques using `optimum-cli` and pass exported folder as `<model_id>`
 To enable better TPOT / TTFT latency, you can use vLLM's chunked prefill feature (``--enable-chunked-prefill``). Based on the experiments, the recommended batch size is ``256`` (``--max-num-batched-tokens``)
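The ``u8`` KV cache option trades precision for memory: each cached key or value element takes 1 byte instead of the 2 bytes used by FP16 / BF16. A minimal sketch of the per-token arithmetic, assuming a hypothetical model with 32 layers, 8 KV heads and head size 128, and ignoring any quantization metadata (scales, zero points) a u8 cache may also store; `kv_bytes_per_token` is a hypothetical helper:

```python
# Sketch: per-token KV cache footprint at different element precisions.
# Model dimensions are hypothetical; quantization metadata (scales / zero
# points) that a u8 cache may keep is deliberately ignored.

BYTES_PER_ELEMENT = {"f16": 2, "bf16": 2, "u8": 1}


def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       precision: str) -> int:
    """Bytes needed to cache the keys and values of a single token."""
    elements = 2 * num_layers * num_kv_heads * head_dim  # 2 = key + value
    return elements * BYTES_PER_ELEMENT[precision]


if __name__ == "__main__":
    dims = dict(num_layers=32, num_kv_heads=8, head_dim=128)
    for precision in ("bf16", "u8"):
        per_token = kv_bytes_per_token(**dims, precision=precision)
        print(f"{precision}: {per_token // 1024} KiB per token")
```

With these dimensions it prints 128 KiB per token for BF16 and 64 KiB for u8, so the same KV cache space holds roughly twice as many tokens.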
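The added sentence about `optimum-cli` describes a two-step workflow: export a model offline with a chosen weight compression, then give vLLM the exported folder in place of a model id. A minimal sketch that only builds the two command lines, assuming optimum-intel's `optimum-cli export openvino` command with its `--weight-format` option (`int8` here; `optimum-cli export openvino --help` lists what your version accepts) and vLLM's `vllm.entrypoints.openai.api_server` entry point. The helper names, the model id placeholder and the output folder are hypothetical:

```python
# Sketch: commands for exporting a compressed OpenVINO model with optimum-cli
# and serving the exported folder with vLLM. Model id, output folder and
# weight format are example values, not recommendations.
import os
import shlex


def optimum_export_cmd(model_id: str, output_dir: str,
                       weight_format: str = "int8") -> list[str]:
    """optimum-cli invocation that writes an OpenVINO model to output_dir."""
    return ["optimum-cli", "export", "openvino",
            "--model", model_id,
            "--weight-format", weight_format,
            output_dir]


def vllm_serve_cmd(model_path: str,
                   kv_cache_u8: bool = False) -> tuple[dict[str, str], list[str]]:
    """Environment and argv for vLLM's OpenAI-compatible server.

    model_path is the exported folder, used in place of a model id.
    """
    env = dict(os.environ)
    if kv_cache_u8:
        # Optional: u8 KV cache, as described in the environment variable list.
        env["VLLM_OPENVINO_CPU_KV_CACHE_PRECISION"] = "u8"
    argv = ["python", "-m", "vllm.entrypoints.openai.api_server",
            "--model", model_path]
    return env, argv


if __name__ == "__main__":
    out_dir = "./model-ov-int8"  # hypothetical output folder
    print(shlex.join(optimum_export_cmd("<hf_model_id>", out_dir)))
    env, argv = vllm_serve_cmd(out_dir, kv_cache_u8=True)
    print(shlex.join(argv))
```

The returned lists can be handed to `subprocess.run(argv, env=env, check=True)`; building them separately keeps the sketch free of side effects.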
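Chunked prefill caps the number of tokens processed in one scheduler step at ``--max-num-batched-tokens``, so a long prompt is prefilled over several steps instead of one and decode steps of other requests are not stalled behind it. A deliberately simplified sketch for a single prompt with the recommended budget of ``256``, assuming no decode tokens share the budget (the real vLLM scheduler does mix them in); `prefill_chunks` is a hypothetical helper:

```python
# Sketch: how a per-step token budget splits one long prompt into prefill chunks.
# Simplified: the real vLLM scheduler also spends part of each step's budget
# on decode tokens of running requests; that is ignored here.

MAX_NUM_BATCHED_TOKENS = 256  # batch size recommended in the text

# Server flags matching that recommendation.
CHUNKED_PREFILL_FLAGS = ["--enable-chunked-prefill",
                         "--max-num-batched-tokens", str(MAX_NUM_BATCHED_TOKENS)]


def prefill_chunks(prompt_len: int,
                   budget: int = MAX_NUM_BATCHED_TOKENS) -> list[int]:
    """Sizes of the successive chunks used to prefill one prompt."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    chunks = []
    remaining = prompt_len
    while remaining > 0:
        step = min(budget, remaining)  # never exceed the per-step budget
        chunks.append(step)
        remaining -= step
    return chunks


if __name__ == "__main__":
    print(CHUNKED_PREFILL_FLAGS)
    print(prefill_chunks(1000))  # -> [256, 256, 256, 232]
```

A smaller budget means more, shorter prefill steps, which tends to smooth per-token decode latency at the cost of a longer time to the first token for that prompt.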
@@ -91,5 +91,3 @@ Limitations
 - Only LLM models are currently supported. LLaVa and encoder-decoder models are not currently enabled in vLLM OpenVINO integration.
 - Tensor and pipeline parallelism are not currently enabled in vLLM integration.
-- Speculative sampling is not tested within vLLM integration.