[Doc] Add allocate_slots parameter docs (#29777)

Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2026-01-02 12:34:02 +08:00 · 2025-12-03 07:23:09 +08:00 · 2025-12-03 07:23:09 +08:00 · 5d91d2b292
commit 5d91d2b292
parent c014de1ec7
1 changed file with 3 additions and 0 deletions
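
Note: a minimal sketch of how a caller might pick the `num_encoder_tokens` value documented by this change, and how many cache blocks that count would occupy. The helpers `encoder_tokens_for_request` and `blocks_needed`, the block size of 16, and the encoder length of 1500 are hypothetical illustrations, not part of vLLM's API.

```python
def encoder_tokens_for_request(is_encoder_decoder: bool, encoder_len: int) -> int:
    """Hypothetical helper: value to pass as num_encoder_tokens.

    Decoder-only models have no cross-attention, so the count is 0.
    """
    if encoder_len < 0:
        raise ValueError("encoder_len must be non-negative")
    return encoder_len if is_encoder_decoder else 0


def blocks_needed(num_tokens: int, block_size: int) -> int:
    """Hypothetical helper: fixed-size blocks needed to hold num_tokens."""
    if num_tokens < 0 or block_size <= 0:
        raise ValueError("num_tokens must be >= 0 and block_size must be > 0")
    # Ceiling division without floating point.
    return -(-num_tokens // block_size)


# Illustrative values only (assumed, not taken from this commit).
whisper_like = encoder_tokens_for_request(is_encoder_decoder=True, encoder_len=1500)
decoder_only = encoder_tokens_for_request(is_encoder_decoder=False, encoder_len=1500)

print(whisper_like, blocks_needed(whisper_like, block_size=16))
print(decoder_only, blocks_needed(decoder_only, block_size=16))
```

Under these assumptions, the decoder-only case needs no cross-attention blocks at all, which is why the docstring asks for 0 there.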
--- a/vllm/v1/core/kv_cache_manager.py
+++ b/vllm/v1/core/kv_cache_manager.py
@@ -230,6 +230,9 @@ class KVCacheManager:
            delay_cache_blocks: Whether to skip caching the blocks. This is
                used by P/D when allocating blocks used in a KV transfer
                which will complete in a future step.
+            num_encoder_tokens: The number of encoder tokens to allocate for
+                cross-attention in encoder-decoder models (e.g., Whisper).
+                For decoder-only models, this should be 0.

        Blocks layout:
        ```