[Doc] Add allocate_slots parameter docs (#29777)

Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
maang-h 2025-12-03 07:23:09 +08:00 committed by GitHub
parent c014de1ec7
commit 5d91d2b292

@@ -230,6 +230,9 @@ class KVCacheManager:
delay_cache_blocks: Whether to skip caching the blocks. This is
used by P/D when allocating blocks used in a KV transfer
which will complete in a future step.
num_encoder_tokens: The number of encoder tokens to allocate for
cross-attention in encoder-decoder models (e.g., Whisper).
For decoder-only models, this should be 0.
Blocks layout:
```