mirror of https://git.datalinker.icu/vllm-project/vllm.git
synced 2026-01-02 12:34:02 +08:00
[Doc] Add allocate_slots parameter docs (#29777)
Signed-off-by: maang <maang_h@163.com>
Signed-off-by: maang-h <55082429+maang-h@users.noreply.github.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
parent c014de1ec7
commit 5d91d2b292
````diff
@@ -230,6 +230,9 @@ class KVCacheManager:
             delay_cache_blocks: Whether to skip caching the blocks. This is
                 used by P/D when allocating blocks used in a KV transfer
                 which will complete in a future step.
+            num_encoder_tokens: The number of encoder tokens to allocate for
+                cross-attention in encoder-decoder models (e.g., Whisper).
+                For decoder-only models, this should be 0.
 
         Blocks layout:
         ```
````
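`num_encoder_tokens` is a token count, so a block-based KV cache has to round it up to whole blocks before it can reserve cross-attention space. The sketch below shows only that arithmetic, assuming fixed-size blocks of `block_size` tokens. `cross_attn_blocks_needed` is a hypothetical helper for illustration and is not part of vLLM's API. The real `allocate_slots` may account for encoder blocks differently.

```python
def cross_attn_blocks_needed(num_encoder_tokens: int, block_size: int) -> int:
    """Hypothetical helper (not vLLM API): number of fixed-size KV cache
    blocks needed to hold `num_encoder_tokens` for cross-attention.

    Decoder-only models pass num_encoder_tokens=0 and need no blocks.
    """
    if num_encoder_tokens < 0:
        raise ValueError("num_encoder_tokens must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Integer ceiling division: a partially filled last block still
    # occupies a whole block.
    return (num_encoder_tokens + block_size - 1) // block_size


# Decoder-only model: no encoder tokens, so no cross-attention blocks.
# cross_attn_blocks_needed(0, 16)    -> 0
# Encoder-decoder model, e.g. 1500 encoder positions (Whisper's encoder
# output length for a 30-second clip) with an illustrative block size of 16:
# cross_attn_blocks_needed(1500, 16) -> 94
```

Rounding up rather than down matters here. With 1500 tokens and 16-token blocks, 93 blocks would hold only 1488 tokens, so a 94th, partially filled block is required.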