[Doc] Fix description in the Automatic Prefix Caching design doc (#19333)
Signed-off-by: cr7258 <chengzw258@163.com>
parent 12e5829221
commit 0eca5eacd0
@@ -144,7 +144,7 @@ As a result, we will have the following components when the KV cache manager is
**Running request:** Workflow for the scheduler to schedule a running request with KV cache block allocation:
-1. The scheduler calls `kv_cache_manager.append_slots()`. It does the following steps:
+1. The scheduler calls `kv_cache_manager.allocate_slots()`. It does the following steps:
    1. Compute the number of new required blocks, and return if there are not enough blocks to allocate.
    2. Allocate new blocks by popping the heads of the free queue. If the head block is a cached block, this also "evicts" the block so that no other requests can reuse it from now on.
    3. Append token IDs to the slots in the existing blocks as well as the new blocks. If a block is full, we add it to the Cache Block so that it is cached for reuse.
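The three sub-steps above are the core of block allocation in the prefix-caching KV cache manager. The sketch below illustrates that flow with a simplified, hypothetical block pool: `BLOCK_SIZE`, `Block`, `free_queue`, `cached_blocks`, and this standalone `allocate_slots` are illustrative stand-ins, not vLLM's actual data structures or API, and the block hash here is a toy hash without the prefix chaining used in the real design.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional

BLOCK_SIZE = 16  # token slots per block (illustrative value)


@dataclass
class Block:
    block_id: int
    token_ids: List[int] = field(default_factory=list)
    block_hash: Optional[int] = None  # set once the block is full and cached


def allocate_slots(request_blocks: List[Block], new_token_ids: List[int],
                   free_queue: Deque[Block], cached_blocks: Dict[int, Block]) -> bool:
    """Append new token IDs to a request's blocks, allocating blocks as needed."""
    # Step 1: compute the number of new blocks required and return early if the
    # free queue cannot supply them.
    free_slots = BLOCK_SIZE - len(request_blocks[-1].token_ids) if request_blocks else 0
    overflow = max(0, len(new_token_ids) - free_slots)
    num_new_blocks = (overflow + BLOCK_SIZE - 1) // BLOCK_SIZE
    if num_new_blocks > len(free_queue):
        return False  # not enough blocks; the request cannot be scheduled this step

    # Step 2: pop new blocks from the head of the free queue. If a popped block
    # is still cached, "evict" it by dropping its cache entry so that no other
    # request can reuse it from now on.
    new_blocks: List[Block] = []
    for _ in range(num_new_blocks):
        block = free_queue.popleft()
        if block.block_hash is not None:
            cached_blocks.pop(block.block_hash, None)
            block.block_hash = None
        block.token_ids = []
        new_blocks.append(block)

    # Step 3: append token IDs to the existing partial block first, then to the
    # new blocks; whenever a block fills up, hash it and add it to the cache.
    for tok in new_token_ids:
        if not request_blocks or len(request_blocks[-1].token_ids) == BLOCK_SIZE:
            request_blocks.append(new_blocks.pop(0))
        block = request_blocks[-1]
        block.token_ids.append(tok)
        if len(block.token_ids) == BLOCK_SIZE:
            block.block_hash = hash(tuple(block.token_ids))  # toy hash: no prefix chaining
            cached_blocks[block.block_hash] = block
    return True


# Example usage: a pool of 4 free blocks, one request appending 40 tokens.
free_queue: Deque[Block] = deque(Block(block_id=i) for i in range(4))
cached_blocks: Dict[int, Block] = {}
request_blocks: List[Block] = []
ok = allocate_slots(request_blocks, list(range(40)), free_queue, cached_blocks)
print(ok, len(request_blocks), len(cached_blocks))  # True 3 2
```

Returning `False` in step 1 corresponds to the "return if there are not enough blocks" case: the scheduler simply cannot schedule the request for this step. Popping from the head of the deque stands in for the free queue's eviction order in the sketch.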