[Doc] Fix numbering sequence in prefix caching (#27357)

Signed-off-by: William Song <jinwook@umich.edu>
This commit is contained in:
William Song 2025-10-23 02:35:47 +09:00 committed by GitHub
parent e05a6754a8
commit 1cb8c6c5fe
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -213,22 +213,22 @@ In this example, we assume the block size is 4 (each block can cache 4 tokens),
![Example Time 1](../assets/design/prefix_caching/example-time-1.png)
**Time 3: Request 0 makes the block 3 full and asks for a new block to keep decoding.** We cache block 3 and allocate block 4.
**Time 2: Request 0 makes the block 3 full and asks for a new block to keep decoding.** We cache block 3 and allocate block 4.
![Example Time 3](../assets/design/prefix_caching/example-time-3.png)
![Example Time 2](../assets/design/prefix_caching/example-time-3.png)
**Time 4: Request 1 comes in with the 14 prompt tokens, where the first 10 tokens are the same as request 0.** We can see that only the first 2 blocks (8 tokens) hit the cache, because the 3rd block only matches 2 of 4 tokens.
**Time 3: Request 1 comes in with the 14 prompt tokens, where the first 10 tokens are the same as request 0.** We can see that only the first 2 blocks (8 tokens) hit the cache, because the 3rd block only matches 2 of 4 tokens.
![Example Time 4](../assets/design/prefix_caching/example-time-4.png)
![Example Time 3](../assets/design/prefix_caching/example-time-4.png)
**Time 5: Request 0 is finished and free.** Blocks 2, 3 and 4 are added to the free queue in the reverse order (but block 2 and 3 are still cached). Block 0 and 1 are not added to the free queue because they are being used by Request 1.
**Time 4: Request 0 is finished and free.** Blocks 2, 3 and 4 are added to the free queue in the reverse order (but block 2 and 3 are still cached). Block 0 and 1 are not added to the free queue because they are being used by Request 1.
![Example Time 5](../assets/design/prefix_caching/example-time-5.png)
![Example Time 4](../assets/design/prefix_caching/example-time-5.png)
**Time 6: Request 1 is finished and free.**
**Time 5: Request 1 is finished and free.**
![Example Time 6](../assets/design/prefix_caching/example-time-6.png)
![Example Time 5](../assets/design/prefix_caching/example-time-6.png)
**Time 7: Request 2 comes in with the 29 prompt tokens, where the first 12 tokens are the same as request 0\.** Note that even the block order in the free queue was `7 - 8 - 9 - 4 - 3 - 2 - 6 - 5 - 1 - 0`, the cache hit blocks (i.e., 0, 1, 2) are touched and removed from the queue before allocation, so the free queue becomes `7 - 8 - 9 - 4 - 3 - 6 - 5`. As a result, the allocated blocks are 0 (cached), 1 (cached), 2 (cached), 7, 8, 9, 4, 3 (evicted).
**Time 6: Request 2 comes in with the 29 prompt tokens, where the first 12 tokens are the same as request 0\.** Note that even the block order in the free queue was `7 - 8 - 9 - 4 - 3 - 2 - 6 - 5 - 1 - 0`, the cache hit blocks (i.e., 0, 1, 2) are touched and removed from the queue before allocation, so the free queue becomes `7 - 8 - 9 - 4 - 3 - 6 - 5`. As a result, the allocated blocks are 0 (cached), 1 (cached), 2 (cached), 7, 8, 9, 4, 3 (evicted).
![Example Time 7](../assets/design/prefix_caching/example-time-7.png)
![Example Time 6](../assets/design/prefix_caching/example-time-7.png)