From d1695758b2f65fd314d1aee71ba2469ceba67a5b Mon Sep 17 00:00:00 2001
From: Shanshan Shen <467638484@qq.com>
Date: Tue, 18 Mar 2025 16:15:46 +0800
Subject: [PATCH] [Doc][V1] Fix V1 APC doc (#14920)

---
 docs/source/design/v1/prefix_caching.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/design/v1/prefix_caching.md b/docs/source/design/v1/prefix_caching.md
index 2fae22cc264e..3d14a76840d4 100644
--- a/docs/source/design/v1/prefix_caching.md
+++ b/docs/source/design/v1/prefix_caching.md
@@ -191,7 +191,7 @@ When the head block (least recently used block) of the free queue is cached, we
 
 In this example, we assume the block size is 4 (each block can cache 4 tokens), and we have 10 blocks in the KV-cache manager in total.
 
-**Time 1: The cache is empty and a new request comes in.** We allocate 4 blocks. 3 of them are already full and cached. The fourth block is partially full with 2 of 4 tokens.
+**Time 1: The cache is empty and a new request comes in.** We allocate 4 blocks. 3 of them are already full and cached. The fourth block is partially full with 3 of 4 tokens.
 
 :::{image} /assets/design/v1/prefix_caching/example-time-1.png
 :alt: Example Time 1
@@ -203,7 +203,7 @@ In this example, we assume the block size is 4 (each block can cache 4 tokens),
 :alt: Example Time 3
 :::
 
-**Time 4: Request 1 comes in with the 14 prompt tokens, where the first 11 tokens are the same as request 0.** We can see that only 2 blocks (11 tokens) hit the cache, because the 3rd block only matches 3 of 4 tokens.
+**Time 4: Request 1 comes in with the 14 prompt tokens, where the first 10 tokens are the same as request 0.** We can see that only the first 2 blocks (8 tokens) hit the cache, because the 3rd block only matches 2 of 4 tokens.
 
 :::{image} /assets/design/v1/prefix_caching/example-time-4.png
 :alt: Example Time 4
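The corrected Time 4 arithmetic can be checked with a small sketch. It assumes, as the doc describes, that only full blocks are hashed and looked up, and that each block's hash is chained to its parent block's hash so a hit implies the whole preceding prefix matches. The function names, token IDs, and request 0's length are hypothetical; this is not the real KV-cache manager's code.

```python
from typing import List, Optional


def full_block_hashes(tokens: List[int], block_size: int) -> List[int]:
    """Hash each full block of `tokens`, chaining in the previous block's hash.

    A trailing partial block is skipped, so it can never produce a cache hit.
    """
    hashes: List[int] = []
    parent: Optional[int] = None
    for start in range(0, len(tokens) // block_size * block_size, block_size):
        block = tuple(tokens[start:start + block_size])
        # Chaining means a hash match implies the whole prefix up to this block matches.
        parent = hash((parent, block))
        hashes.append(parent)
    return hashes


def count_cache_hit_blocks(cached: List[int], new: List[int], block_size: int) -> int:
    """Count leading full blocks of `new` whose hashes are already cached."""
    cached_hashes = set(full_block_hashes(cached, block_size))
    hits = 0
    for h in full_block_hashes(new, block_size):
        if h not in cached_hashes:
            break  # the first miss ends the reusable prefix
        hits += 1
    return hits


BLOCK_SIZE = 4
# Hypothetical token IDs: request 0 fills 3 full blocks; request 1 has 14 tokens,
# of which the first 10 are the same as request 0's.
req0 = list(range(12))
req1 = req0[:10] + [90, 91, 92, 93]

hits = count_cache_hit_blocks(req0, req1, BLOCK_SIZE)
print(hits, hits * BLOCK_SIZE)  # 2 blocks, 8 tokens
```

With an 11-token shared prefix the result is still 2 hit blocks (8 tokens), because the 3rd block would then match only 3 of 4 tokens; that is why the old "2 blocks (11 tokens)" wording was inconsistent.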