[Misc] update the comments (#15780)
Signed-off-by: chengyang liu <lcy4869@gmail.com>
Co-authored-by: chengyang liu <lcy4869@gmail.com>
parent 9b459eca88
commit 18ed3132d2
@@ -673,7 +673,7 @@ class GPUModelRunner(LoRAModelRunnerMixin):
 # use two kernels for cascade attention. Let's imagine:
 # Request 3's input query: [D]
 # Request 3's kv cache: [A, B, C, D]
-# Request 3's num_computed_tokens: 4 (i.e., [A, B, C, D])
+# Request 3's num_computed_tokens: 3 (i.e., [A, B, C])
 # If we use [A, B, C, D] as the common prefix for Request 1-3,
 # then Request 3 will be processed only by the first kernel,
 # and the second kernel will get an empty input. While this is not
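Why the correction matters: [D] is Request 3's new query token, so it has not
been computed yet; only [A, B, C] count toward num_computed_tokens. Below is a
minimal, hypothetical Python sketch of the invariant the comment describes:
the shared prefix handled by the first cascade-attention kernel must not
exceed any request's num_computed_tokens, or the second (per-request) kernel
would receive an empty input. The function name cap_common_prefix_len and the
values assumed for Requests 1 and 2 are illustrative, not vLLM's actual code.

def cap_common_prefix_len(common_prefix_len: int,
                          num_computed_tokens: list[int]) -> int:
    """Cap the shared prefix so every request keeps a non-empty suffix.

    Tokens beyond num_computed_tokens (the new query tokens) must be
    handled by the second, per-request kernel; if the common prefix
    covered a request's entire KV cache, that kernel would get an
    empty input for it.
    """
    return min(common_prefix_len, min(num_computed_tokens))

# Scenario from the comment: Request 3's KV cache is [A, B, C, D], its
# query is [D], so only 3 tokens ([A, B, C]) are already computed.
num_computed = [4, 4, 3]  # Requests 1-3 (values for 1 and 2 assumed)
print(cap_common_prefix_len(4, num_computed))  # -> 3, i.e., [A, B, C]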