Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Woosuk Kwon 2025-09-18 16:31:01 -07:00
parent 82da219ff9
commit efda08481b

@@ -254,8 +254,8 @@ def compute_logprobs(
     )
     # NOTE(woosuk): Here, to save GPU memory, we do not materialize the full
-    # logprobs tensor. Instead, we only compute the logprobs of the topk + 1
-    # tokens.
+    # logprobs tensor. Instead, we only compute and return the logprobs of
+    # the topk + 1 tokens.
     BLOCK_SIZE = 1024
     _topk_logprobs_kernel[(batch_size, )](
         logprobs,