8 Commits

Author SHA1 Message Date
Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773) 2025-11-14 20:24:00 -08:00
Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-14 16:04:04 -08:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens (#27922)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 17:06:28 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Reza Barazesh
bc546f76a1
[CI] Move applicable tests to CPU (#24080)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-30 14:45:20 +01:00
Woosuk Kwon
c9b38be8aa
[Spec Decode] Make propose_draft_token_ids non-blocking for lower TTFT (#23041)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 17:20:38 -07:00
Or Ozeri
c280066f9d
[v1] Move block_hashes from KVCacheManager to Request.block_hashes (#19728)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-08-15 16:52:52 -07:00
Woosuk Kwon
d4d309409f
Implement Async Scheduling (#19970)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-14 23:01:46 -07:00