Woosuk Kwon
|
41fb013d29
|
[V1][Spec Decode] Always use argmax for sampling draft tokens (#16899)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-23 14:57:43 -07:00 |
|
Woosuk Kwon
|
2bc4be4e32
|
[V1][Minor] Simplify rejection sampler's parse_output (#15741)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-29 09:25:17 -07:00 |
|
Cody Yu
|
54aa619459
|
[V1] Refactor num_computed_tokens logic (#15307)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-27 04:54:36 +00:00 |
|
Woosuk Kwon
|
25f560a62c
|
[V1][Spec Decode] Update target_logits in place for rejection sampling (#15427)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 21:04:41 -07:00 |
|
Woosuk Kwon
|
911c8eb000
|
[Minor][Spec Decode] Remove compiled_softmax (#15416)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 19:09:04 -07:00 |
|
Woosuk Kwon
|
ebcebeeb6b
|
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling (#15063)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-24 17:16:46 -07:00 |
|
Woosuk Kwon
|
99abb8b650
|
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-18 14:31:54 -07:00 |
|
Lily Liu
|
8d6cf89526
|
[V1] [Spec Decode] Support random sampling for spec decode (#13933)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-16 22:00:20 -07:00 |
|
Woosuk Kwon
|
32ef4983cd
|
[V1] Temporarily disable FlashInfer Rejection Sampler (#14788)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-13 20:40:35 -07:00 |
|
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |
|
Lily Liu
|
5629f26df7
|
[V1][Spec Decode] Change Spec Decode Rejection Sampling API (#13729)
|
2025-02-25 18:14:48 -08:00 |
|
Nick Hill
|
30172b4947
|
[V1] Optimize handling of sampling metadata and req_ids list (#13244)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-02-18 12:15:33 -08:00 |
|
Woosuk Kwon
|
69e1d23e1e
|
[V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-16 12:25:29 -08:00 |
|
Lily Liu
|
80f63a3966
|
[V1][Spec Decode] Ngram Spec Decode (#12193)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-02-15 18:05:11 -08:00 |
|