7 Commits

Author SHA1 Message Date
yyzxw
19f76ee68e
[misc] refactor speculative config (#25657)
Signed-off-by: zxw <1020938856@qq.com>
2025-09-26 01:22:06 -07:00
XuruiYang
845adb3ec6
[Model] Add LongCat-Flash (#23991)
Signed-off-by: yangxurui <yangxurui@meituan.com>
Co-authored-by: yangxurui <yangxurui@meituan.com>
2025-09-24 21:53:40 -07:00
Woosuk Kwon
2e19a848d4
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-24 01:51:39 -07:00
Eldar Kurtić
21467f9a1c
Enable Eagle3 speculative decoding for GPT-OSS model (#25246)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-09-22 08:50:39 +00:00
qizixi
c4cb0af98a
[spec decode] Fix MTP inference path for MiMo-7B model (#25136)
Signed-off-by: zixi-qi <qizixi@meta.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-09-18 09:12:19 -07:00
Benjamin Chislett
b7433ca1a4
[Spec Decode] Efficient padded speculation (#24539)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-09-18 01:07:24 -04:00
Harry Mellor
0faf3cc3e8
Move SpeculativeConfig from config/__init__.py to config/speculative.py (#24904)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-16 12:51:35 +01:00