tjandy98
4504e8029b
[Bugfix] Prevent crash on empty grammar string ( #28210 )
...
Signed-off-by: tjandy98 <3953059+tjandy98@users.noreply.github.com>
2025-11-13 06:42:29 +00:00
Andrew Xia
1a0b157a2e
[Frontend][responsesAPI][1/n] convert responses API tool input to chat completions tool format ( #28231 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-11-13 04:47:22 +00:00
Jialin Ouyang
a1d3866dda
[n-gen] DO NOT repeatedly return finished child requests ( #28591 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-13 03:36:07 +00:00
Andy Lo
58ce8d12b7
[BugFix] Priority scheduling and spec tokens preemption ( #28558 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-11-12 20:29:21 +00:00
alberto
bac904565f
Implement ARC KV cache eviction policy for CPU offloader ( #27039 )
...
Signed-off-by: Alberto Perdomo <aperdomo@redhat.com>
Signed-off-by: alberto <aperdomo@redhat.com>
Co-authored-by: Or Ozeri <or@ozery.com>
2025-11-12 09:51:39 -08:00
Chenguang Zheng
4ccffe561f
[Core] Encoder separation for Encode-Prefill-Decode Disaggregation ( #25233 )
...
Signed-off-by: n00909098 <nguyen.kha.long@huawei.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: herotai214 <herotai214@gmail.com>
Signed-off-by: Khuong Le <khuong.le.manh@huawei.com>
Signed-off-by: Khuong Le <lemanhkhuong2611@gmail.com>
Co-authored-by: n00909098 <nguyen.kha.long@huawei.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: herotai214 <herotai214@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Khuong Le <khuong.le.manh@huawei.com>
Co-authored-by: Khuong Le <lemanhkhuong2611@gmail.com>
2025-11-11 18:58:33 -08:00
Jialin Ouyang
4228be7959
[Perf] Use np.ndarray instead of list[list[int]] to reduce GC overhead ( #28245 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-11 10:28:47 -08:00
Nicolò Lucchesi
a7ef3eb0cd
[NIXL] Generalize block-first backend layouts (FlashInfer-like) ( #28282 )
2025-11-11 16:57:43 +00:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic ( #24794 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Rémi Delacourt
6d54336ae5
[Bugfix] Fix llguidance backend, rollback when EOS was encountered ( #25905 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-11-10 14:53:32 -05:00
Mark McLoughlin
6f7de33bed
[Metrics] Refactor LoRA state tracking ( #26801 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-10 16:34:36 +08:00
usberkeley
4a8d6bd168
Fix cu_num_generated_tokens slicing logic in LogprobsLists.slice() method ( #28214 )
...
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-11-09 19:11:46 +00:00
Nick Hill
289eb6c537
[Core] Simplify async KV output aggregation ( #28327 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-09 09:44:13 -08:00
Nicolò Lucchesi
19d91ece4b
[CI] Fix flaky test_eagle_correctness test ( #28364 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-09 16:04:59 +00:00
zhangsicheng5
2108a571d7
[DCP] Support dcp kv_cache interleave size > 1 ( #26696 )
...
Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com>
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: Qiu <qiuchunshuo@huawei.com>
Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-11-09 04:45:27 +09:00
Andy Lo
47604137a2
[Bugfix] Spec decode + structured output + spec model max len edge case ( #28298 )
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2025-11-08 19:44:25 +00:00
Harry Mellor
d9ab1ad9d1
reasoning_content -> reasoning ( #27752 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-08 12:15:08 +00:00
Xiaohong (Sean) Chen
d0c7792004
[Bugfix][LoRA][Spec Decode] Support LoRA with speculative decoding ( #21068 )
...
Signed-off-by: Sean Chen <xiaohong_chen1991@hotmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Danielle Robinson <dcmaddix@gmail.com>
Co-authored-by: Haipeng Li <li2haipeng@gmail.com>
Co-authored-by: li2haipeng <44383182+li2haipeng@users.noreply.github.com>
2025-11-08 01:58:22 +00:00
Nick Hill
67a2da890e
[PerfFix] Avoid separate thread for MP executor shm spin (take 2) ( #28319 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 22:11:03 +00:00
Nick Hill
da786e339e
[Core] Rework handling of async scheduling config ( #28250 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-07 20:01:23 +00:00
Nicolò Lucchesi
68a72a5cc1
Revert "[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )" ( #28289 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-07 15:07:01 +00:00
Matthew Bonanni
ca90f50304
[Test] Add non-MoE DP test coverage ( #28235 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-06 20:59:57 +00:00
Chauncey
59a50afa08
[Frontend] OpenAI Responses API supports Tool/Function calling - non-harmony ( #26874 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-11-06 10:40:03 +00:00
wangxiyuan
c3ee80a01a
[V0 deprecation]clean up is_v1_supported_oracle ( #28116 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-06 16:05:32 +08:00
Zhewen Li
0b8e871e5e
[CI/Build] Fix test_defaults_with_usage_context in AMD CI ( #27926 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-05 15:40:24 -08:00
Snehlata
e15601789b
[Feature]: Add corrupted request metric to V1 metrics system. ( #27306 )
...
Signed-off-by: atalhens <sneh.lata@nutanix.com>
2025-11-05 13:45:29 -08:00
Paul Zhang
faedbb4d4f
[Feature] Extend batch invariant torch.compile to B200 ( #27856 )
...
Signed-off-by: PaulZhang12 <paulzhan@fb.com>
2025-11-05 10:04:49 -08:00
Samuel Shen
40db194446
[CI]: Add LMCacheConnector Unit Tests ( #27852 )
...
Signed-off-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Samuel Shen <slshen@uchciago.edu>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
2025-11-05 09:45:57 -08:00
Isotr0py
3f5a4b6473
[Bugfix] Validate custom logits processor xargs for online serving ( #27560 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-05 16:53:33 +00:00
Kuntai Du
86dca07d9b
[Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator ( #28011 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-11-05 10:36:31 +00:00
wangxiyuan
428bc7bf1c
[V0 deprecation] Remove VLLM_USE_V1 usage in most modules ( #27955 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-04 20:51:16 -08:00
Nick Hill
938a81692e
[AsyncScheduling] Don't schedule past request max_tokens ( #27922 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 17:06:28 +00:00
Nick Hill
c9f66da8fd
[PerfFix] Avoid separate thread for MP executor shm spin ( #28012 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-04 08:33:55 -08:00
Mark McLoughlin
58279c60b5
[KV Connector] Make KVCacheConfig an explicit constructor argument ( #27887 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-11-03 23:00:49 -08:00
Matthew Bonanni
01baefe674
Add TP parameter to attention tests ( #27683 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-03 13:04:40 -08:00
Aurick Qiao
2c19d96777
[Spec Decode] Integrate Suffix Decoding from Arctic Inference ( #25784 )
...
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-11-03 09:23:31 -08:00
Lucas Wilkinson
4bc400f47e
[CI/Testing] Add basic single node dual batch overlap test ( #27235 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-03 17:00:46 +00:00
Rémi Delacourt
cec7c28833
[Bugfix] Padded Eagle Specdec with Chunked Prefill ( #26263 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-03 02:22:46 -05:00
Biswa Panda
1bf43ae35d
[BugFix][LoRA] use adapter_id instead of id field of lora_request ( #27728 )
...
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
2025-11-03 10:08:08 +08:00
Yihua Cheng
e675118849
[Add] cmdline argument parsing for KV cache offloading modules ( #27621 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-01 07:17:07 +00:00
Nick Hill
0cdbe7b744
[Core] Async scheduling + structured outputs compatibility ( #26866 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-01 00:35:04 +00:00
Chen Zhang
df334868ca
[Hybrid] A simpler algorithm to find kernel_block_size ( #26476 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-31 21:30:28 +00:00
Matthew Bonanni
f29aeb5a25
Add FLASHINFER_MLA to test_mla_backends and add B200 CI run ( #27663 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-31 11:12:19 -07:00
GuanLuo
d6517be3cd
[Bugfix] Missing NIXL metadata for handshake initialization if instance spans multi-node ( #26338 )
...
Signed-off-by: Guan Luo <gluo@nvidia.com>
Signed-off-by: GuanLuo <41310872+GuanLuo@users.noreply.github.com>
Signed-off-by: Guan Luo <41310872+GuanLuo@users.noreply.github.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-10-31 10:16:00 -07:00
Zhewen Li
0fe0140408
[KV offload] Enable CPU KV offload on CUDA alike Platforms ( #27770 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-30 22:10:29 +08:00
Lucas Wilkinson
b5d70751d8
[BugFix] Reordering extend logic fix ( #27739 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-10-29 21:39:34 -07:00
Nick Hill
2ce5c5d3d6
[BugFix] Handle unscheduled requests properly when async scheduling ( #27756 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-10-29 21:04:25 -07:00
Nicolò Lucchesi
0f95a1c3f2
[CI] Fix flaky test_two_responses_with_same_prev_id test ( #27745 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-29 15:10:35 +00:00
Zhewen Li
9a0d2f0d92
[CI/Build] Skip cpu offloading test on AMD ( #27690 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-10-29 12:55:51 +00:00
Dipika Sikka
413ef7a3b4
[Speculators] Move tests + fix integration ( #27308 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
Signed-off-by: rahul-tuli <rtuli@redhat.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-10-29 00:54:21 -07:00