Jee Jee Li
463074fac8
Merge branch 'main' into mlm-full-lora-support
2025-12-20 08:25:41 +08:00
Zhonghua Deng
969bbc7c61
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-19 17:17:03 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Zhengxu Chen
5f2f3fba1d
[compile] Fix CI for test_gpt2_cache_hit (#30902)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-17 20:22:23 -08:00
SungMinCho
a0b782f9cc
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-18 01:40:51 +00:00
Jee Jee Li
94dce5c3d9
Merge branch 'main' into mlm-full-lora-support
2025-12-17 00:33:42 +08:00
Boyuan Feng
104003dc77
update piecewise cudagraph warning when splitting_ops=[] (#30728)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 06:09:34 -08:00
B-201
bdac2b5d17
Merge branch 'main' into mlm-full-lora-support
2025-12-16 19:13:22 +08:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO (#30120)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Harry Mellor
970713d4a4
Remove SkipValidation from ModelConfig (#30695)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-15 17:34:08 +00:00
Nicolò Lucchesi
185c22bf2f
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector (#29805)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-15 11:17:58 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model (#30666)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
Boyuan Feng
917fdae5b2
[Log] Skip piecewise cudagraph warn when using full cudagraph (#30657)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-15 02:49:45 +00:00
yifant-code
5ccf0efa84
[Bugfix] Improve error messages in ModelConfig validation (#30213)
Signed-off-by: ytian218 <ytian218@bloomberg.net>
Co-authored-by: ytian218 <ytian218@bloomberg.net>
2025-12-14 21:23:37 +08:00
Jee Jee Li
35acd22a5d
Move forward
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-12 08:53:09 +00:00
Jee Jee Li
421707dec1
Merge branch 'main' into mlm-full-lora-support
2025-12-12 15:00:59 +08:00
Jee Jee Li
208dc0c954
Fix comments
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-12 00:05:07 +00:00
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching (#29421)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-11 21:06:51 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
B-201
e10321bf6a
Merge branch 'main' into mlm-full-lora-support
2025-12-12 00:04:59 +08:00
bk-201
dd857e480f
Merge branch 'mlm-full-lora-support' of https://github.com/jeejeelee/vllm into mlm-full-lora-support
2025-12-11 16:02:37 +00:00
Qiu
a11f4a81e0
[Misc][PCP&DCP] relocate PCP feature check (#30050)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 03:36:18 -08:00
wang.yuqi
a5f9fb5960
[Deprecation] Deprecation --convert reward, use --convert embed instead. (#30463)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-11 10:18:25 +00:00
bk-201
27448490f1
update argument name
Signed-off-by: bk-201 <joy25810@foxmail.com>
2025-12-11 06:46:53 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings (#30397)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
B-201
d1307e1d29
Merge branch 'main' into mlm-full-lora-support
2025-12-11 11:47:50 +08:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Nicolò Lucchesi
c756fb6781
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-10 06:14:24 -08:00
bk-201
5ff0c6fb73
Merge remote-tracking branch 'origin/main' into mlm-full-lora-support
2025-12-10 07:10:58 +00:00
PatrykSaffer
4c2e10ea19
[Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330)
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
2025-12-10 00:47:07 +00:00
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-09 13:29:33 -05:00
Laith Sakka
87aee9ed2b
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-08 10:46:15 -05:00
wang.yuqi
9e77ffca3f
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-08 08:10:09 +00:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Cyrus Leung
e83b7e379c
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
2025-12-07 00:00:22 -08:00
Cyrus Leung
27f4c2fd46
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-06 23:15:42 -08:00
Wentao Ye
17eb25e327
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-07 04:44:50 +00:00
Nick Hill
4026ae31e9
[Misc] Move disable_nccl_for_dp_synchronization init logic into VllmConfig (#30161)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-05 20:59:04 -08:00
Rohan Potdar
40a046cd82
[Bugfix]: Fix TokenizerLike interface (#30009)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2025-12-05 20:56:40 -08:00
Harry Mellor
bf4a901af9
Better error when world size is larger than node and distributed_executor_backend is not set (#30140)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 20:53:52 -08:00
Bangsheng Tang
77e4472809
let draft model follow target model's config_format (#30152)
2025-12-05 13:33:42 -08:00
Ilya Markov
4e26d3b09e
[Compile] Conditional compilation. Introduce compile_ranges (#24252)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-12-05 18:17:32 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Alec S
2c174420f5
Reduce validation to a warning (#28749)
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-05 14:02:49 +00:00
B-201
1fbd7287b8
Merge branch 'main' into mlm-full-lora-support
2025-12-05 20:17:40 +08:00
bk-201
113eb2e0b8
add a enable option
Signed-off-by: bk-201 <joy25810@foxmail.com>
2025-12-05 12:14:53 +00:00
Max Hu
c2894d3883
[Feature] Add Layer-wise NVTX Support (#29990)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Signed-off-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
2025-12-05 11:20:07 +00:00