Fadi Arafeh
f355ad5412
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly ( #30481 )
...
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2025-12-12 02:09:25 +00:00
Lucas Wilkinson
042da73244
[Core] Refactor _build_attention_metadata ( #29628 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-11 17:54:12 -08:00
jiahanc
0ab23c2b2b
[fix] fix SM check for Flashinfer TRTLLM MOE ( #30314 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-12 01:00:58 +00:00
Andrew Briand
a00d88973d
[EPLB] Support EPLB w/ NVFP4 ( #29804 )
...
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
2025-12-11 22:59:40 +00:00
Wentao Ye
c817b14151
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement ( #30494 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: li-jinpeng <3332126450@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-12-11 17:28:34 -05:00
Nicolò Lucchesi
0efd9f867c
[Core] Whisper Enable Encoder Batching ( #29421 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-11 21:06:51 +00:00
Xingyu Liu
90d6cf921f
[BugFix][MM] Support VLLM_RANDOMIZE_DP_DUMMY_INPUTS ( #30472 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 21:00:15 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim ( #30389 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Zhengxu Chen
92fea56fd1
[compile] Stop one-off setting enable_aot_compile and use context manager instead. ( #30503 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-11 20:28:03 +00:00
Andreas Karatzas
72aaac5b66
[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding ( #30430 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-11 19:25:01 +00:00
汪志鹏
0e71eaa644
[Feature] AWQ marlin quantization support for fused moe with lora ( #30442 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
2025-12-11 18:03:32 +00:00
Harry Mellor
8781cd6b88
Add Eagle and Eagle3 support to Transformers modeling backend ( #30340 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 17:02:10 +00:00
Julien Denize
aa3c250c48
[IMPROVEMENT] Change MistralReasoningParser behavior ( #30391 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-11 17:53:26 +01:00
Harry Mellor
93db3256a4
Give pooling examples better names ( #30488 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 16:22:58 +00:00
Harry Mellor
97a042f3bc
Make the httpx logger less annoying when Transformers v5 is installed ( #30480 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 15:44:56 +00:00
Cyrus Leung
3a3b06ee70
[Misc] Improve error message for is_multimodal ( #30483 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 06:39:51 -08:00
Martin Hickey
f4417f8449
[KVConnector] Add KV events to KV Connectors ( #28309 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
2025-12-11 15:30:29 +01:00
Qiu
a11f4a81e0
[Misc][PCP&DCP] relocate PCP feature check ( #30050 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 03:36:18 -08:00
Kenichi Maehashi
853611bb18
Fix typo of endpoint name in CLI args docs ( #30473 )
...
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
2025-12-11 11:07:56 +00:00
wang.yuqi
a5f9fb5960
[Deprecation] Deprecate --convert reward, use --convert embed instead. ( #30463 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-11 10:18:25 +00:00
jeremyteboul
4515eb1a0b
[Fix] Update lazy loading of video loader backend ( #30444 )
...
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2025-12-11 10:14:57 +00:00
Cyrus Leung
13d63b65e0
[Deprecation] Remove missed fallback for embed_input_ids ( #30469 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 10:06:36 +00:00
Rei.
6299628d32
[bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. ( #29882 )
...
Signed-off-by: Rei <1477174254@qq.com>
2025-12-11 09:05:08 +00:00
Ming Yang
fba8906930
[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill ( #29710 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-11 08:20:45 +00:00
Ning Xie
d02d1043de
fix: enhance human_readable_int function ( #30337 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-12-10 23:30:33 -08:00
Cyrus Leung
979f50efd0
[Deprecation] Remove fallbacks for embed_input_ids and embed_multimodal ( #30458 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-11 06:58:23 +00:00
gh-wf
36c9ce2554
Ensure minimum frames for GLM 4.6V compatibility ( #30285 )
...
Signed-off-by: Wayne Ferguson <wayneferguson@gmail.com>
2025-12-11 05:26:49 +00:00
Wentao Ye
d6464f2679
[Chore] Fix torch precision warning ( #30428 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 04:05:56 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings ( #30397 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Divakar Verma
d1e1fb4363
[Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grouped properly ( #29439 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-10 19:47:18 -08:00
Andreas Karatzas
b51255f369
[ROCm] Fix broken import in platform attention backend dispatching ( #30432 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-11 01:12:58 +00:00
Sage Moore
b4054c8ab4
Revert "[CI] Add Async Eplb nightly CI tests ( #29385 )" ( #30431 )
2025-12-11 00:48:35 +00:00
shivampr
8580919ac3
[Bugfix] fix confusing OOM errors during v1 init ( #28051 )
...
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-10 23:17:41 +00:00
Christina Norman
166ac3c94d
fix(shm): Add memory barriers for cross-process shared memory visibility ( #30407 )
...
Signed-off-by: Christina Holland <hey@christinaholland.com>
Signed-off-by: Christina <truffle@gmail.com>
2025-12-10 23:01:19 +00:00
Nick Hill
6ccb7baeb1
[LMCache] Fix breakage due to new LMCache version ( #30216 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-10 11:52:01 -08:00
Po-Han Huang (NVIDIA)
eea41804a4
[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used ( #30241 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-12-10 11:18:51 -08:00
Jialin Ouyang
9f042ba26b
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well ( #29289 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-10 14:13:01 -05:00
Cyrus Leung
e72d65b959
[Deprecation] Remove tokenizer setter ( #30400 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:10:58 +00:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration ( #26813 )
...
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Anker
e8e8cd73e5
[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing ( #30344 )
...
Signed-off-by: Lennart Brog <lennart.borg@list-ag.de>
Signed-off-by: Anker <20343812+anker-c2@users.noreply.github.com>
2025-12-10 18:09:31 +00:00
Cyrus Leung
253305d5b2
[Chore] Delay recent deprecations ( #30398 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 17:48:38 +00:00
Matthew Bonanni
794a7875ee
[Misc] Consistent case for vllm bench serve results ( #30403 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-10 09:44:02 -08:00
Lucas Wilkinson
aacf0abf8b
[BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' ( #30399 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-10 07:59:23 -08:00
Nicolò Lucchesi
c756fb6781
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph ( #30072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-10 06:14:24 -08:00
Roger Young
d017bceb08
[BugFix] Fix minimax m2 model rotary_dim ( #30384 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-12-10 04:58:50 -08:00
Aditya Tewari
cebda2a4af
[CPU] Support for Whisper ( #30062 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
2025-12-10 04:58:42 -08:00
Daniele
53d2420b44
[Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() ( #30331 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-12-10 04:58:35 -08:00
Chauncey
9db78f34dc
[Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output ( #30371 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 08:30:16 +00:00
Mingliang Li
d007387aa7
[Bugfix] Cache added_vocab to avoid per-token overhead ( #30351 )
...
Signed-off-by: limingliang <limingliang@stepfun.com>
Co-authored-by: limingliang <limingliang@stepfun.com>
2025-12-10 12:05:51 +08:00