gh-wf
36c9ce2554
Ensure minimum frames for GLM 4.6V compatibility ( #30285 )
...
Signed-off-by: Wayne Ferguson <wayneferguson@gmail.com>
2025-12-11 05:26:49 +00:00
Wentao Ye
d6464f2679
[Chore] Fix torch precision warning ( #30428 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-11 04:05:56 +00:00
Cyrus Leung
7e24e5d4d6
[Deprecation] Remove deprecated task, seed and MM settings ( #30397 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:39 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release ( #30396 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Divakar Verma
d1e1fb4363
[Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grouped properly ( #29439 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-10 19:47:18 -08:00
Andreas Karatzas
b51255f369
[ROCm] Fix broken import in platform attention backend dispatching ( #30432 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-11 01:12:58 +00:00
Sage Moore
b4054c8ab4
Revert "[CI] Add Async Eplb nightly CI tests ( #29385 )" ( #30431 )
2025-12-11 00:48:35 +00:00
shivampr
8580919ac3
[Bugfix] fix confusing OOM errors during v1 init ( #28051 )
...
Signed-off-by: Shivam <shivamprasad91@gmail.com>
Signed-off-by: shivampr <shivampr.dev@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-12-10 23:17:41 +00:00
Christina Norman
166ac3c94d
fix(shm): Add memory barriers for cross-process shared memory visibility ( #30407 )
...
Signed-off-by: Christina Holland <hey@christinaholland.com>
Signed-off-by: Christina <truffle@gmail.com>
2025-12-10 23:01:19 +00:00
Nick Hill
6ccb7baeb1
[LMCache] Fix breakage due to new LMCache version ( #30216 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-10 11:52:01 -08:00
Po-Han Huang (NVIDIA)
eea41804a4
[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used ( #30241 )
...
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
2025-12-10 11:18:51 -08:00
Jialin Ouyang
9f042ba26b
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well ( #29289 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-10 14:13:01 -05:00
Cyrus Leung
e72d65b959
{Deprecation] Remove tokenizer setter ( #30400 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:10:58 +00:00
Will Eaton
a9e4106f28
[P/D] KV Load Failure Recovery/Abort Configuration ( #26813 )
...
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 11:00:52 -08:00
Anker
e8e8cd73e5
[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing ( #30344 )
...
Signed-off-by: Lennart Brog <lennart.borg@list-ag.de>
Signed-off-by: Anker <20343812+anker-c2@users.noreply.github.com>
2025-12-10 18:09:31 +00:00
Cyrus Leung
253305d5b2
[Chore] Delay recent deprecations ( #30398 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 17:48:38 +00:00
Matthew Bonanni
794a7875ee
[Misc] Consistent case for vllm bench serve results ( #30403 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-10 09:44:02 -08:00
Lucas Wilkinson
aacf0abf8b
[BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' ( #30399 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-10 07:59:23 -08:00
Nicolò Lucchesi
c756fb6781
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph ( #30072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-10 06:14:24 -08:00
Roger Young
d017bceb08
[BugFix] Fix minimax m2 model rotary_dim ( #30384 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-12-10 04:58:50 -08:00
Aditya Tewari
cebda2a4af
[CPU] Support for Whisper ( #30062 )
...
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
2025-12-10 04:58:42 -08:00
Daniele
53d2420b44
[Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() ( #30331 )
...
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
2025-12-10 04:58:35 -08:00
Chauncey
9db78f34dc
[Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output ( #30371 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-10 08:30:16 +00:00
Mingliang Li
d007387aa7
[Bugfix] Cache added_vocab to avoid per-token overhead ( #30351 )
...
Signed-off-by: limingliang <limingliang@stepfun.com>
Co-authored-by: limingliang <limingliang@stepfun.com>
2025-12-10 12:05:51 +08:00
Wilson Wu
3bdd426636
Fix typos in comments across multiple files ( #30345 )
...
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 20:05:28 -08:00
haoyangli-amd
06462392e4
[bugfix][quantization] fix quark qwen3 kv_cache quantization ( #30308 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
2025-12-10 03:24:12 +00:00
Andrew Xia
c3487aca34
[responsesAPI][6] Fix multi turn MCP tokenization ( #30230 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-10 10:13:13 +08:00
Lucas Wilkinson
abe93bce59
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode ( #29624 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-12-09 17:18:10 -08:00
ElizaWszola
2e7035dd8c
[Bugfix] Fix fp8 DeepGemm compilation issues ( #30336 )
2025-12-09 20:17:25 -05:00
PatrykSaffer
4c2e10ea19
[Bugfix] Fix cuda graph sizes when running with speculative decoding ( #30330 )
...
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
2025-12-10 00:47:07 +00:00
dongbo910220
03b5f940fd
[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync ( #29723 )
...
Signed-off-by: dongbo910220 <1275604947@qq.com>
2025-12-10 00:15:01 +00:00
Charlie Fu
3c680f4a17
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter ( #25693 )
...
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
2025-12-09 22:39:26 +00:00
Kyle Sayers
fccd532587
[Quantization] FP8 Weight Reloading for Quantized RL Rollout ( #28480 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
2025-12-09 13:54:32 -08:00
bnellnm
00e5cbb967
[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply ( #29066 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2025-12-09 13:48:25 -08:00
rasmith
7618dc973d
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py ( #29145 )
2025-12-09 20:18:17 +00:00
Tsukasa OI
73a484caa1
[Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models ( #30307 )
...
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-09 19:13:10 +00:00
Lucas Wilkinson
95501a70ec
[BugFix] Fix DeepSeek-R1 hang with DP and MTP ( #30119 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-09 18:51:19 +00:00
Benjamin Chislett
e858bfe051
[Cleanup] Refactor profiling env vars into a CLI config ( #29912 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-09 13:29:33 -05:00
Woosuk Kwon
d471b2aff0
[Model Runner V2] Support num NaNs in logits ( #30187 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-09 10:00:49 -08:00
Woosuk Kwon
9e6562a3f6
[Model Runner V2] Fix Triton warning on tl.where ( #30355 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-09 09:59:54 -08:00
Ilya Markov
0b6a8a304c
[BugFix] Fix non detected failing tests ( #30277 )
...
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2025-12-09 17:57:55 +00:00
Wentao Ye
83319b44c2
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled ( #29897 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-09 10:40:37 -05:00
Lucas Wilkinson
56037dfa2f
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded ( #30173 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-09 10:36:12 -05:00
quanliu
5dcd593baf
[Feature] Batch-Invariant Support for FA2 and LoRA ( #30018 )
...
Signed-off-by: quanliu <18646313696@163.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-09 10:01:38 -05:00
Julien Denize
5c213d2899
[BUGFIX] Mistral tool call parser v11+ ( #30332 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai>
2025-12-09 14:55:38 +00:00
vllmellm
ee14644ba9
[ROCm] Aiter Quant Kernels ( #25552 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-12-09 14:27:37 +00:00
Dongjie Zou
1166c31cc7
[Bugfix]: Fix glm46 awq marlin moe wna16 compatibility ( #30210 )
...
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-09 12:20:21 +00:00
haoyangli-amd
03416eada6
[bugfix][quantization] Fix fp8 per_tensor scale shape ( #30257 )
...
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
2025-12-09 19:28:50 +08:00
Hubert de La Jonquiere
c72ea10723
[Structured Output][Reasoning] Improves decoding throughput for models using single-token reasoning endings. ( #30056 )
2025-12-09 18:54:08 +08:00
Jaya Yuan
67475a6e81
[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA ( #30309 )
...
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
2025-12-09 08:22:14 +00:00