Wentao Ye
76e6a95192
[Bug] Fix "Number of dimensions of tensors must match." for Deepseek V3.2 ( #31160 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-24 10:41:09 +08:00
Chao Lei
8b59753cdb
[P/D] Mooncake connector support more protocols ( #30133 )
...
Signed-off-by: LCAIZJ <leichao139636@163.com>
2025-12-24 10:24:07 +08:00
Chen Zhang
538e830caa
[KVEvent] Use request.block_hash for parent block_hash ( #30544 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-23 18:23:43 -08:00
rongfu.leng
4ed11105d7
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla ( #30967 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-12-23 18:22:35 -08:00
Cyrus Leung
dd424571c8
[Bugfix] Enable dynamic_dims for different embeds shape ( #31223 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 10:15:47 +08:00
Cyrus Leung
ca6a95ba25
[Chore] Simplify logic of _execute_mm_encoder ( #31222 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-23 18:15:16 -08:00
Vadim Gimpelson
bc0a5a0c08
[CI] Add Qwen3-Next-FP8 to Blackwell model tests ( #31049 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-12-23 17:21:50 -08:00
Andreas Karatzas
bfa2c0bbb9
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() ( #31203 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 21:48:01 +00:00
Mark McLoughlin
f790068600
[Core] Add a random suffix to frontend-provided request IDs ( #27987 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-23 13:05:39 -08:00
Asaf Joseph Gardin
34916ae37f
[Mamba] - Consolidate Mambas Attention Logic ( #28133 )
2025-12-23 21:57:00 +01:00
Yuan Tang
0736f901e7
docs: Add llm-d integration to the website ( #31234 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-12-23 20:27:22 +00:00
Harry Mellor
c016c95b45
Use helper function instead of looping through attribute names ( #29788 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-23 17:31:56 +00:00
Harry Mellor
1339878e13
Only patch original_max_position_embeddings for Transformers v4 ( #31214 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-23 16:46:32 +00:00
danielafrimi
b94f80ffb8
[FIX] FP4 quantization kernel padding initialization bug ( #31097 )
...
Signed-off-by: <>
Co-authored-by: root <root@gpu-193.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-951.slurm-workers-slurm.slurm.svc.cluster.local>
2025-12-23 08:45:18 -08:00
Joachim Studnia
38c361f99d
Fix edge case Mistral tool parser ( #30724 )
...
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <studniajoachim@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 14:19:58 +00:00
Cyrus Leung
bb62dda2c3
[Misc] Introduce encode_*_url utility function ( #31208 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-23 13:45:21 +00:00
Patrick von Platen
3faa8bee57
adapt voxtral ( #31095 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 05:31:55 -08:00
Harry Mellor
b10d47e0e0
Add util function for checking nesting of rope parameters ( #31146 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-23 11:41:49 +00:00
R3hankhan
769f27e701
[OpenAI] Add parameter metadata to validation errors ( #30134 )
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2025-12-23 11:30:12 +00:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models ( #30550 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
Jee Jee Li
27c6c2f98c
[Bugfix] Fix MoE LoRA bin/pt loading ( #31161 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-23 19:09:15 +08:00
Weida Hong
73cfb7a722
Correct position of docstring of class attributes ( #31209 )
...
Signed-off-by: Weida Hong <wdhongtw@google.com>
2025-12-23 02:08:58 -08:00
vllmellm
f32cfd7d97
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass ( #26575 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-23 02:07:54 -08:00
Jee Jee Li
6b16fff01b
[Bugfix] Fix Jais2ForCausalLM ( #31198 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-23 07:44:01 +00:00
Yan Ma
f1c2c20136
[XPU] decrease IGC_ForceOCLSIMDWidth for speculative decoding triton-xpu kernel compilation ( #30538 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com>
2025-12-23 05:22:15 +00:00
Cyrus Leung
8cef137689
[Chore] Update more locations to use attention_config.backend ( #31153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-22 19:19:50 -08:00
quanliu
a37328fc5c
[Feature] Batch invariant: Lora ( #30097 )
...
Signed-off-by: quanliu <18646313696@163.com>
2025-12-23 10:32:47 +08:00
Pavani Majety
3e10262356
Revert "[SM100] Enable fp8 compute for prefill MLA ( #30746 )" ( #31197 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 18:15:33 -08:00
Angela Yi
612d5ffdab
[ci] Fix PyTorch compilation test OOM in 2.10 ( #31194 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-12-23 01:56:47 +00:00
Divakar Verma
78e5e62bbf
[AMD][CI] fix v1/engine test_preprocess_error_handling ( #31192 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-23 01:28:19 +00:00
Robert Shaw
b57b967386
[MoE Refactor][7/N] AITER MK ( #31102 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-22 16:42:58 -07:00
Michael Goin
6d518ffbaa
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests ( #31182 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-22 15:40:35 -08:00
Benjamin Chislett
85aff45e24
[Perf] Remove blocking copy in GDN Attention ( #31167 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-12-22 14:25:22 -08:00
Wentao Ye
5312a7284e
[Bug] Fix 'CutlassMLAImpl' object has no attribute '_workspace_buffer' ( #31173 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-22 14:24:27 -08:00
Lucas Wilkinson
de71747655
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix ( #29845 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 13:06:10 -08:00
Michael Goin
9586354053
[Doc] Add vllm-metal to hardware plugin documentation ( #31174 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-22 20:06:29 +00:00
Pavani Majety
b10f41c894
[SM100] Enable fp8 compute for prefill MLA ( #30746 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 19:15:57 +00:00
Yongye Zhu
7b926e8901
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE ( #31052 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-12-22 17:34:19 +00:00
Gregory Shtrasberg
ab3a85fd68
[ROCm][CI/Build] Fix triton version to one that has triton_kernels required for gpt-oss to run ( #31159 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-12-22 17:19:27 +00:00
Boyuan Feng
8dd0db687b
[UX] improve profiler error message ( #31125 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-22 08:45:59 -08:00
TJian
022f3cea53
[ROCm] [Critical]: Remove unused variable ( #31156 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-22 08:28:22 -08:00
Micah Williamson
a5bc77c253
[AMD][CI] Add "V1 Test e2e + engine" to mi325_8 Agent Pool ( #31040 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-22 10:41:56 -05:00
Nicolò Lucchesi
b1c3f96ae3
[CI][Bugfix] Fix entrypoints/openai/test_audio.py ( #31151 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-22 07:21:40 -08:00
dengyunyang
8f8f469b1b
[BugFix] skip language model in Encoder ( #30242 )
...
Signed-off-by: dengyunyang <584797741@qq.com>
2025-12-22 05:25:59 -08:00
Shengqi Chen
2cf91c2ea4
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases ( #30781 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-22 13:24:21 +00:00
AlonKejzman
bd6d5a7475
[gpt-oss] Fix harmony parser in streaming responses ( #30205 )
...
Signed-off-by: AlonKejzman <alonkeizman@gmail.com>
2025-12-22 20:56:06 +08:00
Li Wang
256a33ecb4
[Model] Fix bagel failing to run ( #31132 )
...
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-22 02:15:54 -08:00
Roger Young
c02a2705f9
Update MiniMax-M2 ToolCall and add MiniMax-M2.1 in Docs ( #31083 )
...
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
2025-12-22 05:28:40 +00:00
Kevin McKay
cf8eed7bef
[Bugfix][ROCm] Fix typo: is_linear_fp8_enaled -> is_linear_fp8_enabled ( #31109 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2025-12-21 21:14:58 -08:00
Kevin McKay
44ae85f725
[Misc] Fix typo: 'occured' -> 'occurred' ( #31120 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:27 -08:00