Nicolò Lucchesi
e087fbc393
[MM] Pass FA version in ViT Attn ( #30756 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-17 07:54:45 +08:00
TJian
2410132bb1
[ROCm] [Bugfix] Fix torch sdpa hallucination ( #30789 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-16 15:32:43 -08:00
Jinzhen Lin
ce96857fdd
[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) ( #29901 )
...
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-12-16 14:35:28 -08:00
Roger Wang
f5f51e5931
[Core][MM] Optimize encoder cache manager by operating with embeddings only ( #30475 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Lucas Wilkinson
9fec0e13d5
[Attention] Cache attention metadata builds across hybrid KV-cache groups ( #29627 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-12-16 17:10:16 -05:00
jiahanc
254a7f8fd6
[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE ( #30014 )
...
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
2025-12-16 13:01:48 -08:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints ( #30769 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test ( #30723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:28:34 -05:00
Mark McLoughlin
66c3537e5d
[Docs][API] Remove warning about LoRARequest being internal-only ( #30774 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-16 08:35:46 -08:00
Harry Mellor
e1625498f4
Update where bytes_to_unicode is imported from ( #30771 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:05:01 -08:00
Harry Mellor
0b0acc758e
Remove head_mask from Ultravox and Swin ( #30764 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:41 -08:00
Ming Yang
ce12b407f2
[TRTLLM] Remove the MoE GEMM weight name change ( #30713 )
...
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-12-16 11:01:38 -05:00
Wentao Ye
59bd5f6a71
[Feat] Enable eplb with default all2all backend ( #30559 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 10:33:52 -05:00
Lucas Wilkinson
00a8d7628c
[BugFix] Fix memory spike in workspace allocation ( #30744 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-16 06:46:22 -08:00
Nicolò Lucchesi
75eb302a2e
[Bugfix] Whisper fix number of allocated CrossAttn blocks per-request ( #30772 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 14:20:19 +00:00
Pleaplusone
9dbbc59b15
[ROCm][MTP] Support MTP for AITER MLA backend ( #28624 )
...
Signed-off-by: ganyi <ygan@amd.com>
2025-12-16 14:10:26 +00:00
Boyuan Feng
104003dc77
update piecewise cudagraph warning when splitting_ops=[] ( #30728 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 06:09:34 -08:00
TJian
d0fb572929
[ROCm] [AITER] [DOC] Add usage description about check functions in _aiter_ops ( #30586 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-16 13:50:47 +00:00
Harry Mellor
6f15ac5de7
Don't assume position_embedding_type will be present for BERT and RoBERTa models ( #30770 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 13:40:26 +00:00
Junru Shen
676db55eec
[Bugfix] Fix prefix_repetition routing in bench throughput ( #29663 )
...
Signed-off-by: Junru Shen <jrshen.sjr@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-16 01:37:15 -08:00
Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput miss lora_request ( #30636 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-16 01:36:35 -08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser ( #30158 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
Isotr0py
e94384bbad
[Bugfix] Fix broken ViT attention selection for Blackwell device ( #30731 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 05:24:32 +00:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO ( #30120 )
...
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic ( #29873 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony ( #30237 )
...
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Matthew Bonanni
a182be4308
[UX][Attention] Add attention_config argument to LLM() ( #30710 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-15 17:29:09 -05:00
Matthew Bonanni
60dbf7d8f1
Update batch invariant to use attention config ( #30704 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 15:24:16 -05:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args ( #30708 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Matthew Bonanni
51e5b3e3c4
[Bugfix] Fix ViT with FlashAttention on ROCm ( #30703 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-15 19:45:21 +00:00
Isotr0py
ec154c36ee
[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform ( #30212 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-15 17:36:07 +00:00
Harry Mellor
970713d4a4
Remove SkipValidation from ModelConfig ( #30695 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-15 17:34:08 +00:00
mondaylord
17fec3af09
[Bugfix] Fix missing first token in tool calls during reasoning-to-tool transition ( #30671 )
...
Signed-off-by: mondaylord <20212010046@fudan.edu.cn>
2025-12-15 16:13:37 +00:00
yjc9696
855b101d75
[Frontend] add tools for dsv32 developer role ( #30040 )
...
Signed-off-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: pridejcyang <pridejcyang@tencent.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 15:08:47 +00:00
Robert Shaw
d0502b4928
[MoE][Refactor 1/N] Separate Online Quantization ( #30627 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-15 06:54:53 -08:00
Max Hu
3f175f18a2
[Bugfix] Fix multimodal configuration for Qwen3VL MOE model ( #30670 )
...
Signed-off-by: Max Hu <hyoung2991@gmail.com>
2025-12-15 14:06:01 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
Nicolò Lucchesi
185c22bf2f
[Misc][Hybrid allocator + kv connector] Optionally enable hybrid allocator + KV cache connector ( #29805 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-15 11:17:58 +00:00
duke
e4806d973a
[BugFix] Add embed_input_ids method to make QWenLMHeadModel a vllm model ( #30674 )
...
Signed-off-by: root <iwzbi@zju.edu.cn>
Co-authored-by: root <iwzbi@zju.edu.cn>
2025-12-15 10:38:29 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model ( #30666 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
ゆり
33278073d6
typing: Add type hints to TurnMetrics class in context.py ( #30552 )
...
Co-authored-by: zkexorability <zkexorability@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-14 23:00:39 -08:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) ( #28439 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel ( #30282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Seokhyun An
b337647aa0
[Bugfix] Drop empty tool_calls lists to keep assistant replies in chat template ( #30648 )
...
Signed-off-by: Seokhyun An <iamseokhyun@gmail.com>
2025-12-15 04:21:12 +00:00
Jee Jee Li
a524d1ba0a
[Bugfix] Fix deepseek_v32 tokenizer_mode ( #30658 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-15 04:20:31 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. ( #30125 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 ( #30649 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Shanshan Shen
738648fb81
[CustomOp] Support object-level enable for CustomOp ( #30547 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-12-15 11:02:09 +08:00
Boyuan Feng
917fdae5b2
[Log] Skip piecewise cudagraph warn when using full cudagraph ( #30657 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-15 02:49:45 +00:00
Robert Shaw
e2ed238885
Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" ( #30653 )
2025-12-14 19:33:41 -05:00