Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface ( #30684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-19 02:04:19 +08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests ( #29002 )
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support ( #30188 )
...
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-18 15:46:58 +00:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Yifan Qiao
11a89cf95c
[Fix][FlexAttention] return max logical block index to handle reused blocks ( #30915 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-18 06:42:21 +00:00
Li, Jiang
e3ab93c896
[CPU] Refactor CPU fused MOE ( #30531 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 14:36:49 +08:00
Zhengxu Chen
5f2f3fba1d
[compile] Fix CI for test_gpt2_cache_hit ( #30902 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-17 20:22:23 -08:00
Nicolò Lucchesi
bc3700e0cd
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size ( #27274 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-18 11:53:30 +08:00
Micah Williamson
fd8afdf38d
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 ( #30811 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-18 10:27:37 +08:00
SungMinCho
a0b782f9cc
[Metrics] Model FLOPs Utilization estimation ( #30738 )
...
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-18 01:40:51 +00:00
Isotr0py
74a1ac38b0
[v1] Add PrefixLM support to TritonAttention backend ( #30386 )
2025-12-17 16:05:24 -08:00
Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars ( #30563 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio ( #30878 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-17 18:49:56 +01:00
Jialin Ouyang
6e9dbcc50e
[Fix] uniform decode batch check ( #30747 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-17 19:58:43 +08:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory ( #30749 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Michael Goin
519ef9a911
[UX] Make vllm bench serve discover model by default and use --input-len ( #30816 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi
a100152288
[Kernels][FI] Skip trtllm attention when num_kv_heads=1 ( #30842 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-17 01:54:21 -08:00
Xinyu Chen
3b1d440ede
CustomOp: grouped topk ( #29575 )
...
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2025-12-17 17:43:00 +08:00
Robin
20fda43151
[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction ( #30555 )
...
Signed-off-by: WangErXiao <863579016@qq.com>
2025-12-17 16:37:57 +08:00
Cyrus Leung
44d3b1df3d
[CI/Build] Fix compatibility between #30244 and #30396 ( #30787 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-16 20:21:19 -08:00
Wentao Ye
b6ec077e05
[CI] Skip ci failure test ( #30804 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 22:47:53 +00:00
Roger Wang
f5f51e5931
[Core][MM] Optimize encoder cache manager by operating with embeddings only ( #30475 )
...
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Lucas Wilkinson
9fec0e13d5
[Attention] Cache attention metadata builds across hybrid KV-cache groups ( #29627 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-12-16 17:10:16 -05:00
Wentao Ye
f21f5ea38c
[Refactor] Small refactor for group topk ( #30562 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-16 14:50:59 -05:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints ( #30769 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test ( #30723 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:28:34 -05:00
Harry Mellor
af506fd76a
Fix instantiation of HfHubHTTPError in LoRA test ( #30768 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:24 -08:00
Isotr0py
4de08ad698
[CI/Build] Skip broken ViT backend functionality test temporarily ( #30782 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 06:45:25 -08:00
Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput miss lora_request ( #30636 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-16 01:36:35 -08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser ( #30158 )
...
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO ( #30120 )
...
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Boyuan Feng
c881db364e
improve lazy import test ( #30733 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 03:12:05 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic ( #29873 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony ( #30237 )
...
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args ( #30708 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00
Cyrus Leung
ed586e7724
[Refactor] [3/N] Move tool parser tests and run on CPU ( #30693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-15 13:45:36 +00:00
Chauncey
2a1776b7ac
[Refactor] [2/N] Move tool parsers into the vLLM main directory ( #30675 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-15 12:54:52 +00:00
wang.yuqi
4429d934de
[Model] Automatic conversion of TokenClassification model ( #30666 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-15 08:13:00 +00:00
汪志鹏
1adeb3b84c
[New Model] BAGEL support (AR only) ( #28439 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-15 14:58:23 +08:00
Wentao Ye
3778673ea8
[Feat] Refactor for parallel_config in FusedMoEModularKernel ( #30282 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-15 04:21:36 +00:00
Shanshan Shen
87b4d1557d
[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. ( #30125 )
...
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-15 11:13:32 +08:00
Wenqi Glantz
84e23d103d
additional protection for CVE-2025-62164 ( #30649 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-15 03:07:10 +00:00
Or Ozeri
174e39ead7
CPU KV Offloading: Use more CUDA streams ( #29013 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-12-14 23:50:45 +00:00
Chendi.Xue
ae2e503dda
[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 ( #30420 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-14 15:38:28 +00:00
ElizaWszola
994acec0cc
[Bugfix] Fix fusion for VL models ( #30244 )
...
Signed-off-by: ElizaWszola <ewszola@redhat.com>
2025-12-14 21:22:37 +08:00
Johannes F
060893654d
fix: Update json features supported by xGrammar ( #30390 )
...
Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com>
Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-14 02:16:06 -08:00
Matthias Gehre
e9add129ad
[Bugfix] awq_gemm: fix argument order swap ( #30364 )
...
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-12-14 18:15:37 +08:00
Lasha Koroshinadze
3a20450d31
Add AudioFlamingo3 model support ( #30539 )
...
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-14 02:14:55 -08:00
Cyrus Leung
dcb31196da
[Chore] Remove redundant RequestPrompt ( #30612 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-14 09:22:37 +00:00
Laith Sakka
f569c654e1
enable unbacked with aot_compile ( #30462 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-14 08:14:06 +00:00