3933 Commits

Author SHA1 Message Date
汪志鹏
3e92b2b7ac
[BugFix]fix gpt-oss v1/completions response bug (#30608)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
2025-12-21 10:39:31 +08:00
baonudesifeizhai
54c8924384
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
2025-12-20 18:22:04 +00:00
Lucas Wilkinson
ff2168bca3
[CI] FIx fixture 'siglip_attention_config' not found (#31053)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 03:46:15 +00:00
zejunchen-zejun
d52c5096d7
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-12-20 09:03:35 +08:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 13:09:54 -08:00
Seiji Eicher
1ab5213531
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-19 20:38:30 +00:00
Zhonghua Deng
969bbc7c61
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-19 17:17:03 +00:00
Marko Rosenmueller
455949675d
[Frontend][Bug] allow tool calls in analysis channel (#28139)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 10:47:44 +00:00
lif
086b96339f
[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:23:28 +08:00
Wenqi Glantz
4924ac582c
Add hidden dimension validation for multimodal embedding inputs (#30968)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-19 07:59:36 +00:00
Nick Hill
2ac85a4544
[BugFix] Fix logprobs with spec decode and modified logits (#30846)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 19:58:28 -08:00
Andreas Karatzas
7b43db210c
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-19 02:17:27 +00:00
PlatinumGod
6a09612b2e
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867)
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 09:34:27 +08:00
Nick Hill
45c0526ac9
[BugFix] Handle errors when preprocessing added requests (#30895)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-19 01:29:11 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-19 02:04:19 +08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests (#29002)
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support (#30188)
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-18 15:46:58 +00:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00
Yifan Qiao
11a89cf95c
[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-18 06:42:21 +00:00
Li, Jiang
e3ab93c896
[CPU] Refactor CPU fused MOE (#30531)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-18 14:36:49 +08:00
Zhengxu Chen
5f2f3fba1d
[compile] Fix CI for test_gpt2_cache_hit (#30902)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2025-12-17 20:22:23 -08:00
Nicolò Lucchesi
bc3700e0cd
[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-18 11:53:30 +08:00
Micah Williamson
fd8afdf38d
[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-18 10:27:37 +08:00
SungMinCho
a0b782f9cc
[Metrics] Model FLOPs Utilization estimation (#30738)
Signed-off-by: SungMinCho <tjdals4565@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
2025-12-18 01:40:51 +00:00
Isotr0py
74a1ac38b0
[v1] Add PrefixLM support to TritonAttention backend (#30386) 2025-12-17 16:05:24 -08:00
Matthew Bonanni
7eb6cb6c18
[Attention] Update tests to remove deprecated env vars (#30563)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-12-17 09:49:59 -08:00
Nicolò Lucchesi
9ca8cb38fd
[CI][Bugfix] Fix flaky tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio (#30878)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-17 18:49:56 +01:00
Jialin Ouyang
6e9dbcc50e
[Fix] uniform decode batch check (#30747)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-12-17 19:58:43 +08:00
Chauncey
9ad5b21710
[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-17 02:27:30 -08:00
Michael Goin
519ef9a911
[UX] Make vllm bench serve discover model by default and use --input-len (#30816)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi
a100152288
[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-17 01:54:21 -08:00
Xinyu Chen
3b1d440ede
CustomOp: grouped topk (#29575)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2025-12-17 17:43:00 +08:00
Robin
20fda43151
[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction (#30555)
Signed-off-by: WangErXiao <863579016@qq.com>
2025-12-17 16:37:57 +08:00
Cyrus Leung
44d3b1df3d
[CI/Build] Fix compatibility between #30244 and #30396 (#30787)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-16 20:21:19 -08:00
Wentao Ye
b6ec077e05
[CI] Skip ci failure test (#30804)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-16 22:47:53 +00:00
Roger Wang
f5f51e5931
[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475)
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Sun Kim <sunytokki@gmail.com>
2025-12-16 14:18:17 -08:00
Lucas Wilkinson
9fec0e13d5
[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>
2025-12-16 17:10:16 -05:00
Wentao Ye
f21f5ea38c
[Refactor] Small refactor for group topk (#30562)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-16 14:50:59 -05:00
Nicolò Lucchesi
ca702a14dc
[Frontend] Add max-completion-token option to transcription/translation endpoints (#30769)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-16 19:36:49 +00:00
Michael Goin
10ee1c64cf
[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test (#30723)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-16 14:28:34 -05:00
Harry Mellor
af506fd76a
Fix instantiation of HfHubHTTPError in LoRA test (#30768)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-16 08:02:24 -08:00
Isotr0py
4de08ad698
[CI/Build] Skip broken ViT backend functionality test tempoarily (#30782)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-16 06:45:25 -08:00
Jee Jee Li
0e391e7570
[Bugfix] Fix RequestOutput miss lora_request (#30636)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-16 01:36:35 -08:00
Andrew Xia
0d0c929f23
[responsesAPI][8] input/output messages for ResponsesParser (#30158)
Signed-off-by: Andrew Xia <axia@fb.com>
Signed-off-by: Andrew Xia <axia@meta.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-16 13:54:59 +08:00
jiangkuaixue123
b9ff4f2a8d
[feature] extend DBO to XBO (#30120)
Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com>
Co-authored-by: root <root@hk01dgx028.cm.cluster>
2025-12-16 00:04:01 -05:00
Boyuan Feng
c881db364e
improve lazy import test (#30733)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-16 03:12:05 +00:00
Shanshan Shen
3bd9c49158
[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873)
Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: gcanlin <canlinguosdu@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-15 19:08:16 -08:00
penfree
bbd850e597
[Bugfix] fix streaming final output for non harmony (#30237)
Signed-off-by: penfree <qiupengfei@baidu.com>
Co-authored-by: penfree <qiupengfei@baidu.com>
2025-12-16 09:03:11 +08:00
Michael Goin
a450c64a30
[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-15 20:18:02 +00:00