Cyrus Leung
aa3868ecfe
[Chore] Remove unused noqas ( #31263 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-24 05:38:46 -08:00
Andreas Karatzas
0247a91e00
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm ( #28979 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 22:42:30 -08:00
Michael Goin
8ee90c83f8
Add --max-model-len auto to auto-fit context to available memory ( #29431 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-23 21:37:14 -08:00
Micah Williamson
3ce791ac77
[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI ( #31242 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-24 03:21:50 +00:00
Andreas Karatzas
e42894f5b5
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance ( #31235 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-24 02:56:58 +00:00
Chen Zhang
538e830caa
[KVEvent] User request.block_hash for parent block_hash ( #30544 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-23 18:23:43 -08:00
rongfu.leng
4ed11105d7
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla ( #30967 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-12-23 18:22:35 -08:00
Vadim Gimpelson
bc0a5a0c08
[CI] Add Qwen3-Next-FP8 to Blackwell model tests ( #31049 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-12-23 17:21:50 -08:00
Andreas Karatzas
bfa2c0bbb9
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() ( #31203 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 21:48:01 +00:00
Mark McLoughlin
f790068600
[Core] Add a random suffix to frontend-provided request IDs ( #27987 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-23 13:05:39 -08:00
Joachim Studnia
38c361f99d
Fix edge case Mistral tool parser ( #30724 )
...
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <studniajoachim@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 14:19:58 +00:00
Cyrus Leung
bb62dda2c3
[Misc] Introduce encode_*_url utility function ( #31208 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-23 13:45:21 +00:00
Patrick von Platen
3faa8bee57
adapt voxtral ( #31095 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 05:31:55 -08:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models ( #30550 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
vllmellm
f32cfd7d97
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass ( #26575 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-23 02:07:54 -08:00
Cyrus Leung
8cef137689
[Chore] Update more locations to use attention_config.backend ( #31153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-22 19:19:50 -08:00
Pavani Majety
3e10262356
Revert "[SM100] Enable fp8 compute for prefill MLA ( #30746 )" ( #31197 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 18:15:33 -08:00
Angela Yi
612d5ffdab
[ci] Fix Pytorch compilation test oom in 2.10 ( #31194 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-12-23 01:56:47 +00:00
Divakar Verma
78e5e62bbf
[AMD][CI] fix v1/engine test_preprocess_error_handling ( #31192 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-23 01:28:19 +00:00
Michael Goin
6d518ffbaa
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests ( #31182 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-22 15:40:35 -08:00
Lucas Wilkinson
de71747655
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix ( #29845 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 13:06:10 -08:00
Pavani Majety
b10f41c894
[SM100] Enable fp8 compute for prefill MLA ( #30746 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 19:15:57 +00:00
Yongye Zhu
7b926e8901
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE ( #31052 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-12-22 17:34:19 +00:00
Nicolò Lucchesi
b1c3f96ae3
[CI][Bugfix] Fix entrypoints/openai/test_audio.py ( #31151 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-22 07:21:40 -08:00
Shengqi Chen
2cf91c2ea4
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases ( #30781 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-22 13:24:21 +00:00
Kevin McKay
42b42824ae
[Misc] Fix grammar errors in comments and messages ( #31115 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:02 -08:00
Kevin McKay
ec58c10ce1
[Misc] Fix quantization-related typos ( #31116 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:48 -08:00
Kevin McKay
8c084de59d
[Misc] Fix spelling typos in comments ( #31114 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:14 -08:00
CedricHuang
19cc9468fd
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM ( #30957 )
2025-12-21 22:34:49 -05:00
Jee Jee Li
097978a15d
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope ( #30821 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-21 18:39:22 -08:00
Robert Shaw
b471092d3a
[MoE Refactor][4/N] Marlin Fp8 Mk ( #31036 )
2025-12-21 12:37:42 -05:00
汪志鹏
3e92b2b7ac
[BugFix]fix gpt-oss v1/completions response bug ( #30608 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
2025-12-21 10:39:31 +08:00
baonudesifeizhai
54c8924384
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash ( #28891 )
...
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
2025-12-20 18:22:04 +00:00
Lucas Wilkinson
ff2168bca3
[CI] FIx fixture 'siglip_attention_config' not found ( #31053 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 03:46:15 +00:00
zejunchen-zejun
d52c5096d7
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm ( #30869 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-12-20 09:03:35 +08:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) ( #30990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 13:09:54 -08:00
Seiji Eicher
1ab5213531
Make engine core client handshake timeout configurable ( #27444 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-19 20:38:30 +00:00
Zhonghua Deng
969bbc7c61
[Model] Add MiMo-V2-Flash support ( #30836 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-19 17:17:03 +00:00
Marko Rosenmueller
455949675d
[Frontend][Bug] allow tool calls in analysis channel ( #28139 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 10:47:44 +00:00
lif
086b96339f
[Bugfix] Add validation for tool requests when tool_parser is unavailable ( #30613 )
...
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:23:28 +08:00
Wenqi Glantz
4924ac582c
Add hidden dimension validation for multimodal embedding inputs ( #30968 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-19 07:59:36 +00:00
Nick Hill
2ac85a4544
[BugFix] Fix logprobs with spec decode and modified logits ( #30846 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 19:58:28 -08:00
Andreas Karatzas
7b43db210c
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements ( #30270 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-19 02:17:27 +00:00
PlatinumGod
6a09612b2e
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models ( #30867 )
...
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 09:34:27 +08:00
Nick Hill
45c0526ac9
[BugFix] Handle errors when preprocessing added requests ( #30895 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-19 01:29:11 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface ( #30684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-19 02:04:19 +08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests ( #29002 )
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support ( #30188 )
...
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-18 15:46:58 +00:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00