majiayu000
abd1dbc548
[Bugfix] Preserve original tokenizer class name in CachedTokenizer
...
The HuggingFace transformers processor validates the tokenizer type by
checking the class name. When vLLM creates a CachedTokenizer with a modified
class name (e.g., 'CachedQwen2TokenizerFast'), the processor type check
fails with a TypeError.
This fix preserves the original tokenizer class name and qualname in
CachedTokenizer, ensuring compatibility with HuggingFace transformers
processor type checking.
Fixes #31080
Signed-off-by: Claude <noreply@anthropic.com>
Signed-off-by: majiayu000 <1835304752@qq.com>
2025-12-24 16:02:48 +08:00
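The class-name issue described in the commit above can be illustrated with a minimal, self-contained sketch. It is not the vLLM implementation: `Qwen2TokenizerFast` here is a toy stand-in class, and `processor_type_check` and `make_cached_tokenizer` are hypothetical helpers that only mimic a name-based type check and a caching subclass.

```python
import copy


class Qwen2TokenizerFast:
    """Toy stand-in for a HuggingFace tokenizer class (not the real one)."""

    def __init__(self):
        self.vocab = {"hello": 0, "world": 1}

    def get_vocab(self):
        return dict(self.vocab)


def processor_type_check(tokenizer, expected_class_name):
    """Hypothetical check that compares the class name, as the commit describes."""
    name = type(tokenizer).__name__
    if name != expected_class_name:
        raise TypeError(f"expected {expected_class_name}, got {name}")


def make_cached_tokenizer(tokenizer, preserve_name):
    """Wrap a tokenizer in a subclass that caches get_vocab() (assumed design)."""
    base = type(tokenizer)
    vocab = tokenizer.get_vocab()  # computed once, reused by the subclass

    class CachedTokenizer(base):
        def get_vocab(self):
            return vocab

    if preserve_name:
        # Keep the original name and qualname so name-based checks still pass.
        CachedTokenizer.__name__ = base.__name__
        CachedTokenizer.__qualname__ = base.__qualname__
    else:
        # Renamed variant, which a name-based check rejects.
        CachedTokenizer.__name__ = f"Cached{base.__name__}"

    cached = copy.copy(tokenizer)
    cached.__class__ = CachedTokenizer
    return cached


if __name__ == "__main__":
    tok = Qwen2TokenizerFast()
    kept = make_cached_tokenizer(tok, preserve_name=True)
    processor_type_check(kept, "Qwen2TokenizerFast")  # passes
    renamed = make_cached_tokenizer(tok, preserve_name=False)
    try:
        processor_type_check(renamed, "Qwen2TokenizerFast")
    except TypeError as exc:
        print("renamed class rejected:", exc)
```

In this sketch the cached object is still an instance of a distinct subclass, so the caching behaviour is kept while `type(obj).__name__` reports the original name.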
Andreas Karatzas
0247a91e00
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm ( #28979 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 22:42:30 -08:00
Michael Goin
8ee90c83f8
Add --max-model-len auto to auto-fit context to available memory ( #29431 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-23 21:37:14 -08:00
Micah Williamson
3ce791ac77
[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI ( #31242 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-24 03:21:50 +00:00
Andreas Karatzas
e42894f5b5
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance ( #31235 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-24 02:56:58 +00:00
Chen Zhang
538e830caa
[KVEvent] Use request.block_hash for parent block_hash ( #30544 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
2025-12-23 18:23:43 -08:00
rongfu.leng
4ed11105d7
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla ( #30967 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-12-23 18:22:35 -08:00
Vadim Gimpelson
bc0a5a0c08
[CI] Add Qwen3-Next-FP8 to Blackwell model tests ( #31049 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-12-23 17:21:50 -08:00
Andreas Karatzas
bfa2c0bbb9
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() ( #31203 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-23 21:48:01 +00:00
Mark McLoughlin
f790068600
[Core] Add a random suffix to frontend-provided request IDs ( #27987 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-23 13:05:39 -08:00
Joachim Studnia
38c361f99d
Fix edge case Mistral tool parser ( #30724 )
...
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <studniajoachim@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 14:19:58 +00:00
Cyrus Leung
bb62dda2c3
[Misc] Introduce encode_*_url utility function ( #31208 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-23 13:45:21 +00:00
Patrick von Platen
3faa8bee57
adapt voxtral ( #31095 )
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2025-12-23 05:31:55 -08:00
Jakub Zakrzewski
23daef548d
[Frontend] Support using chat template as custom score template for reranking models ( #30550 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-12-23 11:19:16 +00:00
vllmellm
f32cfd7d97
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass ( #26575 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2025-12-23 02:07:54 -08:00
Cyrus Leung
8cef137689
[Chore] Update more locations to use attention_config.backend ( #31153 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-22 19:19:50 -08:00
Pavani Majety
3e10262356
Revert "[SM100] Enable fp8 compute for prefill MLA ( #30746 )" ( #31197 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 18:15:33 -08:00
Angela Yi
612d5ffdab
[ci] Fix Pytorch compilation test oom in 2.10 ( #31194 )
...
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-12-23 01:56:47 +00:00
Divakar Verma
78e5e62bbf
[AMD][CI] fix v1/engine test_preprocess_error_handling ( #31192 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-23 01:28:19 +00:00
Michael Goin
6d518ffbaa
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests ( #31182 )
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-12-22 15:40:35 -08:00
Lucas Wilkinson
de71747655
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix ( #29845 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-22 13:06:10 -08:00
Pavani Majety
b10f41c894
[SM100] Enable fp8 compute for prefill MLA ( #30746 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
2025-12-22 19:15:57 +00:00
Yongye Zhu
7b926e8901
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE ( #31052 )
...
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-12-22 17:34:19 +00:00
Nicolò Lucchesi
b1c3f96ae3
[CI][Bugfix] Fix entrypoints/openai/test_audio.py ( #31151 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-22 07:21:40 -08:00
Shengqi Chen
2cf91c2ea4
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases ( #30781 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-22 13:24:21 +00:00
Kevin McKay
42b42824ae
[Misc] Fix grammar errors in comments and messages ( #31115 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:14:02 -08:00
Kevin McKay
ec58c10ce1
[Misc] Fix quantization-related typos ( #31116 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:48 -08:00
Kevin McKay
8c084de59d
[Misc] Fix spelling typos in comments ( #31114 )
...
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-21 21:13:14 -08:00
CedricHuang
19cc9468fd
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM ( #30957 )
2025-12-21 22:34:49 -05:00
Jee Jee Li
097978a15d
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope ( #30821 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-21 18:39:22 -08:00
Robert Shaw
b471092d3a
[MoE Refactor][4/N] Marlin Fp8 Mk ( #31036 )
2025-12-21 12:37:42 -05:00
汪志鹏
3e92b2b7ac
[BugFix]fix gpt-oss v1/completions response bug ( #30608 )
...
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
2025-12-21 10:39:31 +08:00
baonudesifeizhai
54c8924384
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash ( #28891 )
...
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
2025-12-20 18:22:04 +00:00
Lucas Wilkinson
ff2168bca3
[CI] Fix fixture 'siglip_attention_config' not found ( #31053 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-20 03:46:15 +00:00
zejunchen-zejun
d52c5096d7
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm ( #30869 )
...
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
2025-12-20 09:03:35 +08:00
Robert Shaw
83a317f650
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) ( #30990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2025-12-19 13:09:54 -08:00
Seiji Eicher
1ab5213531
Make engine core client handshake timeout configurable ( #27444 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-12-19 20:38:30 +00:00
Zhonghua Deng
969bbc7c61
[Model] Add MiMo-V2-Flash support ( #30836 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-19 17:17:03 +00:00
Marko Rosenmueller
455949675d
[Frontend][Bug] allow tool calls in analysis channel ( #28139 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 10:47:44 +00:00
lif
086b96339f
[Bugfix] Add validation for tool requests when tool_parser is unavailable ( #30613 )
...
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2025-12-19 18:23:28 +08:00
Wenqi Glantz
4924ac582c
Add hidden dimension validation for multimodal embedding inputs ( #30968 )
...
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
2025-12-19 07:59:36 +00:00
Nick Hill
2ac85a4544
[BugFix] Fix logprobs with spec decode and modified logits ( #30846 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-18 19:58:28 -08:00
Andreas Karatzas
7b43db210c
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements ( #30270 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-19 02:17:27 +00:00
PlatinumGod
6a09612b2e
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models ( #30867 )
...
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2025-12-19 09:34:27 +08:00
Nick Hill
45c0526ac9
[BugFix] Handle errors when preprocessing added requests ( #30895 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-19 01:29:11 +00:00
Elizabeth Thomas
41b6f9200f
Remove all2all backend envvar ( #30363 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-18 19:46:28 +00:00
Isotr0py
700a5ad6c6
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface ( #30684 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-19 02:04:19 +08:00
inkcherry
500f26e6d3
[Bugfix] fix DP-aware routing in OpenAI API requests ( #29002 )
...
Signed-off-by: inkcherry <mingzhi.liu@amd.com>
2025-12-18 09:50:42 -08:00
sarathc-cerebras
28d15ab56b
adds jais 2 support ( #30188 )
...
Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-18 15:46:58 +00:00
Lucas Wilkinson
30bb19a760
[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) ( #30910 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-17 23:50:15 -08:00