Alec S
65ee97288a
[BugFix] Adding env variable to disable async grammar compilation ( #29996 )
Signed-off-by: Alec Solder <alecs@fb.com>
Signed-off-by: Alec S <10566873+alecsolder@users.noreply.github.com>
Co-authored-by: Alec Solder <alecs@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-12-05 00:49:37 -08:00
Yanan Cao
62b3333448
[Frontend] Remove deprecated -O.xx flag ( #29991 )
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-05 00:47:22 -08:00
amitz-nv
6038b1b04b
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH ( #29978 )
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2025-12-05 00:34:33 -08:00
Jingchun Gao
d698bb382d
[Bugfix] Correct num_q_heads on DCP for Flashinfer backends ( #29487 )
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
2025-12-05 05:54:31 +00:00
Laith Sakka
5867819eaf
Do not guard during noop elimination pass ( #30095 )
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-05 04:10:12 +00:00
Qiu
0098a6e3da
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms ( #29952 )
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 21:40:51 -05:00
Hubert de La Jonquiere
befb59e5b1
[Model] Add Holo2 reasoning parser ( #30048 )
Signed-off-by: hdlj-h <hubert@hcompany.ai>
2025-12-05 10:38:45 +08:00
Alexander Matveev
4470ee2f90
[Perf] Enable separate shared_experts stream only for CUDA ( #30085 )
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-12-05 00:03:17 +00:00
Laith Sakka
1f0d184590
[aot_compile]change VLLM backend to read fake args from example_value ( #29104 )
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-04 17:33:45 -05:00
Lucas Wilkinson
c8ab988b15
[BugFix] Fix DBO assert assert B_block_table == B_q ( #29933 )
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-04 14:48:54 -05:00
Peng-YM
48a5fff66e
[Bugfix] Missing tokens in return_token_ids when tool parsers is enabled in streaming mode ( #29074 )
Signed-off-by: Peng-YM <1048217874pengym@gmail.com>
2025-12-04 19:09:39 +00:00
Mercykid-bash
1119f6e47a
Abstract eplb algo ( #26471 )
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 19:09:09 +00:00
Harry Mellor
e10c84e06a
Access partial_rotary_factor from rope_parameters ( #29966 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 18:42:49 +00:00
Kuntai Du
ece2825a29
[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer ( #29705 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 18:20:48 +00:00
Jee Jee Li
652ba93da3
[Bugfix] Fix FP8 MoE LoRA ( #29890 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 18:17:49 +00:00
Tao Yun
6dcb07f676
support qwen3-vl handle requests with embeddings ( #30037 )
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Cyrus Leung
b286a311c2
[Chore] Deprecate merge_by_field_config arg ( #30035 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 17:21:24 +00:00
Woosuk Kwon
cc050558f4
[Model Runner V2] Implement get_num_sampled_and_rejected kernel ( #30029 )
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-04 07:19:42 -08:00
Harry Mellor
5c32a06a04
Use Transformers v5 RoPE standardisation and validation ( #30046 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 14:54:28 +00:00
Yongtao Huang
dd97e047e0
Fix broken multiline assert in LoRAModelManager.register_module ( #30032 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-04 22:04:42 +08:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM ( #30049 )
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Chauncey
6796ce8bdb
[Bugfix] Fix the issue with interleaved thinking when using streaming ( #30033 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 11:11:59 +00:00
Andreas Karatzas
e96a6a6dca
[ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group ( #30013 )
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-04 11:00:16 +00:00
Noa Neria
6366c098d7
Validating Runai Model Streamer Integration with S3 Object Storage ( #29320 )
Signed-off-by: Noa Neria <noa@run.ai>
2025-12-04 18:04:43 +08:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector ( #24718 )
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-12-04 09:51:36 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. ( #29848 )
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla ( #30005 )
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-04 08:45:57 +00:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata ( #30027 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni ( #29974 )
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing ( #29820 )
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-04 15:36:57 +08:00
Chauncey
82a64b3d8f
[Bugfix] fixed deepseekv32 tool calling error ( #30025 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 15:12:12 +08:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Kuntai Du
8aaa81b35f
[KVConnector] remove unused code (the model aware kv ops class) ( #29709 )
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 06:00:52 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk ( #29971 )
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
gausah01
28097d5638
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation ( #29604 )
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-12-04 13:01:15 +08:00
Jee Jee Li
dd38ba3a26
[Bugfix] Fix adapter_enabled IMA ( #29977 )
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 12:51:15 +08:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode ( #29833 )
2025-12-03 17:02:02 -08:00
Shengqi Chen
1109f98288
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels ( #29930 )
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-03 14:08:19 -08:00
Elizabeth Thomas
b5407869c8
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value ( #28671 )
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
2025-12-03 22:00:52 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Wentao Ye
ac1886588f
[CI] Fix re import error ( #29973 )
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-03 15:16:54 -05:00
Yongtao Huang
2fc5d6e0d7
Fix LLMEngine.del dp_group cleanup condition ( #29954 )
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-03 12:14:44 -08:00
elvischenv
afe9eb408e
[Bugfix] Fix flashinfer ar+norm kernel not available issue ( #29960 )
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-12-03 18:50:53 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel ( #29470 )
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 18:04:59 +00:00
avigny
dd5d1ef780
[Bugfix] Mistral tool parser streaming update ( #19425 )
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 17:45:31 +00:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template ( #29902 )
Signed-off-by: piood <2477084691@qq.com>
2025-12-03 16:12:58 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
Chauncey
b78772c433
[Frontend] supports deepseekv32 chat template ( #29837 )
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-12-03 20:53:44 +08:00