Jee Jee Li
652ba93da3
[Bugfix] Fix FP8 MoE LoRA ( #29890 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 18:17:49 +00:00
Tao Yun
6dcb07f676
support qwen3-vl handle requests with embeddings ( #30037 )
...
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Qiu
46cbbca05c
[CI][DCP][Perf] reduce DCP CI execution time ( #29858 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-12-04 17:28:21 +00:00
Cyrus Leung
b286a311c2
[Chore] Deprecate merge_by_field_config arg ( #30035 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 17:21:24 +00:00
Shengqi Chen
990f806473
[Doc] clarify nightly builds in developer docs ( #30019 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-05 00:28:37 +08:00
Doug Smith
5b4b42c0b6
Mark DBO test as flaky on b200 for Distributed B200 test ( #29913 )
...
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-12-04 10:38:03 -05:00
Woosuk Kwon
cc050558f4
[Model Runner V2] Implement get_num_sampled_and_rejected kernel ( #30029 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-04 07:19:42 -08:00
Harry Mellor
5c32a06a04
Use Transformers v5 RoPE standardisation and validation ( #30046 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 14:54:28 +00:00
Yongtao Huang
dd97e047e0
Fix broken multiline assert in LoRAModelManager.register_module ( #30032 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-04 22:04:42 +08:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM ( #30049 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling ( #27145 )
...
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Kevin H. Luu
1b7c7f5159
[release] install regex ( #30008 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 03:18:29 -08:00
Chauncey
6796ce8bdb
[Bugfix] Fix the issue with interleaved thinking when using streaming ( #30033 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 11:11:59 +00:00
Andreas Karatzas
e96a6a6dca
[ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group ( #30013 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-04 11:00:16 +00:00
Noa Neria
6366c098d7
Validating Runai Model Streamer Integration with S3 Object Storage ( #29320 )
...
Signed-off-by: Noa Neria <noa@run.ai>
2025-12-04 18:04:43 +08:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector ( #24718 )
...
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-12-04 09:51:36 +00:00
rasmith
f2f4cea6cc
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER ( #29995 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-04 09:30:22 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags ( #29994 )
...
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. ( #29848 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla ( #30005 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-04 08:45:57 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e ( #29899 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-04 16:22:03 +08:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata ( #30027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
Micah Williamson
5430e110c0
[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI ( #30006 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 16:20:54 +08:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni ( #29974 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model ( #30001 )
...
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2025-12-04 07:52:28 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names ( #30028 )
...
In Prometheus, Counters always expose their actual numeric value under a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.
Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>
2025-12-04 07:46:15 +00:00
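The naming point in the commit message above can be illustrated with a minimal sketch. The helper `base_counter_name` and the metric names are hypothetical and exist only for this example; they are not taken from the commit or from vLLM. The sketch only shows how an exposed counter name maps back to the base name that the message says should be documented.

```python
def base_counter_name(exposed_name: str) -> str:
    """Return the base name of a counter by dropping a trailing `_total`.

    Hypothetical helper for illustration; names without the suffix
    (for example gauges) are returned unchanged.
    """
    return exposed_name.removesuffix("_total")  # requires Python 3.9+


# Made-up metric names, used only to demonstrate the mapping.
exposed_names = [
    "example:requests_total",  # counter as exposed with the suffix
    "example:queue_depth",     # non-counter, no suffix to strip
]

for name in exposed_names:
    print(name, "->", base_counter_name(name))
```

Only a trailing `_total` is removed, so a name that merely contains `_total` in the middle is left untouched.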
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing ( #29820 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-04 15:36:57 +08:00
Chauncey
82a64b3d8f
[Bugfix] fixed deepseekv32 tool calling error ( #30025 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 15:12:12 +08:00
Cyrus Leung
9ae2f60374
[Misc] Various cleanups for MM input processing ( #29970 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 06:22:20 +00:00
Jianwei Mao
80f8af4b2f
Fix error while downloading dependencies for CPU backend ( #29797 )
...
Signed-off-by: Jianwei Mao <maojianwei2016@126.com>
2025-12-04 06:04:44 +00:00
Kuntai Du
8aaa81b35f
[KVConnector] remove unused code (the model aware kv ops class) ( #29709 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 06:00:52 +00:00
Benjamin Bartels
fca3f46658
[Frontend] Fixes anthropic /v1/messages streaming not containing input_tokens on first chunk ( #29971 )
...
Signed-off-by: bbartels <benjamin@bartels.dev>
2025-12-04 05:50:27 +00:00
gausah01
28097d5638
[Bugfix][CPU] Fix CPU KV cache fallback memory allocation ( #29604 )
...
Signed-off-by: Gauri Sahnan <gauri.sahnan@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-12-04 13:01:15 +08:00
Jee Jee Li
dd38ba3a26
[Bugfix] Fix adapter_enabled IMA ( #29977 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 12:51:15 +08:00
Li Wang
5f91cdda75
[Misc] Add docker build env for Ascend NPU ( #30015 )
...
Signed-off-by: wangli <wangli858794774@gmail.com>
2025-12-03 19:53:00 -08:00
Iceber Gu
33a3d6c798
fix LoRA-related examples ( #29956 )
...
Signed-off-by: Iceber Gu <caiwei95@hotmail.com>
2025-12-04 11:48:30 +08:00
Zhewen Li
c493b9d092
[CI/Build] Add MM code path to Examples Test ( #29986 )
...
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-03 19:21:45 -08:00
Xieyang Xu
ad32e3e19c
enable multi-node in external launcher mode ( #29833 )
2025-12-03 17:02:02 -08:00
Shengqi Chen
1109f98288
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels ( #29930 )
...
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-03 14:08:19 -08:00
Elizabeth Thomas
b5407869c8
[Bugfix] Respect VLLM_CONFIGURE_LOGGING value ( #28671 )
...
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Jane Xu <janeyx@meta.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Johnny Yang <johnnyyang@google.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>
2025-12-03 22:00:52 +00:00
bnellnm
2902c34826
[Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton ( #29929 )
...
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 20:49:00 +00:00
Wentao Ye
ac1886588f
[CI] Fix re import error ( #29973 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-12-03 15:16:54 -05:00
Yongtao Huang
2fc5d6e0d7
Fix LLMEngine.__del__ dp_group cleanup condition ( #29954 )
...
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-03 12:14:44 -08:00
elvischenv
afe9eb408e
[Bugfix] Fix flashinfer ar+norm kernel not available issue ( #29960 )
...
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-12-03 18:50:53 +00:00
Varun Sundar Rabindranath
19bee6d12d
[Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel ( #29470 )
...
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-12-03 18:04:59 +00:00
avigny
dd5d1ef780
[Bugfix] Mistral tool parser streaming update ( #19425 )
...
Signed-off-by: avigny <47987522+avigny@users.noreply.github.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Jeff Cook <jeff@jeffcook.io>
Co-authored-by: sfbemerk <benjaminmerkel@mail.de>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-03 17:45:31 +00:00
Micah Williamson
d1f7392c5f
[ROCm][CI] Fix v1/logits_processors failure on ROCm ( #29927 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 01:17:07 +08:00
Yu Jiaqi
9ae3c55b10
SigLIP example add chat_template ( #29902 )
...
Signed-off-by: piood <2477084691@qq.com>
2025-12-03 16:12:58 +00:00
Lumis Chen
9bcf92295a
[Core] Add xxHash as a high-performance hash option for accelerating prefix caching ( #29163 )
...
Signed-off-by: LuminolT <lumischen01@gmail.com>
Signed-off-by: Lumis Chen <lumischen01@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
2025-12-03 16:06:57 +00:00
rasmith
5aa9b09040
[CI/Build][AMD] Skip test_shared_storage_connector_hashes in test_shared_storage_connector.py due to hipErrorLaunchFailure when calling .cpu() ( #29839 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-03 22:56:35 +08:00