11968 Commits

Author SHA1 Message Date
Yanan Cao
62b3333448
[Frontend] Remove deprecated -O.xx flag (#29991)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2025-12-05 00:47:22 -08:00
rasmith
feecba09af
[CI/Build][AMD] Use float16 in test_reset_prefix_cache_e2e to avoid accuracy issues (#29997)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-05 08:42:25 +00:00
amitz-nv
6038b1b04b
[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2025-12-05 00:34:33 -08:00
Tiger Xu / Zhonghu Xu
60a66ea2dc
[DOC]: Add kthena to integrations (#29931)
Signed-off-by: Zhonghu Xu <xuzhonghu@huawei.com>
2025-12-05 08:11:03 +00:00
Micah Williamson
06579f9a82
[AMD][CI] Add ray[default] Dependency On ROCm To Pass v1/metrics/test_engine_logger_apis.py (#30110)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-05 06:48:23 +00:00
Chukwuma Nwaugha
6e865b6a83
Refactor example prompts fixture (#29854)
Signed-off-by: nwaughac@gmail.com
2025-12-05 06:44:32 +00:00
Jingchun Gao
d698bb382d
[Bugfix] Correct num_q_heads on DCP for Flashinfer backends (#29487)
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
2025-12-05 05:54:31 +00:00
Charlie Fu
2c22c4ca2d
[ROCm][CI] Increase the memory threshold for test_deep_sleep_fp8_kvcache (#30104)
Signed-off-by: charlifu <charlifu@amd.com>
2025-12-05 04:51:44 +00:00
Laith Sakka
5867819eaf
Do not guard during noop elimination pass (#30095)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-05 04:10:12 +00:00
Charlie Fu
7c9b2c8f81
[ROCm][CI] Add jiwer dependency for testing (#30081)
Signed-off-by: charlifu <charlifu@amd.com>
2025-12-05 03:34:51 +00:00
Qiu
0098a6e3da
[PCP&DCP] move CUDAGraph check for PCP&DCP to the check func of platforms (#29952)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 21:40:51 -05:00
Hubert de La Jonquiere
befb59e5b1
[Model] Add Holo2 reasoning parser (#30048)
Signed-off-by: hdlj-h <hubert@hcompany.ai>
2025-12-05 10:38:45 +08:00
Shengqi Chen
aaddc9c82a
[CI] fix silent error in nightly wheel index generation script, add generation time to HTML index (#30060)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-05 00:48:59 +00:00
Zhewen Li
263c38d74d
[CI/Build] Update batch invariant test trigger (#30080)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-05 00:42:37 +00:00
Zhewen Li
bcf43ab1f3
[CI/Build][AMD] Add Llama4 Maverick FP8 to AMD CI (#28695)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-12-04 16:07:20 -08:00
Alexander Matveev
4470ee2f90
[Perf] Enable separate shared_experts stream only for CUDA (#30085)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-12-05 00:03:17 +00:00
TimWang
690cc3ef20
docs: update metrics design doc to use new vllm:kv_cache_usage_perc (#30041)
Signed-off-by: Tim <tim.wang03@sap.com>
2025-12-04 23:37:14 +00:00
Laith Sakka
1f0d184590
[aot_compile]change VLLM backend to read fake args from example_value (#29104)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-12-04 17:33:45 -05:00
Lucas Wilkinson
c8ab988b15
[BugFix] Fix DBO assert `assert B_block_table == B_q` (#29933)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-12-04 14:48:54 -05:00
Peng-YM
48a5fff66e
[Bugfix] Missing tokens in return_token_ids when tool parsers is enabled in streaming mode (#29074)
Signed-off-by: Peng-YM <1048217874pengym@gmail.com>
2025-12-04 19:09:39 +00:00
Mercykid-bash
1119f6e47a
Abstract eplb algo (#26471)
Signed-off-by: Che Ruan <cr623@ic.ac.uk>
Signed-off-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Signed-off-by: Mercykid-bash <ruanche0218@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Che Ruan <cr623@ic.ac.uk>
Co-authored-by: mengxingkongzhouhan <117415539+mengxingkongzhouhan@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 19:09:09 +00:00
Harry Mellor
e10c84e06a
Access partial_rotary_factor from rope_parameters (#29966)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 18:42:49 +00:00
Kuntai Du
ece2825a29
[KVConnector] Remove v0-related kv connector components such as kv pipe and kv lookup buffer (#29705)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
2025-12-04 18:20:48 +00:00
Jee Jee Li
652ba93da3
[Bugfix] Fix FP8 MoE LoRA (#29890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-04 18:17:49 +00:00
Tao Yun
6dcb07f676
support qwen3-vl handle requests with embeddings (#30037)
Signed-off-by: taoyun <1069423820@qq.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 17:34:06 +00:00
Qiu
46cbbca05c
[CI][DCP][Perf] reduce DCP CI execution time (#29858)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
2025-12-04 17:28:21 +00:00
Cyrus Leung
b286a311c2
[Chore] Deprecate merge_by_field_config arg (#30035)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 17:21:24 +00:00
Shengqi Chen
990f806473
[Doc] clarify nightly builds in developer docs (#30019)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-05 00:28:37 +08:00
Doug Smith
5b4b42c0b6
Mark DBO test as flaky on b200 for Distributed B200 test (#29913)
Signed-off-by: dougbtv <dosmith@redhat.com>
2025-12-04 10:38:03 -05:00
Woosuk Kwon
cc050558f4
[Model Runner V2] Implement get_num_sampled_and_rejected kernel (#30029)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-04 07:19:42 -08:00
Harry Mellor
5c32a06a04
Use Transformers v5 RoPE standardisation and validation (#30046)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 14:54:28 +00:00
Yongtao Huang
dd97e047e0
Fix broken multiline assert in LoRAModelManager.register_module (#30032)
Signed-off-by: Yongtao Huang <yongtaoh2022@gmail.com>
2025-12-04 22:04:42 +08:00
Harry Mellor
9998ea5b57
Delete HF version of Phi 4 MM (#30049)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-04 13:44:50 +00:00
wang.yuqi
74c4d80c6c
[Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 13:44:15 +00:00
Kevin H. Luu
1b7c7f5159
[release] install regex (#30008)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-04 03:18:29 -08:00
Chauncey
6796ce8bdb
[Bugfix] Fix the issue with interleaved thinking when using streaming (#30033)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-12-04 11:11:59 +00:00
Andreas Karatzas
e96a6a6dca
[ROCm][CI][Bugfix] Fixing the Multi-Modal Models Test (Extended) 1 group (#30013)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-04 11:00:16 +00:00
Noa Neria
6366c098d7
Validating Runai Model Streamer Integration with S3 Object Storage (#29320)
Signed-off-by: Noa Neria <noa@run.ai>
2025-12-04 18:04:43 +08:00
dtc
842aba501d
[P/D] Introduce Mooncake Transfer Engine as kv_connector (#24718)
Signed-off-by: Tianchen Ding <dtcccc@linux.alibaba.com>
Signed-off-by: dtc <dtcccc@linux.alibaba.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-12-04 09:51:36 +00:00
rasmith
f2f4cea6cc
[CI/Build][AMD] Skip test on test_hybrid_attention_mamba_tensor_shapes on ROCm, requires FLASHINFER (#29995)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-12-04 09:30:22 +00:00
Arpit Khandelwal
dfdda96747
[Core] Remove forced None assignment for deprecated PassConfig flags (#29994)
Signed-off-by: arpitkh101 <arpit5khandelwal@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-04 09:15:04 +00:00
Xu Wenqing
ffdd18111b
Add DeepSeek-V3.2 tool parser. (#29848)
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>
2025-12-04 08:46:34 +00:00
Ye (Charlotte) Qi
b8a6ae4158
[ROCm] add fallback for aiter fp8 decode mla (#30005)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-12-04 08:45:57 +00:00
Mark McLoughlin
899e2ef558
[Core] Fix standalone runs of test_reset_prefix_cache_e2e (#29899)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
2025-12-04 16:22:03 +08:00
Cyrus Leung
68eb5c8d97
[Misc] Move functions into PoolingMetadata (#30027)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-04 08:21:19 +00:00
Micah Williamson
5430e110c0
[CI][AMD] Match Main CI Behavior By Skipping test_eplb_spec_decode In AMD CI (#30006)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-12-04 16:20:54 +08:00
TJian
3f1b03739a
[ROCm] [Bugfix] compute_attn_mask_seqlen for qwen3 omni (#29974)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-04 08:20:24 +00:00
Charlie Fu
9aa33a74b0
[Rocm][CI] Fix test_speculator_eagle3 by skipping the CompressedTensorw4a16 Model (#30001)
Signed-off-by: charlifu <charlifu@amd.com>
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
2025-12-04 07:52:28 +00:00
CYJiang
fd68e909db
[docs] Remove _total from counter metrics names (#30028)
In Prometheus, counters always expose their numeric value under a metric name that ends in _total. We should document the base name, as this is what appears in the get_metrics() API.

Signed-off-by: CYJiang <86391540+googs1025@users.noreply.github.com>
2025-12-04 07:46:15 +00:00
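The naming convention described in the message of #30028 can be illustrated with a small sketch. It only shows the relationship between a counter's base name and the `_total`-suffixed name seen in Prometheus exposition output. The helper functions and the metric name `vllm:example_requests` are hypothetical, invented for illustration; they are not part of vLLM or of any Prometheus client library.

```python
# Minimal sketch, assuming only the convention stated in the commit message:
# a counter's exposed sample name is its base name plus "_total".
# All names below are hypothetical and exist only for illustration.

TOTAL_SUFFIX = "_total"


def counter_exposed_name(base_name: str) -> str:
    """Return the name a counter's value would carry in exposition output."""
    if base_name.endswith(TOTAL_SUFFIX):
        return base_name
    return base_name + TOTAL_SUFFIX


def counter_base_name(exposed_name: str) -> str:
    """Return the base name that documentation would list for a counter."""
    if exposed_name.endswith(TOTAL_SUFFIX):
        return exposed_name[: -len(TOTAL_SUFFIX)]
    return exposed_name


# Made-up metric name, used only to show the round trip.
base = "vllm:example_requests"
exposed = counter_exposed_name(base)
recovered = counter_base_name(exposed)
```

Under this sketch, documentation would list the value of `base`, while a scrape of the metrics endpoint would show the value of `exposed`.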
daniel-salib
404fc4bfc0
[Frontend] refactor harmony utils output message parsing (#29820)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
2025-12-04 15:36:57 +08:00