10166 Commits

Author SHA1 Message Date
wuhang
91ac7f764d
[CI][gpt-oss] Enable python tool tests in CI (#24315)
Signed-off-by: wuhang <wuhang6@huawei.com>
2025-10-06 04:20:06 +00:00
Chen Zhang
4be7d7c1c9
[MISC] Add heheda12345 to CODEOWNERS of vllm/config/cache.py (#26270)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-10-06 10:58:59 +08:00
orangeng
59b477645c
[Doc] Edited minor typo (#26266)
Signed-off-by: Orange Ng <ngquanhao@outlook.com>
2025-10-05 19:53:09 -07:00
Thomas Parnell
778f554157
[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-06 10:40:30 +08:00
Thomas Parnell
d3c84297c3
[CI] Add comment about the single cudagraph capture size that is used (#26252) 2025-10-06 02:35:37 +00:00
Elieser Pereira
f509a20846
[DOC] Update production-stack.md (#26177)
Signed-off-by: Elieser Pereira <elieser.pereiraa@gmail.com>
2025-10-05 21:32:48 +00:00
Michael Goin
60bc25e74c
[CI] Add Blackwell LM Eval Small Models test to nightly (#26052)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-10-05 14:59:50 -06:00
Harry Mellor
b893d661b1
Fix per file ruff ignores related to simplification (#26259)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 20:31:53 +00:00
Jason Li
6b6e98775f
[NVIDIA] flashinfer TRTLLM attention prefill token limit (#25998)
Signed-off-by: jasonlizhengjian <jason.li@centml.ai>
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
2025-10-05 14:24:37 -06:00
Jiangyun Zhu
9c3c21c519
[CI] fix mamba kernel test (#26250)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-10-05 18:26:59 +00:00
Harry Mellor
512b8affa4
Update ruff pre-commit hooks version (#26255)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-05 09:50:50 -07:00
Harry Mellor
1c0c68202c
Fix per file ruff ignores related to typing (#26254)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 16:37:55 +00:00
ihb2032
5f317530ec
fix(tests): Resolve late binding of loop variable in assert message lambda (#26249)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com
2025-10-05 09:18:22 -07:00
Harry Mellor
557b2e961d
Remove all cases of fmt: on/off (#26253)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:14 -07:00
Harry Mellor
4e256cadc2
Remove all references to yapf as it's no longer used (#26251)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 09:18:11 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Hank_
17edd8a807
[Platform][Kernel] platform-specific kernel loading (#25823)
Signed-off-by: Hank <hcc.mayday@gmail.com>
2025-10-05 13:25:15 +02:00
ihb2032
3303cfb4ac
[Bugfix][Hardware][RISC-V] Limit supported dtypes to float32 to avoid scheduler segfault (#26228)
Signed-off-by: lyd1992 <liuyudong@iscas.ac.cn>
Signed-off-by: ihb2032 <1355790728@qq.com>
2025-10-05 10:36:54 +00:00
Cyrus Leung
b7e8e4e6be
[Bugfix] Always apply MM processor even when no MM items are passed (#26240)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-05 10:10:20 +00:00
Simon Danielsson
432e1cbc23
[Bugfix]: Assertion error when using FlashInfer backend (#25933)
Signed-off-by: simondanielsson <simon.danielsson99@hotmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-05 16:46:36 +08:00
Jialin Ouyang
201c971e96
[Perf][Easy] Early stop in request_block_hasher (#26112)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-10-05 16:46:03 +08:00
Maximilien de Bayser
e0986ea07b
Add documentation for granite 4 tool calling (#26175)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-10-05 07:35:42 +00:00
Cyrus Leung
a964e5e6c3
[Bugfix] Allow --skip-tokenizer-init with echo and return_token_ids (#26238)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-05 05:38:53 +00:00
22quinn
78c1d5bfd2
[Easy] Add str repr for IterationStats (#26232)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-10-05 05:00:21 +00:00
Cyrus Leung
59a85c366e
[Model] Use merge_by_field_config for MM models (H-L) (#26230)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-05 11:54:17 +08:00
Cyrus Leung
119f00630b
[Renderer] Clean up renderer code (#26216)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 17:05:29 +00:00
Isotr0py
a42d2df75f
[Frontend] Cache chat template kwargs resolution (#26227)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-04 15:32:30 +00:00
Li, Jiang
5c057e068f
[CPU] Refine batch reorder of CPU attention backend (#26096)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-10-04 21:54:35 +08:00
Thomas Parnell
ed3aeb25a4
[V1] [Hybrid] Remove code to override default CUDA graph configuration (#26226)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 13:47:48 +00:00
yuafng
86ee949128
Fix tensor device and dtype placement in Qwen2VL model (#26219)
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Yuanfeng Li <yuanfengli@meta.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-10-04 06:41:39 -07:00
Cyrus Leung
4570535ec4
[Model] CLIP Embedding Support (#26010)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 06:21:42 -07:00
Nicolò Lucchesi
2a6dc67eb5
[Bugfix] Fix _reqs_to_process leak on abort (#26012)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-10-04 11:39:31 +00:00
Yannick Schnider
f05fea1f5e
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
2025-10-04 09:59:26 +00:00
Luca Soldaini
d0df145c2a
Add Olmo 3 reasoning parser (#26054)
Signed-off-by: Luca Soldaini <luca@soldaini.net>
2025-10-04 17:48:29 +08:00
Cyrus Leung
1838cd4860
Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" (#26220) 2025-10-04 02:45:08 -07:00
Huamin Li
7d6b03381e
[CI Failure] fix_test_auto_prefix_cache_support (#26053)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-10-04 02:44:49 -07:00
Cyrus Leung
7c2e91c4e0
[Misc] Remove unused executor.apply_model (#26215)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:45:53 -07:00
Cyrus Leung
736fbf4c89
[Misc] Require merge_by_field_config argument (#26214)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:40:14 -07:00
Cyrus Leung
44ea85137a
[Model] Support nested structures for TensorSchema (#26212)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-04 01:20:32 -07:00
Harry Mellor
d3d649efec
Support expert parallel in Transformers backend (#26162)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-10-04 04:35:04 +00:00
Stan Wozniak
ea507c3a93
[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-10-04 06:34:22 +02:00
Fadi Arafeh
9705fba7b7
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2025-10-04 12:16:38 +08:00
Bram Wasti
2f7dbc9b42
Add batch invariant kernel override for FlashInfer backend [2/n] (#25769)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-10-03 19:49:30 -07:00
Ben Browning
ea25a76c05
[BugFix] Use async Mistral Tokenizer in Chat Completions (#26134)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-10-04 09:42:08 +08:00
Roger Wang
67bc0c003e
[Bugfix] Fix qwen3 vl dummy data generation with overrides (#26193)
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-10-04 01:40:20 +00:00
Eugene Khvedchenya
5a05f26603
Fix issue of using only the part of video frame [Nemotron Nano] (#26186)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
2025-10-04 00:21:00 +00:00
Varun Sundar Rabindranath
7ef40bb983
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-10-03 20:13:13 -04:00
Wentao Ye
767cbb011d
[CI] Fix Pre-commit Mypy Error (#26181)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-10-03 16:08:03 -07:00
Angela Yi
7cfa4b24bf
[BugFix] Fix de-functionalization pass for rotary_embedding (#23953)
Signed-off-by: angelayi <yiangela7@gmail.com>
2025-10-03 15:44:18 -07:00
Sergei Skvortsov
b71fcd4905
[Misc] Add penalties sampling parameters to serve tool (#25974)
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
2025-10-03 15:43:14 -07:00