Nick Hill
8e7a891602
[BugFix] Fix spec decoding max_tokens scheduling perf issue ( #29542 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-28 20:52:23 +08:00
EanWang211123
37b15e97e8
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl ( #29594 )
...
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-11-27 22:05:45 -08:00
maang-h
c7ba1f6bc7
[BugFix] Fix ValueError in NewRequestData repr methods ( #29392 )
...
Signed-off-by: maang <maang_h@163.com>
2025-11-28 13:42:30 +08:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports ( #29540 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Ryan Rock
bab438ff3e
[CI/Build] Skip ray tests on ROCm ( #29556 )
...
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
2025-11-27 07:01:37 -08:00
Cyrus Leung
e6d4f3c254
[Bugfix] Fix pre-commit ( #29601 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-27 02:23:06 -08:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-27 01:55:58 -08:00
Micah Williamson
43c5792592
[ROCm][CI] Fix test_cpu_offloading for ROCm ( #29548 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-27 07:54:44 +00:00
Lucas Wilkinson
56539cddac
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building ( #28579 )
2025-11-26 14:07:13 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py ( #29342 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Wentao Ye
0b0aa874e8
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement ( #29345 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-26 09:38:52 -07:00
Nick Hill
4e57c6587f
[Core] Support logprobs with spec decode + async scheduling ( #29223 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-25 12:55:24 -08:00
Yifan Qiao
48ddb02b79
[Hybrid Allocator] Support KV cache groups with different block_size ( #29143 )
...
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-25 10:30:57 -05:00
wang.yuqi
67fc16cd8c
[Bugfix] If chunked_prefill is disabled, end the scheduling early. ( #28911 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2025-11-25 16:06:09 +08:00
Micah Williamson
ef1f7030f0
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI ( #29367 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2025-11-25 07:55:09 +00:00
Rémi Delacourt
12c007e288
EAGLE Support DP>1 ( #26086 )
...
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
2025-11-25 07:32:21 +00:00
vllmellm
64deead719
[Bugfix] [ROCm] [UX]: revert Flex attention backend ( #29371 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-25 06:56:06 +00:00
Harry Mellor
316c8492bf
Scheduled removal of guided_* config fields ( #29326 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-25 05:24:05 +00:00
Chen Zhang
71df2a57ef
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle ( #29303 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-11-24 14:28:32 -08:00
vllmellm
e48b2e6848
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic ( #26980 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-24 15:24:49 +00:00
rasmith
3999442f1c
[CI/Build][AMD] Add check for flash_attn_varlen_func to test_tree_attention.py ( #29252 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-23 04:45:08 +00:00
rasmith
71362ffab4
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() ( #29253 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-23 04:42:49 +00:00
Nick Hill
7df331c66b
[BugFix] Fix chunked prompt logprobs + preemption ( #29071 )
2025-11-22 16:07:18 -05:00
Nick Hill
d44a63c6d6
[BugFix] Fix returned logprobs with spec decode + prefill chunking ( #29216 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-22 22:41:25 +08:00
Nicolò Lucchesi
066209a045
[Attention] Refactor FA block_size limitations to hybrid models only ( #29084 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-22 06:38:44 -08:00
Cyrus Leung
5a4802588e
[Misc] Further clean up chunked prefill and prefix caching init ( #29186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-22 19:34:15 +08:00
rasmith
8e22da1d7f
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py ( #29229 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-22 11:00:54 +00:00
rasmith
a4fdf2405c
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py ( #29228 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-22 10:59:39 +00:00
Mark McLoughlin
c6fa3895e9
[KV Connector] Fix async connector prefix cache metrics ( #28585 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2025-11-21 17:45:00 -05:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist ( #28659 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-11-21 13:58:59 -08:00
Wentao Ye
1f400c58b8
[CI] Add batch invariant test to ci ( #27842 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 09:20:33 -07:00
WeiQing Chen
b34129bf8e
[Misc] remove useless v1 env ( #29164 )
...
Signed-off-by: David Chen <530634352@qq.com>
2025-11-21 01:41:20 -08:00
Jialin Ouyang
30b9c67743
Revert "[Redo] #26368 ( #28771 )" ( #29121 )
...
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-20 21:27:45 -08:00
Cyrus Leung
56e96b37e4
[V0 Deprecation] Remove best_of ( #29090 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-21 11:40:40 +08:00
rasmith
c7a29d2c8d
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py ( #29022 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:44:37 +00:00
rasmith
8237ab8a2b
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now ( #29021 )
...
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-20 21:35:14 +00:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks ( #27743 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Or Ozeri
c0c2dd1e0b
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks ( #28951 )
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-20 18:55:10 +08:00
Wentao Ye
2c52c7fd9a
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache ( #29038 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Benjamin Chislett
fcbcba6c70
[Feat] Iteration-level profiling for Torch and CUDA profiler ( #28987 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-19 19:17:48 -08:00
Wentao Ye
1607e664f0
[Bug] Fix Batch Invariant MLA test ( #28967 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 21:18:32 +00:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support ( #28718 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Didier Durand
09540cd918
[Doc]: fix typos in various files ( #29010 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-19 04:56:21 -08:00
Chendi.Xue
c3e2978620
[NIXL] fix cpu PD after physical <> logical block_size PR ( #28904 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2025-11-18 14:03:23 -05:00
Kevin H. Luu
c64c0b78de
[chore] Move the rest of wikimedia url to S3 ( #28921 )
...
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 09:44:18 -08:00
Nicolò Lucchesi
f226a3f0c1
[CI][NIXL] Change default block_size for tests ( #28927 )
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-18 09:22:30 -08:00
Nick Hill
5bdd155277
[CI] Fix async scheduling + spec decoding test flake ( #28902 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-18 05:26:32 +00:00
Wentao Ye
a289cc1dde
[Test] Batch Invariant: Rename and organize tests ( #27421 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-17 18:09:47 -05:00
Ronald
d8874c61a5
[Core] Async Scheduling X Spec Decoding Compatibility ( #24799 )
...
Signed-off-by: Ronald1995 <ronaldautomobile@163.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
2025-11-17 12:16:20 -08:00
Nick Hill
80b6080ddc
[BugFix] Fix async scheduling + chunked prefill + preemption ( #28787 )
...
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-17 06:46:46 +08:00