Benjamin Chislett
|
1986de1375
|
[Perf] Optimize EAGLE prepare_inputs_padded with triton kernels (#28597)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-11-28 22:25:05 +00:00 |
|
Cyrus Leung
|
8d9338fae4
|
[Chore] Rename Processor to InputProcessor (#29682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-28 09:35:41 -08:00 |
|
Nick Hill
|
8e7a891602
|
[BugFix] Fix spec decoding max_tokens scheduling perf issue (#29542)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-28 20:52:23 +08:00 |
|
EanWang211123
|
37b15e97e8
|
[Multimodal][Speculative Decoding]Eagle3 mm support, enablement on qwen3vl (#29594)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: EanWang211123 <wangyiheng@sangfor.com.cn>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-27 22:05:45 -08:00 |
|
maang-h
|
c7ba1f6bc7
|
[BugFix] Fix ValueError in NewRequestData repr methods (#29392)
Signed-off-by: maang <maang_h@163.com>
|
2025-11-28 13:42:30 +08:00 |
|
Matthew Bonanni
|
fc1d8be3dc
|
[Attention] Update attention imports (#29540)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-27 11:19:09 -05:00 |
|
Ryan Rock
|
bab438ff3e
|
[CI/Build] Skip ray tests on ROCm (#29556)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-11-27 07:01:37 -08:00 |
|
Cyrus Leung
|
e6d4f3c254
|
[Bugfix] Fix pre-commit (#29601)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-27 02:23:06 -08:00 |
|
Morrison Turnansky
|
0838b52e2e
|
[Frontend][torch.compile] CompilationConfig Overhaul (#20283): Set up -O infrastructure (#26847)
Signed-off-by: morrison-turnansky <mturnans@redhat.com>
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Morrison Turnansky <mturnans@redhat.com>
Co-authored-by: adabeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-27 01:55:58 -08:00 |
|
Micah Williamson
|
43c5792592
|
[ROCm][CI] Fix test_cpu_offloading for ROCm (#29548)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-27 07:54:44 +00:00 |
|
Lucas Wilkinson
|
56539cddac
|
[Core] Refactor padding logic and pad for CUDA graphs before attention metadata building (#28579)
|
2025-11-26 14:07:13 -05:00 |
|
Matthew Bonanni
|
430dd4d9eb
|
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-11-26 10:53:15 -07:00 |
|
Wentao Ye
|
0b0aa874e8
|
[Perf] Optimize batch invariant BMM, 18.1% Throughput improvement, 10.7% TTFT improvement (#29345)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-26 09:38:52 -07:00 |
|
Nick Hill
|
4e57c6587f
|
[Core] Support logprobs with spec decode + async scheduling (#29223)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-25 12:55:24 -08:00 |
|
Yifan Qiao
|
48ddb02b79
|
[Hybrid Allocator] Support KV cache groups with different block_size (#29143)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-25 10:30:57 -05:00 |
|
wang.yuqi
|
67fc16cd8c
|
[Bugfix] If chunked_prefill is disabled, end the scheduling early. (#28911)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-11-25 16:06:09 +08:00 |
|
Micah Williamson
|
ef1f7030f0
|
[ROCm][CI] Fix test_cudagraph_mode failure in AMD CI (#29367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-11-25 07:55:09 +00:00 |
|
Rémi Delacourt
|
12c007e288
|
EAGLE Support DP>1 (#26086)
Signed-off-by: Rémi Delacourt <remi@mistral.ai>
Signed-off-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com>
Signed-off-by: remi <remi@mistral.ai>
|
2025-11-25 07:32:21 +00:00 |
|
vllmellm
|
64deead719
|
[Bugfix] [ROCm] [UX]: revert Flex attention backend (#29371)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-25 06:56:06 +00:00 |
|
Harry Mellor
|
316c8492bf
|
Scheduled removal of guided_* config fields (#29326)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-25 05:24:05 +00:00 |
|
Chen Zhang
|
71df2a57ef
|
[Hybrid Allocator] Better layer padding strategy for gpt-oss eagle (#29303)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-11-24 14:28:32 -08:00 |
|
vllmellm
|
e48b2e6848
|
[Bugfix] [ROCm] [UX] Reorganize ROCm Backend Selection Logic (#26980)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-11-24 15:24:49 +00:00 |
|
rasmith
|
3999442f1c
|
[CI/Build][AMD] Add check for flash_att_varlen_func to test_tree_attention.py (#29252)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:45:08 +00:00 |
|
rasmith
|
71362ffab4
|
[CI/Build][AMD] Skip test_multi_shared_storage_connector_consistency in test_multi_connector.py due to hipErrorLaunchFailure when calling .cpu() (#29253)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-23 04:42:49 +00:00 |
|
Nick Hill
|
7df331c66b
|
[BugFix] Fix chunked prompt logprobs + preemption (#29071)
|
2025-11-22 16:07:18 -05:00 |
|
Nick Hill
|
d44a63c6d6
|
[BugFix] Fix returned logprobs with spec decode + prefill chunking (#29216)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-22 22:41:25 +08:00 |
|
Nicolò Lucchesi
|
066209a045
|
[Attention] Refactor FA block_size limitations to hybrid models only (#29084)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-22 06:38:44 -08:00 |
|
Cyrus Leung
|
5a4802588e
|
[Misc] Further clean up chunked prefill and prefix caching init (#29186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-22 19:34:15 +08:00 |
|
rasmith
|
8e22da1d7f
|
[CI/Build] Don't add FLASHINFER backend in test_cpu_offloading.py (#29229)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 11:00:54 +00:00 |
|
rasmith
|
a4fdf2405c
|
[CI/Build] Skip tests that require libcudart in test_lmcache_integration.py (#29228)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-22 10:59:39 +00:00 |
|
Mark McLoughlin
|
c6fa3895e9
|
[KV Connector] Fix async connector prefix cache metrics (#28585)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2025-11-21 17:45:00 -05:00 |
|
Julien Denize
|
57430fc95c
|
Default model load/config/tokenizer to mistral format if relevant files exist (#28659)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-11-21 13:58:59 -08:00 |
|
Wentao Ye
|
1f400c58b8
|
[CI] Add batch invariant test to ci (#27842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-21 09:20:33 -07:00 |
|
WeiQing Chen
|
b34129bf8e
|
[Misc] remove useless v1 env (#29164)
Signed-off-by: David Chen <530634352@qq.com>
|
2025-11-21 01:41:20 -08:00 |
|
Jialin Ouyang
|
30b9c67743
|
Revert "[Redo] #26368 (#28771)" (#29121)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-11-20 21:27:45 -08:00 |
|
Cyrus Leung
|
56e96b37e4
|
[V0 Deprecation] Remove best_of (#29090)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 11:40:40 +08:00 |
|
rasmith
|
c7a29d2c8d
|
[CI/Build] Remove skip global cleanup in test_struct_output_generate.py (#29022)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 21:44:37 +00:00 |
|
rasmith
|
8237ab8a2b
|
[CI/Build] Skip lm-format-enforcer tests in test_struct_output_generate.py for now (#29021)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-11-20 21:35:14 +00:00 |
|
Or Ozeri
|
647464719b
|
[KVConnector][Core] Support cross-layer KV blocks (#27743)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2025-11-20 19:09:59 +01:00 |
|
Or Ozeri
|
c0c2dd1e0b
|
[BugFix] kv_offloading: Fix bug in loading of partial cpu blocks (#28951)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-11-20 18:55:10 +08:00 |
|
Wentao Ye
|
2c52c7fd9a
|
[Bug] Fix torch dynamo warning Dynamo detected a call to a functools.lru_cache (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-20 16:52:23 +08:00 |
|
Benjamin Chislett
|
fcbcba6c70
|
[Feat] Iteration-level profiling for Torch and CUDA profiler (#28987)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-19 19:17:48 -08:00 |
|
Wentao Ye
|
1607e664f0
|
[Bug] Fix Batch Invariant MLA test (#28967)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-19 21:18:32 +00:00 |
|
Qiu
|
2fd893b4ce
|
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
|
2025-11-19 15:52:44 -05:00 |
|
Didier Durand
|
09540cd918
|
[Doc]: fix typos in various files (#29010)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
|
2025-11-19 04:56:21 -08:00 |
|
Chendi.Xue
|
c3e2978620
|
[NIXL] fix cpu PD after physical <> logical block_size PR (#28904)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2025-11-18 14:03:23 -05:00 |
|
Kevin H. Luu
|
c64c0b78de
|
[chore] Move the rest of wikimedia url to S3 (#28921)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-11-18 09:44:18 -08:00 |
|
Nicolò Lucchesi
|
f226a3f0c1
|
[CI][NIXL] Change default block_size for tests (#28927)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-11-18 09:22:30 -08:00 |
|
Nick Hill
|
5bdd155277
|
[CI] Fix async scheduling + spec decoding test flake (#28902)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-11-18 05:26:32 +00:00 |
|
Wentao Ye
|
a289cc1dde
|
[Test] Batch Invariant: Rename and organize tests (#27421)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-11-17 18:09:47 -05:00 |
|