Hank_
6482e3895b
chores: adjust the attn register param order (#30688)
...
Signed-off-by: Hank <hcc.mayday@gmail.com>
2025-12-17 19:58:16 +08:00
Qiu
a11f4a81e0
[Misc][PCP&DCP] relocate PCP feature check (#30050)
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-11 03:36:18 -08:00
Cyrus Leung
5a87d8b9b1
[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-10 19:59:35 -08:00
Isotr0py
b952f4d3c3
[v1] Add PrefixLM support to FlexAttention backend (#27938)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-12-07 15:51:36 +00:00
Matthew Bonanni
66e674cdd5
[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2025-12-05 09:48:43 -08:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports (#29540)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-27 11:19:09 -05:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers (#29262)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2025-11-24 04:18:55 +00:00
Nicolò Lucchesi
066209a045
[Attention] Refactor FA block_size limitations to hybrid models only (#29084)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-11-22 06:38:44 -08:00
Matthew Bonanni
11857a00b0
[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-20 20:24:43 -08:00
Or Ozeri
647464719b
[KVConnector][Core] Support cross-layer KV blocks (#27743)
...
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2025-11-20 19:09:59 +01:00
Qiang Zhang
3fb0d90999
[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715)
...
Signed-off-by: chiangzhang <chiangzhang@tencent.com>
2025-11-20 02:11:52 +00:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support (#28718)
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
Aleksandr Malyshev
ac10fd3c69
Upstreaming aiter triton attention backend as a new backend (#28701)
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2025-11-19 19:59:30 +00:00
Shanshan Shen
d44e9df7d4
[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487)
...
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-19 16:24:55 +00:00
Huamin Li
07a606aa7e
[CI Failure] Fix backend selection for encoder-only models (#28534)
...
Signed-off-by: Huamin Li <3ericli@gmail.com>
2025-11-13 10:11:27 -05:00
Li, Jiang
7f829be7d3
[CPU] Refactor CPU attention backend (#27954)
...
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-12 09:43:06 +08:00
Matthew Bonanni
b30dfa03c5
[Attention] Refactor CUDA attention backend selection logic (#24794)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-11-11 07:40:44 -05:00
Lucas Wilkinson
e8697faf03
[V0 deprecation] Remove no longer used get_metadata_cls (#28370)
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-10 14:32:09 +08:00
Cyrus Leung
4d4d6bad19
[Chore] Separate out vllm.utils.importlib (#27022)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-10-17 00:48:59 +00:00
rongfu.leng
5afd3276df
[Feature] Add process_weights_after_loading to AttentionImpl (#26870)
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-10-16 08:02:30 -07:00
Adrian Abeyta
0a9ef0cfce
Move query quantization to attention layer for Flashinfer & Triton. (#26534)
...
Signed-off-by: adabeyta <aabeyta@redhat.com>
Signed-off-by: Adrian Abeyta <aabeyta@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-15 19:01:38 -04:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Zhiyuan Li
d24cf322e1
[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486)
...
Signed-off-by: lizhiyuan <uniartisan2017@gmail.com>
Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>
2025-10-08 23:43:39 -07:00
Naveenraj Kamalakannan
e614ab7806
Separate MLAAttention class from Attention (#25103)
...
Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-10-08 17:11:11 -07:00
Matthew Bonanni
2a03f93de9
[Attention] Register FLASHMLA_SPARSE (#26441)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-08 22:28:52 +00:00
Matthew Bonanni
76879cc160
[Attention] Implement universal BACKEND_MAP (#25900)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-08 12:00:25 -07:00
Gregory Shtrasberg
f231e5bc21
[ROCm] Split AITER unified attention into its own backend (#25507)
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-10-06 22:49:23 +00:00
Harry Mellor
1c0c68202c
Fix per file ruff ignores related to typing (#26254)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 16:37:55 +00:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Matthew Bonanni
2aaa423842
[Attention] Move Backend enum into registry (#25893)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-10-02 20:32:24 -07:00
Yongye Zhu
fa7e254a7f
[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Co-authored-by: Lucia Fang <fanglu@meta.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Xiaozhu Meng <mxz297@gmail.com>
Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-30 17:14:41 +08:00
Jonas M. Kübler
69a8c8e99a
[torch.compile] Make Query Quantization Fusable (#24914)
...
Signed-off-by: Jonas Kuebler <kuebj@amazon.com>
2025-09-25 09:25:12 -04:00
Woosuk Kwon
e6750d0b18
[V0 Deprecation] Remove unused classes in attention (#25541)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-24 13:24:40 -07:00
Woosuk Kwon
2e19a848d4
[V0 Deprecation] Remove max_seq_len_to_capture (#25543)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-24 01:51:39 -07:00
Thomas Parnell
969b4da3a6
[V0 Deprecation] Remove placeholder attn (#25510)
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-09-23 22:12:14 +00:00
Cyrus Leung
417a164af6
[Misc] Remove unused encoder-decoder error strings (#25374)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 11:04:32 +00:00
Cyrus Leung
f92d952632
[V0 Deprecation] Remove MultiModalPlaceholderMap (#25366)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-09-22 08:49:19 +00:00
Woosuk Kwon
bc6e542d9f
Remove V0 attention backends (#25351)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-21 16:03:28 -07:00
Woosuk Kwon
1cd885bd54
[V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 20:49:09 -07:00
Woosuk Kwon
c99db8c8dd
[V0 Deprecation] Remove V0 core (#25321)
...
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-20 19:58:26 -07:00
Harry Mellor
12aed7e453
Encoder model support for the Transformers backend (#25174)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-09-19 19:15:22 +01:00
Wentao Ye
b42566f440
[Bug] Fix is_flashmla_supported Check Error (#24774)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-09-15 20:10:55 -06:00
Matthew Bonanni
7ba32aa60b
[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705)
...
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
2025-09-12 15:45:53 -06:00
Didier Durand
bcb06d7baf
[Doc]: fix typos in various files (#24726)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-12 06:43:12 -07:00
TaehyunKim
9bd831f501
[Model] New model support for Motif-1-Tiny (#23414)
...
Signed-off-by: ca1207 <ca1207zzz@gmail.com>
Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com>
Co-authored-by: WyldeCat <skan1543@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-10 23:29:40 -07:00
Wentao Ye
15de5ff9ea
[Feature] Disallow FlashMLA on Blackwell (#24521)
...
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-09 14:59:34 -04:00
Didier Durand
f4962a6d55
[Doc]: fix typos in Python comments (#24417)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-08 00:22:16 -07:00
youkaichao
558f0907dc
[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-09-07 01:18:59 +00:00
Didier Durand
83609ca91d
[Doc]: fix typos in Python comments (#24173)
...
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-04 08:52:17 -07:00