xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-04-29 08:47:09 +08:00

Author	SHA1	Message	Date
Hank_	6482e3895b	chores: adjust the attn register param order (#30688) Signed-off-by: Hank <hcc.mayday@gmail.com>	2025-12-17 19:58:16 +08:00
Nicolò Lucchesi	e087fbc393	[MM] Pass FA version in ViT Attn (#30756) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-17 07:54:45 +08:00
TJian	2410132bb1	[ROCm] [Bugfix] Fix torch sdpa hallucination (#30789) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-16 15:32:43 -08:00
Lucas Wilkinson	9fec0e13d5	[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-12-16 17:10:16 -05:00
Matthew Bonanni	51e5b3e3c4	[Bugfix] Fix ViT with FlashAttention on ROCm (#30703) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-15 19:45:21 +00:00
Isotr0py	ec154c36ee	[Platform] Refactor Platform attention backend selection to avoid breakpoint for OOT platform (#30212) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-15 17:36:07 +00:00
Shanshan Shen	87b4d1557d	[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-15 11:13:32 +08:00
Matthew Bonanni	86a3261525	[Bugfix] Pass FA version in `MultiHeadAttention` (#30575) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-13 00:02:11 +00:00
jvlunteren	9c0ee995a8	[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com> Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com> Co-authored-by: Thomas Parnell <tom.parnell@gmail.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-12-12 16:55:40 +01:00
Qiu	a11f4a81e0	[Misc][PCP&DCP] relocate PCP feature check (#30050) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-11 03:36:18 -08:00
Cyrus Leung	5a87d8b9b1	[Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:35 -08:00
Lucas Wilkinson	abe93bce59	[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-12-09 17:18:10 -08:00
rasmith	7618dc973d	[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)	2025-12-09 20:18:17 +00:00
Wentao Ye	d9417096d1	[Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching (#29125) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-08 19:31:57 -05:00
Dazhi Jiang	bcb6f5947f	[Perf] Remove sync point in vit torch sdpa attn backend (#30232) Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>	2025-12-08 07:12:42 +00:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Augusto Yao	9726e64530	bugfix: correct attn output with base 2 or e (#28840) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>	2025-11-29 07:52:12 +08:00
Isotr0py	6f9d81d03b	[V0 deprecation] Clean up legacy paged attention helper functions (#28043) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-28 16:44:33 +00:00
Mingyuan Ma	460d8bbf2d	Remove upstream fa checks (#29471) Signed-off-by: mingyuanm <mingyuanm@nvidia.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-11-28 05:52:42 -08:00
Cyrus Leung	33b06a6f24	[Misc] Remove redundant attention var constants (#29650) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 04:35:19 -08:00
Matthew Bonanni	fc1d8be3dc	[Attention] Update attention imports (#29540) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-27 11:19:09 -05:00
Matthew Bonanni	430dd4d9eb	[Attention] Remove imports from `vllm/attention/__init__.py` (#29342) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-26 10:53:15 -07:00
Pleaplusone	d9d342d214	[Performance][MLA][ROCm] Remove redundant D2D copy in deepseek (#27457) Signed-off-by: ganyi <ygan@amd.com>	2025-11-26 12:45:28 +08:00
Nicolò Lucchesi	798e87db5c	[Core] Generalize Encoder-Decoder `seq_lens` computation to avoid Whisper hardcoded logic (#29268) Signed-off-by: NickLucche <nlucches@redhat.com> Co-authored-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-11-25 11:32:11 +00:00
Lucas Wilkinson	2d9ee28cab	[CI/Test Fix] Fix CP tests on Blackwell (#29338) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-24 20:55:57 -08:00
Roger Wang	0ff70821c9	[Core] Deprecate `xformers` (#29262) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-24 04:18:55 +00:00
Nicolò Lucchesi	066209a045	[Attention] Refactor FA `block_size` limitations to hybrid models only (#29084) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-11-22 06:38:44 -08:00
Matthew Bonanni	11857a00b0	[Attention] Add ROCM_AITER_MLA_SPARSE to attention backend registry (#29103) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-11-20 20:24:43 -08:00
Or Ozeri	647464719b	[KVConnector][Core] Support cross-layer KV blocks (#27743) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-11-20 19:09:59 +01:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Qiang Zhang	3fb0d90999	[AMD] Use Decoupled Kernel Block Size to Support AITER MLA block_size=1 (#27715) Signed-off-by: chiangzhang <chiangzhang@tencent.com>	2025-11-20 02:11:52 +00:00
Qiu	2fd893b4ce	[Feature] Prefill Context Parallel (PCP) basic support (#28718) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com> Signed-off-by: LookAround <lixushi@huawei.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com> Co-authored-by: LookAround <lixushi@huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com> Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>	2025-11-19 15:52:44 -05:00
Aleksandr Malyshev	ac10fd3c69	Upstreaming aiter triton attention backend as a new backend (#28701) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-19 19:59:30 +00:00
Shanshan Shen	d44e9df7d4	[Model][Mamba] Add selector for mamba attention backend and make it pluggable for other device (#26487) Signed-off-by: shen-shanshan <467638484@qq.com>	2025-11-19 16:24:55 +00:00
Song Zhixin	285eaa4285	[Bugfix] Safeguard against missing backend in AttentionBackendEnum (#28846) Signed-off-by: jesse <szxfml@gmail.com> Signed-off-by: Song Zhixin <szxfml@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 10:53:44 +00:00
Benjamin Chislett	bf3ffb61e6	[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-14 14:14:46 -08:00
rasmith	15ae8e0784	[Bugfix][CI/Test][Spec Decode] Fix illegal memory access in offline_inference/spec_decode.py (Issue 27619) (#28432) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-13 22:34:01 -08:00
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
Benjamin Chislett	304419576a	[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-13 01:56:40 +09:00
Nicolò Lucchesi	728a9eb70e	[Misc] Refactor Attention kv transfer methods into decorator (#27816) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-12 16:05:44 +00:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Lukas Geiger	76e4dcf225	[Misc] Remove unused attention prefix prefill ops functions (#26971) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 18:26:04 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
David Ben-David	cc079763c5	[BugFix] Avoid calling KV connector layer APIs when metadata is unset (#28253) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 23:39:36 -08:00
Lucas Wilkinson	39029d5192	[CI/Test Fix] Fix CP tests on Blackwell (#28404) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 01:36:29 +00:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00

560 Commits