xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-04 23:37:21 +08:00

Author	SHA1	Message	Date
Huamin Li	07a606aa7e	[CI Failure] Fix backend selection for encoder-only models (#28534) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-13 10:11:27 -05:00
Benjamin Chislett	304419576a	[Perf] Refactor cudagraph_support to enable full CUDA graphs for spec decoding with FlashInfer (#28479) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-13 01:56:40 +09:00
Nicolò Lucchesi	728a9eb70e	[Misc] Refactor Attention kv transfer methods into decorator (#27816) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-12 16:05:44 +00:00
wangxiyuan	10138c92a5	[V0 deprecation] Deprecate use_v1 parameter (#28112) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-12 14:03:52 +00:00
Andreas Karatzas	9f0247cfa4	`VLLM_USE_TRITON_FLASH_ATTN` V0 variable deprecation (#27611) Signed-off-by: Andreas Karatzas <akaratza@amd.com> Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>	2025-11-11 18:34:36 -08:00
Li, Jiang	7f829be7d3	[CPU] Refactor CPU attention backend (#27954) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-11-12 09:43:06 +08:00
Lukas Geiger	76e4dcf225	[Misc] Remove unused attention prefix prefill ops functions (#26971) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-11 18:26:04 +00:00
Matthew Bonanni	b30dfa03c5	[Attention] Refactor CUDA attention backend selection logic (#24794) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-11 07:40:44 -05:00
David Ben-David	cc079763c5	[BugFix] Avoid calling KV connector layer APIs when metadata is unset (#28253) Signed-off-by: David Ben-David <davidb@pliops.com> Co-authored-by: David Ben-David <davidb@pliops.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-11-10 23:39:36 -08:00
Lucas Wilkinson	39029d5192	[CI/Test Fix] Fix CP tests on Blackwell (#28404) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-11 01:36:29 +00:00
Adrian Abeyta	a5a790eea6	[Bugfix] Ensure calculated KV scales are applied in attention. (#27232) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-11-10 23:42:37 +00:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
Lucas Wilkinson	e8697faf03	[V0 deprecation] Remove no longer used `get_metadata_cls` (#28370) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 14:32:09 +08:00
zhangsicheng5	2108a571d7	[DCP] Support dcp kv_cache interleave size > 1 (#26696) Signed-off-by: zhangsicheng5 <zhangsicheng5@huawei.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Signed-off-by: Qiu <qiuchunshuo@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-09 04:45:27 +09:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Kunshang Ji	18b39828d9	[XPU] Add gpt-oss model support for Intel GPU (#27786) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-05 02:17:23 +00:00
Lucas Kabela	55011aef24	[Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-03 11:12:15 -08:00
Yan Ma	7e2729b57e	[Multimodal][XPU]Enable vision attn backend for xpu platform (#27525) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Yejing Lai <yejing.lai@intel.com> Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-01 04:45:02 +00:00
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00
Matthew Bonanni	a99564ac5b	[Attention] Add missing kv cache scale setup (#27490) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-25 00:12:49 -07:00
Bradley D	570c3e1cd4	[Bugfix] Honor --mm_encoder_attn_backend when used (#27124) Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-23 20:09:52 +08:00
Tao He	250fb1b8ea	[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-21 18:27:03 +00:00
Roger Wang	c3a2c6ac5f	[MM][Core] Decouple ViT backend from LM backend (#27061) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-21 00:30:10 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Nicolò Lucchesi	b26b70bec4	[Misc] Refactor `get_kv_cache_spec` into `AttentionLayerBase` (#26587) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-18 13:51:21 +00:00
Zhuohan Li	d29483b58a	[Minor] Remove unnecessary error message (#27115) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-17 20:02:12 +00:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
rongfu.leng	5afd3276df	[Feature] Add process_weights_after_loading to AttentionImpl (#26870) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-10-16 08:02:30 -07:00
Adrian Abeyta	0a9ef0cfce	Move query quantization to attention layer for Flashinfer & Triton. (#26534) Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Adrian Abeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 19:01:38 -04:00
Mengqing Cao	302ef403a2	[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends (#26656) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-15 00:16:44 -07:00
Luka Govedič	2dcd12d357	[torch.compile] Fix tests for torch==2.9 inductor partition (#26116) Signed-off-by: ProExpertProg <lgovedic@redhat.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-10-14 19:55:02 -04:00
Boyuan Feng	a86b4c58e8	remove attn output view kernel (#26680) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 22:53:10 +00:00
Jaya Yuan	ea97940d6c	[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864) Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com> Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>	2025-10-14 13:07:50 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Huamin Li	0cd103e7cb	CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding (#26509) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-11 20:50:57 +00:00
Wenzheng Bi	ec10fd0abc	[Bugfix] Move current_platform import to avoid python import cache. (#16601) Signed-off-by: iwzbi <wzbi@zju.edu.cn>	2025-10-09 10:46:19 +00:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Matthew Bonanni	2a03f93de9	[Attention] Register FLASHMLA_SPARSE (#26441) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 22:28:52 +00:00
Matthew Bonanni	76879cc160	[Attention] Implement universal BACKEND_MAP (#25900) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 12:00:25 -07:00
Lucas Wilkinson	f80e7866c0	[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-08 10:09:34 +08:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	b893d661b1	Fix per file ruff ignores related to simplification (#26259) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 20:31:53 +00:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Cyrus Leung	4570535ec4	[Model] CLIP Embedding Support (#26010) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-04 06:21:42 -07:00
TJian	9c5ee91b2a	[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-02 22:34:53 -07:00

522 Commits