xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-18 06:27:00 +08:00

Author	SHA1	Message	Date
Lucas Kabela	94666612a9	[Misc][qwen2_5_vl][torch.compile] Enable `supports_torch_compile` on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com> Signed-off-by: Lucas Kabela <lucasakabela@gmail.com>	2025-10-28 22:36:43 +00:00
Yeshwanth N	71b1c8b667	[Chore]:Extract math and argparse utilities to separate modules (#27188 ) Signed-off-by: Yeshwanth Surya <yeshsurya@gmail.com> Signed-off-by: Yeshwanth N <yeshsurya@gmail.com> Signed-off-by: yeshsurya <yeshsurya@gmail.com>	2025-10-26 04:03:32 -07:00
JartX	65d2cf9511	[BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-26 15:08:52 +08:00
Matthew Bonanni	a99564ac5b	[Attention] Add missing kv cache scale setup (#27490 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-25 00:12:49 -07:00
Bradley D	570c3e1cd4	[Bugfix] Honor --mm_encoder_attn_backend when used (#27124 ) Co-authored-by: Bradley D <4551889+bradleyhd@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-10-23 20:09:52 +08:00
Tao He	250fb1b8ea	[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-21 18:27:03 +00:00
Roger Wang	c3a2c6ac5f	[MM][Core] Decouple ViT backend from LM backend (#27061 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-21 00:30:10 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
Nicolò Lucchesi	b26b70bec4	[Misc] Refactor `get_kv_cache_spec` into `AttentionLayerBase` (#26587 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-10-18 13:51:21 +00:00
Zhuohan Li	d29483b58a	[Minor] Remove unnecessary error message (#27115 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-17 20:02:12 +00:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
rongfu.leng	5afd3276df	[Feature] Add process_weights_after_loading to AttentionImpl (#26870 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-10-16 08:02:30 -07:00
Adrian Abeyta	0a9ef0cfce	Move query quantization to attention layer for Flashinfer & Triton. (#26534 ) Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Adrian Abeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 19:01:38 -04:00
Mengqing Cao	302ef403a2	[DSA][MLA] Tiny refactor on DeepSeek to make it reusable for different backends (#26656 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-15 00:16:44 -07:00
Luka Govedič	2dcd12d357	[torch.compile] Fix tests for torch==2.9 inductor partition (#26116 ) Signed-off-by: ProExpertProg <lgovedic@redhat.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-10-14 19:55:02 -04:00
Boyuan Feng	a86b4c58e8	remove attn output view kernel (#26680 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 22:53:10 +00:00
Jaya Yuan	ea97940d6c	[DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864 ) Signed-off-by: yuanyongjie.yyj <yuanyongjie.yyj@antgroup.com> Signed-off-by: FENP <32334296+FENP@users.noreply.github.com> Signed-off-by: Jaya Yuan <yuanyongjie.yyj@antgroup.com>	2025-10-14 13:07:50 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Huamin Li	0cd103e7cb	CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding (#26509 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-11 20:50:57 +00:00
Wenzheng Bi	ec10fd0abc	[Bugfix] Move current_platform import to avoid python import cache. (#16601 ) Signed-off-by: iwzbi <wzbi@zju.edu.cn>	2025-10-09 10:46:19 +00:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486 ) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Matthew Bonanni	2a03f93de9	[Attention] Register FLASHMLA_SPARSE (#26441 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 22:28:52 +00:00
Matthew Bonanni	76879cc160	[Attention] Implement universal BACKEND_MAP (#25900 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 12:00:25 -07:00
Lucas Wilkinson	f80e7866c0	[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-08 10:09:34 +08:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Harry Mellor	b893d661b1	Fix per file ruff ignores related to simplification (#26259 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 20:31:53 +00:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Cyrus Leung	4570535ec4	[Model] CLIP Embedding Support (#26010 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-04 06:21:42 -07:00
TJian	9c5ee91b2a	[ROCm] [VL] [Bugfix] Fix vit flash attn dispatcher logic for ROCm (#26104 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-10-02 22:34:53 -07:00
Matthew Bonanni	2aaa423842	[Attention] Move Backend enum into registry (#25893 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-02 20:32:24 -07:00
Lucas Wilkinson	4134312b35	[BugFix] ChunkedLocalAttention is currently not CG compatible (#26034 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-01 16:28:00 -07:00
youkaichao	a2e6fa7e03	[bugfix][deepseek] fix flashmla kernel selection (#25956 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-10-01 00:30:36 +08:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Harry Mellor	61aedb5ffe	Move`VllmConfig` from `config/__init__.py` to `config/vllm.py` (#25271 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-29 19:49:49 -07:00
Adrian Abeyta	c42ff4f4fd	[BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-09-29 15:52:04 -04:00
Juechen Liu	a3ae45a38c	[Misc] fix tests failure by using current_platform (#25825 ) Signed-off-by: Juechen Liu <jueliu@meta.com>	2025-09-29 04:18:57 +00:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Jonas M. Kübler	69a8c8e99a	[torch.compile] Make Query Quantization Fusable (#24914 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-25 09:25:12 -04:00
Kunshang Ji	d2af67441d	[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-25 12:38:11 +00:00
Wei Wei	05c19485a5	[Kernel] Support DCP for Triton backend (#25132 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-24 18:09:34 -07:00
Woosuk Kwon	e6750d0b18	[V0 Deprecation] Remove unused classes in attention (#25541 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-24 13:24:40 -07:00
Harry Mellor	8c853050e7	[Docs] Enable `fail_on_warning` for the docs build in CI (#25580 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-24 19:30:33 +00:00
Woosuk Kwon	2e19a848d4	[V0 Deprecation] Remove max_seq_len_to_capture (#25543 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-24 01:51:39 -07:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Michael Goin	4f2954f724	Fix triton_reshape_and_cache_flash.py triton import (#25522 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:26:10 -07:00
Thomas Parnell	969b4da3a6	[V0 Deprecation] Remove placeholder attn (#25510 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 22:12:14 +00:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00

504 Commits