xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-15 07:33:30 +08:00

Author	SHA1	Message	Date
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Harry Mellor	61aedb5ffe	Move `VllmConfig` from `config/__init__.py` to `config/vllm.py` (#25271 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-29 19:49:49 -07:00
Adrian Abeyta	c42ff4f4fd	[BugFix][torch.compile] KV scale calculation issues with FP8 quantization (#25513 ) Signed-off-by: adabeyta <aabeyta@redhat.com>	2025-09-29 15:52:04 -04:00
Juechen Liu	a3ae45a38c	[Misc] fix tests failure by using current_platform (#25825 ) Signed-off-by: Juechen Liu <jueliu@meta.com>	2025-09-29 04:18:57 +00:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Jonas M. Kübler	69a8c8e99a	[torch.compile] Make Query Quantization Fusable (#24914 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-25 09:25:12 -04:00
Kunshang Ji	d2af67441d	[XPU][Triton]add xpu config in triton_reshape_and_cache_flash (#25643 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-25 12:38:11 +00:00
Wei Wei	05c19485a5	[Kernel] Support DCP for Triton backend (#25132 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-24 18:09:34 -07:00
Woosuk Kwon	e6750d0b18	[V0 Deprecation] Remove unused classes in attention (#25541 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-24 13:24:40 -07:00
Harry Mellor	8c853050e7	[Docs] Enable `fail_on_warning` for the docs build in CI (#25580 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-24 19:30:33 +00:00
Woosuk Kwon	2e19a848d4	[V0 Deprecation] Remove max_seq_len_to_capture (#25543 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-24 01:51:39 -07:00
Michael Goin	7361ab379f	Remove redundant mutates_args and dispatch_key for direct_register_custom_op (#25512 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 22:48:40 +00:00
Michael Goin	4f2954f724	Fix triton_reshape_and_cache_flash.py triton import (#25522 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-23 15:26:10 -07:00
Thomas Parnell	969b4da3a6	[V0 Deprecation] Remove placeholder attn (#25510 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-09-23 22:12:14 +00:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00
Michael Goin	78237e43bf	[Bugfix] Remove contiguous output req for context parallel MLA (#25414 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-09-22 20:26:32 -07:00
Cyrus Leung	417a164af6	[Misc] Remove unused encoder-decoder error strings (#25374 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-22 11:04:32 +00:00
Cyrus Leung	f92d952632	[V0 Deprecation] Remove `MultiModalPlaceholderMap` (#25366 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-22 08:49:19 +00:00
Woosuk Kwon	bc6e542d9f	Remove V0 attention backends (#25351 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-21 16:03:28 -07:00
Woosuk Kwon	1cd885bd54	[V0 Deprecation] Remove V0 model runner base & simplify worker base (#25328 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 20:49:09 -07:00
Woosuk Kwon	c99db8c8dd	[V0 Deprecation] Remove V0 core (#25321 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-20 19:58:26 -07:00
Chendi.Xue	6c5f82e5aa	[BUG FIX][NON-CUDA]quick fix to avoid call cudagraph_unsafe in attention (#25298 ) Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>	2025-09-20 04:41:23 +00:00
Boyuan Feng	8945b001db	[torch.compile] CUDAGraph Inductor partition integration (#24281 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Signed-off-by: boyuanfeng <boyuan@meta.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-20 01:02:15 +00:00
qizixi	a2a5f79e09	Optimize triton unified attention performance for sliding window attention (#24390 ) Signed-off-by: zixi-qi <qizixi@meta.com>	2025-09-19 13:07:26 -06:00
Harry Mellor	12aed7e453	Encoder model support for the Transformers backend (#25174 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-09-19 19:15:22 +01:00
Yan Ma	a684c0124c	[bugfix] fix MHA for models like OpenGVLab/InternVL3_5-38B (#25146 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-19 08:45:06 +00:00
jvlunteren	01a583fea4	[Kernel] Decouple Tile Size from Block Size in Triton Unified Attention Kernel (#21197 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-09-18 14:27:01 +00:00
Chaojun Zhang	3bc18127ff	[XPU] Whisper model support on XPU Platform (#25123 ) Signed-off-by: chzhang <chaojun.zhang@intel.com>	2025-09-18 04:30:10 +00:00
Douglas Lehr	1a456c7c90	Aiter mha fp8 fix (#24991 ) Signed-off-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com>	2025-09-17 22:29:14 +00:00
Sugar	cd1f885bcf	Directly get max encoder len from VLLM config in V1 (#24866 ) Signed-off-by: Sugar-zsg <952242923@qq.com>	2025-09-16 17:52:31 +00:00
Wentao Ye	b42566f440	[Bug] Fix `is_flashmla_supported` Check Error (#24774 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-09-15 20:10:55 -06:00
Rafael Marcelino Koike	b834b4cbf1	[USAGE] Improve error handling for weight initialization in Unquantized… (#20321 ) Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com> Signed-off-by: Rafael Koike <koike.rafael@gmail.com>	2025-09-15 16:45:49 +00:00
Matthew Bonanni	7ba32aa60b	[Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705 ) Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-12 15:45:53 -06:00
Didier Durand	bcb06d7baf	[Doc]: fix typos in various files (#24726 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-12 06:43:12 -07:00
Wenlong Wang	72fc8aa412	[Multi Modal] Add FA3 in VIT (#24347 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-12 21:27:24 +08:00
Cyrus Leung	6aeb1dab4a	[Bugfix] Fix incorrect import of CacheConfig (#24631 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-11 01:48:25 -07:00
TaehyunKim	9bd831f501	[Model] New model support for Motif-1-Tiny (#23414 ) Signed-off-by: ca1207 <ca1207zzz@gmail.com> Signed-off-by: TaehyunKim <73943231+ca1207@users.noreply.github.com> Co-authored-by: WyldeCat <skan1543@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-09-10 23:29:40 -07:00
Li, Jiang	29799ddacc	[Bugfix] Add missing VIT backend dispatch on CPU (#24623 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-10 22:28:41 -07:00
Gregory Shtrasberg	9a161307f5	[torch.compile][ROCm][V1] Enable attention output FP8 fusion for V1 attention backends (#19767 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-10 13:59:55 -07:00
Russell Bryant	37e8182bfe	[v1] Add Whisper model support (encoder-decoder) (#21088 ) Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: NickLucche <nlucches@redhat.com>	2025-09-10 13:53:35 -07:00
baonudesifeizhai	6cbd41909e	Feature/vit attention unification# 23880 (#23978 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-10 06:10:14 -07:00
Wentao Ye	15de5ff9ea	[Feature] Disallow FlashMLA on Blackwell (#24521 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-09 14:59:34 -04:00
Didier Durand	f4962a6d55	[Doc]: fix typos in Python comments (#24417 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-08 00:22:16 -07:00
youkaichao	558f0907dc	[attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-09-07 01:18:59 +00:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
yzds	ac201a0eaf	[Feature] Support Decode Context Parallel (DCP) for MLA (#23734 ) Signed-off-by: hongchao <hongchao@msh.team> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: hongchao <hongchao@msh.team> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-09-06 13:24:05 +08:00
Didier Durand	83609ca91d	[Doc]: fix typos in Python comments (#24173 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-04 08:52:17 -07:00
Kunshang Ji	16ded21eeb	[XPU] support Triton Attention backend on Intel GPU (#24149 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-04 20:41:08 +08:00
Lucas Wilkinson	402759d472	[Attention] FlashAttn MLA (#14258 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2025-09-04 02:47:59 -07:00
Burkhard Ringlein	6d80ae83e1	[Bugfix] Fixing division by zero in triton_attn if query_heads/kv_heads > 16 (#23424 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com>	2025-09-03 15:01:09 +00:00

469 Commits