xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-01 05:07:04 +08:00

Author	SHA1	Message	Date
Chen Zhang	6f0f570c43	[deepseek] kernel block size for UniformTypeKVCacheSpecs (#26559 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-10 16:40:41 +08:00
Benjamin Chislett	6e783bc54b	[Bugfix] Fix CUDA graph selection bug in FlashInfer at high concurrency (#26499 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-09 17:12:34 -04:00
Ming Yang	3b736e1c38	[Attention][DCP] Support DCP with query length > 1 (MTP) with FA3 (#25049 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-10-09 08:06:29 -07:00
elvischenv	5e49c3e777	Bump Flashinfer to v0.4.0 (#26326 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 23:58:44 -07:00
Zhiyuan Li	d24cf322e1	[Hybrid]: Decouple Kernel Block Size from KV Page Size (#24486 ) Signed-off-by: lizhiyuan <uniartisan2017@gmail.com> Signed-off-by: Zhiyuan Li <uniartisan2017@gmail.com>	2025-10-08 23:43:39 -07:00
Naveenraj Kamalakannan	e614ab7806	Separate MLAAttention class from Attention (#25103 ) Signed-off-by: Naveenraj Kamalakannan <therealnaveenkamal@gmail.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-08 17:11:11 -07:00
Matthew Bonanni	2a03f93de9	[Attention] Register FLASHMLA_SPARSE (#26441 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-08 22:28:52 +00:00
elvischenv	b82f4307c9	[Bugfix][Flashinfer] fix VLLM_USE_TRTLLM_ATTENTION issue for models with diff hyperparameters (#25924 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 19:54:48 +00:00
Utkarsh Sharma	335b28f7d1	[TPU] Rename tpu_commons to tpu_inference (#26279 ) Signed-off-by: Utkarsh Sharma <utksharma@google.com> Co-authored-by: Utkarsh Sharma <utksharma@google.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-10-07 23:30:52 -07:00
Lucas Wilkinson	f80e7866c0	[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-10-08 10:09:34 +08:00
Benjamin Chislett	3d1f67616d	[Spec Decode] Enable efficient speculative decoding with FlashInfer-MLA (#25984 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-07 16:05:59 -04:00
Pei-Lun Liao	eb577e4655	[Bugfix] Add missing sink tensor into flash attn cascade attn implementation (#26325 )	2025-10-07 18:56:39 +00:00
Benjamin Chislett	f77df94647	[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 23:03:49 +00:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Matthew Bonanni	4727a8afa7	[Attention] Remove unused reorder_batch method (#24463 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-06 13:13:39 -04:00
Roger Wang	43c146ca42	[Misc] Clean up unnecessary E501 ignore (#26274 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-06 07:29:18 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
Thomas Parnell	778f554157	[V1] [Hybrid] Some additional clean-up in Mamba2 prefix caching (#26222 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-06 10:40:30 +08:00
Harry Mellor	1c0c68202c	Fix per file ruff ignores related to typing (#26254 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 16:37:55 +00:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Li, Jiang	5c057e068f	[CPU] Refine batch reorder of CPU attention backend (#26096 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-10-04 21:54:35 +08:00
Cyrus Leung	1838cd4860	Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" (#26220 )	2025-10-04 02:45:08 -07:00
Stan Wozniak	ea507c3a93	[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752 ) Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com> Signed-off-by: Thomas Ortner <boh@zurich.ibm.com> Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Thomas Ortner <boh@zurich.ibm.com> Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>	2025-10-04 06:34:22 +02:00
Bram Wasti	2f7dbc9b42	Add batch invariant kernel override for FlashInfer backend [2/n] (#25769 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-03 19:49:30 -07:00
Paul Pak	5f42fc53b6	[backends][short_conv] CUDA graph piecewise edits (#24215 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-10-03 12:59:48 +00:00
Sage Moore	5f2cacdb1e	Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-03 11:28:22 +00:00
Michael Goin	f1fc2107a3	[Bugfix] Disable cascade attention with FlashInfer (#26130 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-02 16:30:37 -07:00
Chen Zhang	1e50f1be70	[Deepseek v3.2] Support indexer prefill chunking (#25999 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-10-02 10:29:12 -07:00
Lucas Wilkinson	decf7f794b	[BugFix] Fix FI accuracy issue when used for MLA prefill (#26063 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-02 17:18:13 +00:00
Gregory Shtrasberg	0b018d8baf	[ROCm][Bugfix] Add missing parameter to ROCm backend (#26029 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-01 19:23:14 -07:00
Huamin Li	c36f0aa300	Fix test_mamba_ssm_ssd.py due to missing _query_start_loc_to_chunk_indices_offsets (#25995 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-01 18:18:36 +00:00
Lucia Fang	001e50c92c	[Model] MTP fallback to eager for DeepSeek v32 (#25982 ) Signed-off-by: Lu Fang <fanglu@fb.com>	2025-10-01 01:53:22 +00:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Thomas Parnell	fea3e476aa	[Kernel] Chunk-aligned mamba2 (#24683 )	2025-09-29 23:18:25 +02:00
Bram Wasti	dc48ba0c75	Kernel-override Determinism [1/n] (#25603 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-09-26 16:59:09 -07:00
Chih-Chieh Yang	2b6b1d7809	[Model] Mamba2 varlen refactor (#21467 ) Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com> Co-authored-by: RishiAstra <40644327+RishiAstra@users.noreply.github.com>	2025-09-26 11:31:14 +00:00
Icey	dd70437a4f	Remove cuda hard-code in compute_causal_conv1d_metadata (#25555 ) Signed-off-by: Icey <1790571317@qq.com>	2025-09-26 01:19:20 -07:00
Tao He	99b3a504c5	[Qwen3-Next][GDN] fixes cuda graph capturing bug in GDN metadata and a stride bug in causal_conv_1d. (#25743 ) Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>	2025-09-26 01:18:58 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Li, Jiang	eb32335e35	[CPU] update torch 2.8 and fix missing fields in TorchSDPAMetadata (#25652 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-25 13:29:11 +00:00
Jonas M. Kübler	69a8c8e99a	[torch.compile] Make Query Quantization Fusable (#24914 ) Signed-off-by: Jonas Kuebler <kuebj@amazon.com>	2025-09-25 09:25:12 -04:00
Wei Wei	05c19485a5	[Kernel] Support DCP for Triton backend (#25132 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-09-24 18:09:34 -07:00
Woosuk Kwon	e6750d0b18	[V0 Deprecation] Remove unused classes in attention (#25541 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-24 13:24:40 -07:00
Lucas Wilkinson	2338daffd3	[BugFix] Potential Fix for FA3 full-cudagraph IMA (#25490 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-24 02:04:04 -07:00
Benjamin Chislett	c30b405b8f	[Spec Decode] Enable FlashInfer Spec Decoding (#25196 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: lhsjohn <huashuoli@tencent.com>	2025-09-23 22:29:58 -04:00
Lucas Wilkinson	9df8da548e	[BugFix] Fix MLA assert with CUTLASS MLA (#25478 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-09-23 21:09:43 -04:00
Benjamin Chislett	1983609239	[Bugfix] Use a separate FlashInfer workspace buffer for trtllm-gen (#25520 )	2025-09-24 00:19:56 +00:00
Burkhard Ringlein	100b630a60	[V1][Kernel] Add triton implementation for `reshape_and_cache_flash` (#24503 ) Signed-off-by: Burkhard Ringlein <ngl@zurich.ibm.com> Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-09-23 12:52:40 -04:00
Lucas Wilkinson	cc1dc7ed6d	[Core/DBO][2/N] Dual-Batch Overlap add DeepEP High Throughput support and Prefill support (#24845 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-09-23 16:02:10 +00:00

330 Commits