xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-01 01:07:13 +08:00

Author	SHA1	Message	Date
Wentao Ye	879f69bed3	[Refactor] Remove duplicate `ceil_div` (#20023 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-06-25 05:19:09 +00:00
Ning Xie	71baf85ae1	[Kernel] mark TorchSDPABackend swap_blocks NotImplementedError (#19749 )	2025-06-20 18:18:11 +00:00
Ning Xie	71d1219545	[Kernel] correct cpu worker function parameter type (#19745 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-20 10:50:13 +00:00
Woosuk Kwon	f04d604567	[Minor] Zero-initialize attn output buffer (#19784 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-06-18 06:59:27 +00:00
Ning Xie	c53711bd63	[MISC] correct copy_blocks src_to_dists param type (#19696 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-17 17:21:06 -07:00
Nicolò Lucchesi	4c8f64faa7	[V1][Kernel] Flashinfer HND KV cache layout (#19280 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-06-17 09:09:22 -04:00
jvlunteren	ccd7c05089	[Kernel] Add Split-KV Support to Unified Triton Attention Kernel (#19152 ) Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>	2025-06-17 10:45:07 +00:00
22quinn	0b73736a0d	[Kernel] Raise verbose error and consolidate `num_heads/num_kv_heads` divisibility check (#19339 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-15 13:43:48 +08:00
Luka Govedič	f98548b9da	[torch.compile][ROCm] Fuse quantization onto attention using a torch.compile pass (#16756 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Co-authored-by: Sage Moore <sage@neuralmagic.com>	2025-06-12 08:31:04 -07:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
rasmith	c7ea0b56cd	[AMD] [Quantization] Add override flag for attention dtype instead of using kv_cache_dtype trigger (#17331 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-11 15:53:28 -04:00
Jee Jee Li	04a55612dd	[Misc] Fix misleading ROCm warning (#19486 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-12 00:12:10 +08:00
Li, Jiang	4555143ea7	[CPU] V1 support for the CPU backend (#16441 )	2025-06-03 18:43:01 -07:00
Yong Hoon Shin	bdf13965ab	[V1] Support cross-layer KV sharing (#18212 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-06-03 20:33:07 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
vllmellm	77b6e74fe2	[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-29 22:33:17 -07:00
Gregory Shtrasberg	1b7cfd5a36	[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-29 12:13:18 -04:00
Gregory Shtrasberg	da4b69d0b4	[Attention][V1] Toggle for v1 attention backend (#18275 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-29 10:48:24 -04:00
Hongxia Yang	269d901734	[Bugfix][ROCm] fix the power of 2 exception from triton_unified_attention.py when running llama4 models and unit test fix (#18100 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-05-29 07:21:46 +08:00
Lucas Wilkinson	ce75efeecb	[BugFix] FA2 MLA Accuracy Issue (#18807 ) Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com>	2025-05-28 08:59:39 +00:00
Hosang	dd5fa7e04f	[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004 ) Signed-off-by: Hosang Yoon <hosang.yoon@amd.com>	2025-05-21 08:35:00 -07:00
Percy	980a172474	[Kernel] update comment for KV shape in unified triton attn (#18099 ) Signed-off-by: haochengxia <xhc_1007@163.com>	2025-05-20 11:19:34 -07:00
Thomas Parnell	01c22335ba	[Kernel] [V1] Fix performance regression for triton unified attention (#18161 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 06:39:00 -07:00
Thomas Parnell	e6b8e65d2d	[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 (#18013 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-15 13:26:34 +08:00
qli88	4f8b373225	[BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912 ) Signed-off-by: Qiang Li <qiang.li2@amd.com>	2025-05-13 23:05:20 -07:00
Luka Govedič	176a95c670	[Fix] Support CUDAGraph capture for encoder-decoder on ROCm (#18104 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2025-05-13 19:31:42 -07:00
Tao He	60f7624334	Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844 )	2025-05-12 19:52:47 -07:00
Gregory Shtrasberg	06c0922a69	[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 (#17870 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-05-11 15:58:45 +08:00
Shiyan Deng	eea22a56ab	fix amd triton mla path (#17871 )	2025-05-11 07:53:31 +00:00
Michael Goin	85b72cb7b1	Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910 )	2025-05-09 08:58:18 -07:00
qli88	9f64e93415	[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864 ) Signed-off-by: Qiang Li <qiang.li2@amd.com>	2025-05-09 08:59:36 -06:00
Lucas Wilkinson	5e6f939484	[Attention] MLA move rotary embedding to cuda-graph region (#17668 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-09 11:14:42 +08:00
vllmellm	3c9396a64f	[FEAT][ROCm]: Support AITER MLA on V1 Engine (#17523 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>	2025-05-09 10:42:05 +08:00
Agata Dobrzyniewicz	843b222723	[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU (#17648 ) Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>	2025-05-07 22:37:03 -07:00
Michael Goin	e50a1f1a9c	[TPU] Add kernel test for moe_pallas (#17496 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2025-05-06 17:59:57 -07:00
Thomas Parnell	2f925e5777	[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode (#16828 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-06 18:21:48 -04:00
Chen Zhang	cba31c47c4	[v1] AttentionMetadata for each layer (#17394 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-05-06 07:58:37 -07:00
Mengqing Cao	f9bc5a0693	[Bugfix] Fix triton import with local TritonPlaceholder (#17446 ) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-06 17:53:09 +08:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Hui Liu	4c33d67321	[Bugfix] fix tmp_out and exp_sums dimensions (#17438 ) Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com>	2025-05-02 16:44:07 +00:00
Andrew Sansom	cc2a77d7f1	[Core] [Bugfix] Add Input Embeddings (#15428 ) Signed-off-by: Andrew Sansom <andrew@protopia.ai> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: 临景 <linjing.yx@alibaba-inc.com> Co-authored-by: Bryce1010 <bryceyx@gmail.com> Co-authored-by: Nan2018 <nan@protopia.ai> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-02 01:06:39 -07:00
Lucas Wilkinson	afcb3f8863	[Attention] MLA move o_proj q_proj into cuda-graph region (#17484 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-02 03:16:26 +00:00
Hongxia Yang	28566d73b3	[ROCm] remove unsupported archs from rocm triton flash-attention supported list (#17536 ) Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>	2025-05-01 07:54:25 -07:00
Lucas Wilkinson	3c3d767201	[BugFix] Fix mla cpu - missing 3 required positional arguments (#17494 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-01 14:36:52 +08:00
Kunshang Ji	ed6cfb90c8	[Hardware][Intel GPU] Upgrade to torch 2.7 (#17444 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com>	2025-04-30 00:03:58 -07:00
Huy Do	2c4f59afc3	Update PyTorch to 2.7.0 (#16859 )	2025-04-29 19:08:04 -07:00
Lucas Wilkinson	d8bccde686	[BugFix] Fix vllm_flash_attn install issues (#17267 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-04-27 17:27:56 -07:00
Chen Zhang	838cedade7	[Bugfix] Get a specific type of layer from forward context (#17222 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-04-27 00:58:05 -07:00
rasmith	8e4b351a0c	[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-04-27 00:35:08 +00:00
Agata Dobrzyniewicz	c48334d405	[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186 ) Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>	2025-04-26 05:55:14 -07:00

364 Commits