10 Commits

Author SHA1 Message Date
Woosuk Kwon
752c6ade2e
[V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-19 13:53:17 -07:00
Yong Hoon Shin
bdf13965ab
[V1] Support cross-layer KV sharing (#18212)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-06-03 20:33:07 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
vllmellm
77b6e74fe2
[ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-05-29 22:33:17 -07:00
qli88
4f8b373225
[BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-05-13 23:05:20 -07:00
Michael Goin
85b72cb7b1
Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910) 2025-05-09 08:58:18 -07:00
qli88
9f64e93415
[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864)
Signed-off-by: Qiang Li <qiang.li2@amd.com>
2025-05-09 08:59:36 -06:00
Lucas Wilkinson
5e6f939484
[Attention] MLA move rotary embedding to cuda-graph region (#17668)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-09 11:14:42 +08:00
Lucas Wilkinson
afcb3f8863
[Attention] MLA move o_proj q_proj into cuda-graph region (#17484)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
2025-05-02 03:16:26 +00:00
vllmellm
30bc3e0f66
[FEAT][ROCm]: Support AITER MLA (#15893)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: qli88 <qiang.li2@amd.com>
2025-04-22 09:31:13 -07:00