| Woosuk Kwon | 752c6ade2e | [V0 Deprecation] Deprecate BlockSparse Attention & Phi3-Small (#21217) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-07-19 13:53:17 -07:00 |
| Yong Hoon Shin | bdf13965ab | [V1] Support cross-layer KV sharing (#18212) Signed-off-by: Yong Hoon Shin <yhshin@meta.com> | 2025-06-03 20:33:07 +00:00 |
| Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com> | 2025-06-03 11:20:17 -07:00 |
| vllmellm | 77b6e74fe2 | [ROCm] Remove unnecessary assertion of max_model_len in ROCM_AITER_MLA attention backend. (#18938) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> | 2025-05-29 22:33:17 -07:00 |
| qli88 | 4f8b373225 | [BugFix][AMD] Compatible patch for AITER lib after 04/20 (#17912) Signed-off-by: Qiang Li <qiang.li2@amd.com> | 2025-05-13 23:05:20 -07:00 |
| Michael Goin | 85b72cb7b1 | Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" (#17910) | 2025-05-09 08:58:18 -07:00 |
| qli88 | 9f64e93415 | [BugFix][AMD] Compatible patch for latest AITER(05/07/2025) (#17864) Signed-off-by: Qiang Li <qiang.li2@amd.com> | 2025-05-09 08:59:36 -06:00 |
| Lucas Wilkinson | 5e6f939484 | [Attention] MLA move rotary embedding to cuda-graph region (#17668) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> | 2025-05-09 11:14:42 +08:00 |
| Lucas Wilkinson | afcb3f8863 | [Attention] MLA move o_proj q_proj into cuda-graph region (#17484) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> | 2025-05-02 03:16:26 +00:00 |
| vllmellm | 30bc3e0f66 | [FEAT][ROCm]: Support AITER MLA (#15893) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: qli88 <qiang.li2@amd.com> | 2025-04-22 09:31:13 -07:00 |