Sage Moore | 3d833aa759 | cleanup; Signed-off-by: Sage Moore <sage@neuralmagic.com> | 2025-07-02 21:20:21 +00:00 |
Lucas Wilkinson | f7a3ee0ea1 | Merge remote-tracking branch 'origin/main' into lwilkinson/attn-slicing; Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> | 2025-07-02 16:52:19 +00:00 |
Sage Moore | 57d404bbb8 | misc; Signed-off-by: Sage Moore <sage@neuralmagic.com> | 2025-07-02 16:37:58 +00:00 |
Sage Moore | 0889f66297 | Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing | 2025-06-18 13:56:24 +00:00 |
Charlie Fu | a44b1c951d | [Feature][ROCm] Add full graph capture support for TritonAttentionBackend (#19158); Signed-off-by: charlifu <charlifu@amd.com> | 2025-06-17 17:03:06 -04:00 |
Nicolò Lucchesi | 4c8f64faa7 | [V1][Kernel] Flashinfer HND KV cache layout (#19280); Signed-off-by: NickLucche <nlucches@redhat.com> | 2025-06-17 09:09:22 -04:00 |
Luka Govedič | 3597b06a4f | [CUDA] Enable full cudagraph for FlashMLA (#18581); Signed-off-by: luka <luka@neuralmagic.com> | 2025-06-13 18:12:26 +00:00 |
Sage Moore | 642bf2dd8b | Merge branch 'main' of https://github.com/neuralmagic/vllm into lwilkinson/attn-slicing | 2025-06-08 18:02:06 +00:00 |
Yong Hoon Shin | bdf13965ab | [V1] Support cross-layer KV sharing (#18212); Signed-off-by: Yong Hoon Shin <yhshin@meta.com> | 2025-06-03 20:33:07 +00:00 |
Simon Mo | 02f0c7b220 | [Misc] Add SPDX-FileCopyrightText (#19100); Signed-off-by: simon-mo <simon.mo@hey.com> | 2025-06-03 11:20:17 -07:00 |
Lucas Wilkinson | 8293182c8c | wip; Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> | 2025-05-22 20:51:35 +00:00 |
Chen Zhang | cba31c47c4 | [v1] AttentionMetadata for each layer (#17394); Signed-off-by: Chen Zhang <zhangch99@outlook.com> | 2025-05-06 07:58:37 -07:00 |