vllm/attention at ed7af3178aa24b618be276104e21fdf8b9fcc9f2 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-16 01:37:03 +08:00


Lucas Wilkinson abe93bce59

[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)

Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>

2025-12-09 17:18:10 -08:00

backends

[v1] Add PrefixLM support to FlexAttention backend (#27938)

2025-12-07 15:51:36 +00:00

layers

[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)

2025-12-09 17:18:10 -08:00

ops

[Perf] Remove sync point in vit torch sdpa attn backend (#30232)

2025-12-08 07:12:42 +00:00

utils

[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)

2025-12-05 09:48:43 -08:00

__init__.py

[Attention] Remove imports from vllm/attention/__init__.py (#29342)

2025-11-26 10:53:15 -07:00

layer.py

[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)

2025-12-09 20:18:17 +00:00

selector.py

[v1] Add PrefixLM support to FlexAttention backend (#27938)

2025-12-07 15:51:36 +00:00