vllm/attention at 92effb07a48e56c531a95b696acd5f699baf16da - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-24 23:27:23 +08:00

[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)

Signed-off-by: ganyi <ygan@amd.com>

2025-11-24 19:05:46 -07:00

conftest.py

[Chore] Clean up pytorch helper functions in vllm.utils (#26908)

2025-10-18 09:48:22 -07:00

test_aiter_flash_attn.py

[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043)

2025-11-20 20:39:49 +00:00

test_attention_selector.py

[Core] Deprecate xformers (#29262)

2025-11-24 04:18:55 +00:00

test_attention.py

[Core] Deprecate xformers (#29262)

2025-11-24 04:18:55 +00:00

test_cache.py

[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)

2025-11-24 19:05:46 -07:00

test_cascade_flash_attn.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_cpu_attn.py

[CPU] Refactor CPU attention backend (#27954)

2025-11-12 09:43:06 +08:00

test_cutlass_mla_decode.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_deepgemm_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_flash_attn.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashinfer_mla_decode.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashinfer_trtllm_attention.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashinfer.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashmla_sparse.py

[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)

2025-10-08 10:09:34 +08:00

test_flashmla.py

[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)

2025-10-08 10:09:34 +08:00

test_lightning_attn.py

Fix per file ruff ignores related to simplification (#26259)

2025-10-05 20:31:53 +00:00

test_merge_attn_states.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

test_mha_attn.py

[Core] Deprecate xformers (#29262)

2025-11-24 04:18:55 +00:00

test_mla_decode_cpu.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_pack_unpack_triton.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_prefix_prefill.py

[CI/Build] Fix test_prefix_prefill for AMD (#28905)

2025-11-19 16:04:36 -05:00

test_rocm_attention_selector.py

[Attention] Implement universal BACKEND_MAP (#25900)

2025-10-08 12:00:25 -07:00

test_triton_decode_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_triton_unified_attention.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00