vllm/attention at 9b0d1aa27711e8a4c149bf457737e63e40b69b4f - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-07 23:57:24 +08:00

Roberto L. Castro 4fa7ce46f3

[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)

Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

2025-12-12 19:34:23 -08:00

conftest.py

[Chore] Clean up pytorch helper functions in vllm.utils (#26908)

2025-10-18 09:48:22 -07:00

test_aiter_flash_attn.py

[CI/Build][AMD] Skip if flash_attn_varlen_func not available in test_aiter_flash_attn.py (#29043)

2025-11-20 20:39:49 +00:00

test_attention_selector.py

[Misc] Remove redundant attention var constants (#29650)

2025-11-28 04:35:19 -08:00

test_attention.py

[Core] Deprecate xformers (#29262)

2025-11-24 04:18:55 +00:00

test_cache.py

[Perf][Deepseek] optimize gather_and_maybe_dequant_cache kernel's perf for extremely long sequence (#28029)

2025-11-24 19:05:46 -07:00

test_cascade_flash_attn.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_cpu_attn.py

[cpu][ci] Add CPU Attention Tests for Neon Backend (#30347)

2025-12-10 05:37:35 +00:00

test_cutlass_mla_decode.py

[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)

2025-12-12 19:34:23 -08:00

test_deepgemm_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_flash_attn.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashinfer_mla_decode.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashinfer_trtllm_attention.py

[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484)

2025-12-12 19:34:23 -08:00

test_flashinfer.py

[CI/Build][AMD] Fix import errors in tests/kernels/attention (#29032)

2025-11-20 17:48:09 +08:00

test_flashmla_sparse.py

[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)

2025-10-08 10:09:34 +08:00

test_flashmla.py

[Misc] Clean up cruft from previous FlashMLA sparse implementation (#26125)

2025-10-08 10:09:34 +08:00

test_lightning_attn.py

Fix per file ruff ignores related to simplification (#26259)

2025-10-05 20:31:53 +00:00

test_merge_attn_states.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

test_mha_attn.py

[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)

2025-12-09 20:18:17 +00:00

test_mla_decode_cpu.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_pack_unpack_triton.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_prefix_prefill.py

[CI/Build] Fix test_prefix_prefill for AMD (#28905)

2025-11-19 16:04:36 -05:00

test_rocm_attention_selector.py

[Misc] Remove redundant attention var constants (#29650)

2025-11-28 04:35:19 -08:00

test_triton_decode_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_triton_unified_attention.py

[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)

2025-12-12 16:55:40 +01:00