vllm/mla at b5945d49c08b66658110fa1c63e55fde66fcfad7 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-23 18:35:01 +08:00

Ming Yang fba8906930

[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710)

Signed-off-by: Ming Yang <minos.future@gmail.com>

2025-12-11 08:20:45 +00:00

__init__.py

…

aiter_triton_mla.py

…

common.py

[perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710)

2025-12-11 08:20:45 +00:00

cutlass_mla.py

[Attention] Refactor FA block_size limitations to hybrid models only (#29084)

2025-11-22 06:38:44 -08:00

flashattn_mla.py

[DCP][Bugfix][CI] Fix accuracy issue of DCP when using FLASH_ATTN_MLA (#30309)

2025-12-09 08:22:14 +00:00

flashinfer_mla.py

[Attention] Refactor FA block_size limitations to hybrid models only (#29084)

2025-11-22 06:38:44 -08:00

flashmla_sparse.py

[Attention] Refactor FA block_size limitations to hybrid models only (#29084)

2025-11-22 06:38:44 -08:00

flashmla.py

[Attention] Refactor FA block_size limitations to hybrid models only (#29084)

2025-11-22 06:38:44 -08:00

indexer.py

[Attention] Refactor FA block_size limitations to hybrid models only (#29084)

2025-11-22 06:38:44 -08:00

rocm_aiter_mla_sparse.py

[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670)

2025-11-20 02:54:01 -08:00

rocm_aiter_mla.py

[ROCm][MLA] enable fp8 MLA decode on ROCm (#28032)

2025-11-25 10:15:02 +08:00

triton_mla.py

…