xinyun / vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-04-27 15:27:03 +08:00)
Directory: vllm/attention/ops
Latest commit 18d23f642a by Hongxia Yang: [ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406), 2024-04-26 23:37:40 -07:00
File                        Last commit                                                                     Date
__init__.py                 [Core] Refactor Attention Take 2 (#3462)                                        2024-03-25 04:39:33 +00:00
paged_attn.py               [Misc] Add indirection layer for custom ops (#3913)                             2024-04-10 20:26:07 -07:00
prefix_prefill.py           [Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128)    2024-04-18 00:51:28 -07:00
triton_flash_attention.py   [ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406)       2024-04-26 23:37:40 -07:00
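
The directory path above maps to the Python module vllm.attention.ops. As a rough illustration of how one of these files is reached from user code, the sketch below imports the PagedAttention helper from paged_attn.py and asks it for the KV-cache shape it expects. This is a minimal sketch, not something shown on this page: the method name, its arguments, and the example values are assumptions based on the vLLM codebase around this revision, and running it requires a vLLM build with its CUDA/ROCm extensions installed.

```python
# Minimal sketch, assuming a built vLLM installation around this revision.
# PagedAttention.get_kv_cache_shape and its argument names are assumptions
# taken from the vLLM source, not from this directory listing.
from vllm.attention.ops.paged_attn import PagedAttention

# Shape of the paged KV cache for 1024 blocks of 16 tokens each,
# with 8 KV heads and head size 128 (hypothetical example values).
kv_cache_shape = PagedAttention.get_kv_cache_shape(
    num_blocks=1024,
    block_size=16,
    num_kv_heads=8,
    head_size=128,
)
print(kv_cache_shape)
```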