vllm/ops at 0f9a6e3d229cade0ae9a53a4f69a38f52e430bd0 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-05 17:27:16 +08:00

DefTruth 0f9a6e3d22

[Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573)

2024-05-08 09:19:58 -07:00

__init__.py

[Core] Refactor Attention Take 2 (#3462)

2024-03-25 04:39:33 +00:00

paged_attn.py

[Core][Optimization] change python dict to pytorch tensor (#4607)

2024-05-06 21:30:27 -07:00

prefix_prefill.py

[Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573)

2024-05-08 09:19:58 -07:00

triton_flash_attention.py

[ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406)

2024-04-26 23:37:40 -07:00