vllm/attention at 8f20fc04bf7384089395caa021766cd352d0cf0b - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-25 03:07:11 +08:00

Michał Moskal e8cc7967ff

[Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128)

2024-04-18 00:51:28 -07:00

[Test] Test multiple attn backend for chunked prefill. (#4023)

2024-04-12 09:56:57 -07:00

[Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128)

2024-04-18 00:51:28 -07:00

__init__.py

[Core][5/N] Fully working chunked prefill e2e (#3884)

2024-04-10 17:56:48 -07:00

layer.py

[Core][5/N] Fully working chunked prefill e2e (#3884)

2024-04-10 17:56:48 -07:00

selector.py

[Test] Add xformer and flash attn tests (#3961)

2024-04-11 03:09:50 +00:00