vllm/attention at 21fe7b481a3a84dc9ebe2497ec89a17002ad52c5 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-18 08:27:03 +08:00

youkaichao a4c4daf364

[misc] use out argument for flash attention (#10822)

Signed-off-by: youkaichao <youkaichao@gmail.com>

2024-12-02 10:50:10 +00:00

[misc] use out argument for flash attention (#10822)

2024-12-02 10:50:10 +00:00

[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)

2024-11-25 12:23:32 -05:00

__init__.py

[Core] Add AttentionState abstraction (#7663)

2024-08-20 18:50:45 +00:00

layer.py

[misc] use out argument for flash attention (#10822)

2024-12-02 10:50:10 +00:00

selector.py

[Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)

2024-11-19 11:22:26 +08:00