vllm/attention at 0d8451c3a45d309e58de5e1c546f043de461d478 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-27 20:07:22 +08:00

Jani Monoses 0a56bcc03d

[Bugfix][Hardware][CPU] Enable Gemma2 with SDPA on CPU backend (#11169)

2024-12-13 18:00:40 +00:00

[Bugfix][Hardware][CPU] Enable Gemma2 with SDPA on CPU backend (#11169)

2024-12-13 18:00:40 +00:00

[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)

2024-11-25 12:23:32 -05:00

__init__.py

[Core] Add AttentionState abstraction (#7663)

2024-08-20 18:50:45 +00:00

layer.py

[Model] Consolidate ViTs attention implementation without mask (#10893)

2024-12-04 18:11:08 +00:00

selector.py

[Platform][Refactor] Extract func get_default_attn_backend to Platform (#10358)

2024-11-19 11:22:26 +08:00