vllm/vllm/attention
Latest commit: 805a8a75f2 by Woosuk Kwon, [Misc] Support attention logits soft-capping with flash-attn (#7022), 2024-08-01 13:14:37 -07:00
backends/      [Misc] Support attention logits soft-capping with flash-attn (#7022)     2024-08-01 13:14:37 -07:00
ops/           [Bugfix] Allow vllm to still work if triton is not installed. (#6786)    2024-07-29 14:51:27 -07:00
__init__.py    [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)            2024-07-17 09:37:16 -07:00
layer.py       [Misc] Support attention logits soft-capping with flash-attn (#7022)     2024-08-01 13:14:37 -07:00
selector.py    [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)            2024-07-17 09:37:16 -07:00