vllm / vllm / attention
History
Latest commit c8a7d51c49 by Noam Gat: [Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash (#6501), 2024-07-18 07:47:13 +00:00
Name         Last commit                                                                                             Last updated
backends     [Bugfix] Update flashinfer.py with PagedAttention forwards - Fixes Gemma2 OpenAI Server Crash (#6501)   2024-07-18 07:47:13 +00:00
ops          [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)                       2024-07-16 15:31:32 -07:00
__init__.py  [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                                           2024-07-17 09:37:16 -07:00
layer.py     [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081)                       2024-07-16 15:31:32 -07:00
selector.py  [Core] Refactor _prepare_model_input_tensors - take 2 (#6164)                                           2024-07-17 09:37:16 -07:00