vllm/vllm/attention/ops
Mirror of https://git.datalinker.icu/vllm-project/vllm.git
Latest commit: 978aed5300 by Michael Goin, [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081), 2024-07-16 15:31:32 -07:00
| Name | Last commit | Last updated |
|------|-------------|--------------|
| `blocksparse_attention/` | [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) | 2024-07-12 10:47:17 +08:00 |
| `__init__.py` | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| `ipex_attn.py` | [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) | 2024-07-16 15:31:32 -07:00 |
| `paged_attn.py` | [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) | 2024-07-16 15:31:32 -07:00 |
| `prefix_prefill.py` | [Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409) | 2024-07-15 13:14:49 -04:00 |
| `triton_flash_attention.py` | [ROCm][AMD][Bugfix] adding a missing triton autotune config (#4845) | 2024-05-16 10:46:52 -07:00 |