xinyun/vllm (mirror of https://git.datalinker.icu/vllm-project/vllm.git)
vllm/vllm/attention/ops
Latest commit: 9a7e2d0534 by Thomas Parnell, 2024-07-29 14:51:27 -07:00
[Bugfix] Allow vllm to still work if triton is not installed. (#6786)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Name | Last commit | Date
blocksparse_attention | [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) | 2024-07-12 10:47:17 +08:00
__init__.py | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00
ipex_attn.py | [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081) | 2024-07-16 15:31:32 -07:00
paged_attn.py | [Bugfix] Allow vllm to still work if triton is not installed. (#6786) | 2024-07-29 14:51:27 -07:00
prefix_prefill.py | [Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409) | 2024-07-15 13:14:49 -04:00
triton_flash_attention.py | [ROCm][AMD][Bugfix] adding a missing triton autotune config (#4845) | 2024-05-16 10:46:52 -07:00
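The most recent change in this directory (#6786, reflected in paged_attn.py above) lets vllm import and run even when triton is not installed. As a rough sketch of the general pattern such a fix uses, the snippet below guards the triton import behind a feature flag and only errors at call time; the names HAS_TRITON and context_attention_fwd are illustrative assumptions, not vllm's actual API.

```python
# Sketch of an optional-triton import guard (illustrative only; not
# vllm's actual implementation). The module stays importable without triton.
from importlib import util

# Detect triton without importing it eagerly, so CPU-only or
# triton-free environments can still import this module.
HAS_TRITON = util.find_spec("triton") is not None

if HAS_TRITON:
    import triton
    import triton.language as tl


def context_attention_fwd(*args, **kwargs):
    """Dispatch to the triton-backed kernel only when triton is present."""
    if not HAS_TRITON:
        raise RuntimeError(
            "This attention path requires triton; install triton or "
            "select a backend that does not depend on it.")
    # ... invoke the @triton.jit kernel here ...
```

Deferring the error from import time to call time means backends that never touch the triton kernels continue to work unchanged.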