Mirror of https://git.datalinker.icu/vllm-project/vllm.git
vllm / vllm / attention
Latest commit 56b325e977 by Gregory Shtrasberg: [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043)
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
2024-07-03 22:19:38 -07:00
Name          Last commit                                                                                                       Last commit date
backends      [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043)   2024-07-03 22:19:38 -07:00
ops           [hardware][misc] introduce platform abstraction (#6080)                                                          2024-07-02 20:12:22 -07:00
__init__.py   [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)            2024-05-15 14:00:10 +09:00
layer.py      [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936)                                 2024-06-28 21:12:40 +00:00
selector.py   [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)                              2024-06-28 15:28:49 -07:00