Mirror of https://git.datalinker.icu/vllm-project/vllm.git
vllm / vllm / attention
Latest commit 56b325e977 by Gregory Shtrasberg: [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043)
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com>
2024-07-03 22:19:38 -07:00
Name          Last commit                                                                                                       Last commit date
backends      [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043)   2024-07-03 22:19:38 -07:00
ops           [hardware][misc] introduce platform abstraction (#6080)                                                          2024-07-02 20:12:22 -07:00
__init__.py   [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)            2024-05-15 14:00:10 +09:00
layer.py      [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936)                                 2024-06-28 21:12:40 +00:00
selector.py   [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)                              2024-06-28 15:28:49 -07:00