Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-17 04:02:17 +08:00)
vllm / vllm / attention
Latest commit: youkaichao 482045ee77 [hardware][misc] introduce platform abstraction (#6080), 2024-07-02 20:12:22 -07:00
backends    | [Bugfix] Add explicit end_forward calls to flashinfer (#6044)                                    | 2024-07-01 23:08:58 +00:00
ops         | [hardware][misc] introduce platform abstraction (#6080)                                          | 2024-07-02 20:12:22 -07:00
__init__.py | [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) | 2024-05-15 14:00:10 +09:00
layer.py    | [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936)                 | 2024-06-28 21:12:40 +00:00
selector.py | [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)              | 2024-06-28 15:28:49 -07:00