Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-17 04:02:17 +08:00)
vllm / vllm / attention
Latest commit: youkaichao 482045ee77 [hardware][misc] introduce platform abstraction (#6080), 2024-07-02 20:12:22 -07:00
backends    | [Bugfix] Add explicit end_forward calls to flashinfer (#6044)                                    | 2024-07-01 23:08:58 +00:00
ops         | [hardware][misc] introduce platform abstraction (#6080)                                          | 2024-07-02 20:12:22 -07:00
__init__.py | [Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681) | 2024-05-15 14:00:10 +09:00
layer.py    | [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936)                 | 2024-06-28 21:12:40 +00:00
selector.py | [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628)              | 2024-06-28 15:28:49 -07:00