xinyun/vllm
mirror of https://git.datalinker.icu/vllm-project/vllm.git
vllm/vllm/attention
Latest commit: 309aaef825 by Cody Yu, [Bugfix] Fix decode tokens w. CUDA graph (#6757), 2024-07-24 22:33:56 -07:00
backends/     last commit: [Bugfix] Fix decode tokens w. CUDA graph (#6757), 2024-07-24 22:33:56 -07:00
ops/          last commit: [Kernel][Attention] Separate Attention.kv_scale into k_scale and v_scale (#6081), 2024-07-16 15:31:32 -07:00
__init__.py   last commit: [Core] Refactor _prepare_model_input_tensors - take 2 (#6164), 2024-07-17 09:37:16 -07:00
layer.py      last commit: [Misc] Support FP8 kv cache scales from compressed-tensors (#6528), 2024-07-23 04:11:50 +00:00
selector.py   last commit: [Core] Refactor _prepare_model_input_tensors - take 2 (#6164), 2024-07-17 09:37:16 -07:00
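
The files above make up vLLM's attention package. As orientation only, below is a minimal, hedged sketch of how these pieces typically relate in vLLM of this vintage; the module paths come from the listing, while the imported names (Attention, get_attn_backend) are assumptions about this particular revision and may differ.

```python
# Hedged sketch, not part of this repository page: how the files listed under
# vllm/vllm/attention typically fit together. The imported names below are
# assumptions about this revision and may not match it exactly.
from vllm.attention import Attention                   # defined in layer.py, re-exported by __init__.py
from vllm.attention.selector import get_attn_backend   # backend dispatch lives in selector.py

# layer.py    : the Attention module that model code instantiates; its forward()
#               delegates to whichever backend implementation was selected.
# selector.py : get_attn_backend() inspects the platform/config and returns a backend class.
# backends/   : concrete backend implementations and their metadata types; the
#               CUDA-graph decode fix (#6757) from the listing landed here.
# ops/        : lower-level attention kernels shared by the backends, including the
#               Attention.kv_scale -> k_scale/v_scale split (#6081) from the listing.
```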