vllm/attention at 14dbd5a7674e5de2862c18adb711d9feecd35063 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-29 15:27:30 +08:00

Joe 14dbd5a767

[Model] H2O Danube3-4b (#6451)

2024-07-26 20:47:50 -07:00

[Bugfix] Fix decode tokens w. CUDA graph (#6757)

2024-07-24 22:33:56 -07:00

[Model] H2O Danube3-4b (#6451)

2024-07-26 20:47:50 -07:00

__init__.py

[Core] Refactor _prepare_model_input_tensors - take 2 (#6164)

2024-07-17 09:37:16 -07:00

layer.py

[Misc] Support FP8 kv cache scales from compressed-tensors (#6528)

2024-07-23 04:11:50 +00:00

selector.py

[Core] Refactor _prepare_model_input_tensors - take 2 (#6164)

2024-07-17 09:37:16 -07:00