vllm/model_executor at d27f4bae393214b4e7715fc3cb5754d4bf801bce - vllm - 丝路新云-代码仓

xinyun/vllm

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, synced 2026-06-22 12:07:18 +08:00

Roy d27f4bae39

Fix rope cache key error (#1867)

2023-11-30 08:29:28 -08:00

Fix rope cache key error (#1867)

2023-11-30 08:29:28 -08:00

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

Correct comments in parallel_state.py (#1818)

2023-11-28 10:19:35 -08:00

__init__.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

input_metadata.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

model_loader.py

Init model on GPU to reduce CPU memory footprint (#1796)

2023-11-27 11:18:26 -08:00

sampling_metadata.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

utils.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00

weight_utils.py

[BugFix] Fix a bug in loading safetensors (#1732)

2023-11-20 15:51:18 -08:00