vllm/model_executor at 4934d492744d14104353b8236ef8a0405edf1622 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-02 11:51:19 +08:00

Jong-hun Shin 4934d49274

Support GPT-NeoX Models without attention biases (#2301)

2023-12-30 11:42:04 -05:00

Remove Sampler copy stream (#2209)

2023-12-20 00:04:33 -08:00

Support GPT-NeoX Models without attention biases (#2301)

2023-12-30 11:42:04 -05:00

Remove dependency on CuPy (#2152)

2023-12-17 01:49:07 -08:00

__init__.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

input_metadata.py

Optimize model execution with CUDA graph (#1926)

2023-12-16 21:12:08 -08:00

model_loader.py

Implement lazy model loader (#2044)

2023-12-12 22:21:45 -08:00

sampling_metadata.py

Make sampler less blocking (#1889)

2023-12-17 23:03:49 +08:00

utils.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00

weight_utils.py

[Minor] Fix a typo in .pt weight support (#2160)

2023-12-17 10:12:44 -08:00