vllm/model_executor at 8cd5a992bffff40434dac6c233767e4fa6359183 - vllm - 丝路新云 (code repository)

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-08 16:43:37 +08:00


Chenhui Zhang f780504d12

fix weight loading for GQA with TP (#2379)

2024-01-15 15:43:59 -08:00

..

fix weight loading for GQA with TP (#2379)

2024-01-15 15:43:59 -08:00

Address Phi modeling update 2 (#2428)

2024-01-12 12:16:49 -08:00

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

__init__.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

input_metadata.py

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

model_loader.py

Implement lazy model loader (#2044)

2023-12-12 22:21:45 -08:00

sampling_metadata.py

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

utils.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00

weight_utils.py

[Minor] Fix a typo in .pt weight support (#2160)

2023-12-17 10:12:44 -08:00