vllm/model_executor at beb89f68b448a43ac112b48e3834f80a2df626cb - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-02 10:31:18 +08:00

Casper beb89f68b4

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

AWQ: Up to 2.66x higher throughput (#2566)

2024-01-26 23:53:17 -08:00

Support for Stable LM 2 (#2598)

2024-01-26 12:45:19 -08:00

[Experimental] Add multi-LoRA support (#1804)

2024-01-23 15:26:37 -08:00

__init__.py

Refactor Worker & InputMetadata (#1843)

2023-11-29 22:16:37 -08:00

input_metadata.py

[Experimental] Prefix Caching Support (#1669)

2024-01-17 16:32:10 -08:00

model_loader.py

[Experimental] Add multi-LoRA support (#1804)

2024-01-23 15:26:37 -08:00

sampling_metadata.py

Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

2024-01-03 11:30:22 -08:00

utils.py

TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622)

2023-11-15 22:50:41 -08:00

weight_utils.py

[Bugfix] fix load local safetensors model (#2512)

2024-01-19 16:26:16 -08:00