xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-14 08:54:35 +08:00

Author	SHA1	Message	Date
Woosuk Kwon	8d17774f92	Add AWQ support for all models (#1714 )	2023-11-18 17:56:47 -08:00
Zhuohan Li	7076fa1c9f	TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622 ) Refactor the tensor parallelism, quantization, and weight-loading codes. Summary of the new features enabled by this PR: - All models are able to be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580). - Model loading code became much simpler. - Support model parallelism for all MQA/GQA models when the number of key/value heads is smaller than the tensor parallel size.	2023-11-15 22:50:41 -08:00
Zhuohan Li	ba0bfd40e2	TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181 )	2023-10-02 15:36:09 -07:00
Jasmond L	ab019eea75	Add Model Revision Support (#1014 ) Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>	2023-09-13 15:20:02 -07:00
Zhuohan Li	c957c741d9	Enable safetensors loading for all models (#974 )	2023-09-07 15:49:52 -07:00
Zhuohan Li	002800f081	Align vLLM's beam search implementation with HF generate (#857 )	2023-09-04 17:29:42 -07:00
Wen Sun	dbed69058c	Fix the `KeyError` when loading bloom-based models (#441 )	2023-07-13 21:58:09 -07:00
Zhuohan Li	42e0c1df78	[Quality] Add CI for formatting (#343 )	2023-07-03 14:50:56 -07:00
Woosuk Kwon	e41f06702c	Add support for BLOOM (#331 )	2023-07-03 13:12:35 -07:00

1 2