15 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Jie Li | ebede26ebf | Make InternLM follow rope_scaling in config.json (#1956). Co-authored-by: lijie8 <lijie8@sensetime.com> | 2023-12-07 08:32:08 -08:00 |
| Woosuk Kwon | 27feead2f8 | Refactor Worker & InputMetadata (#1843) | 2023-11-29 22:16:37 -08:00 |
| Woosuk Kwon | a9e4574261 | Refactor Attention (#1840) | 2023-11-29 15:37:31 -08:00 |
| Simon Mo | 5ffc0d13a2 | Migrate linter from pylint to ruff (#1665) | 2023-11-20 11:58:01 -08:00 |
| ljss | e1054247ba | [Optimization] Implement fused add rmsnorm (#1667) | 2023-11-18 18:18:02 -08:00 |
| Zhuohan Li | 7076fa1c9f | TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622; see the note below the table) | 2023-11-15 22:50:41 -08:00 |
| Woosuk Kwon | aa9af07cac | Fix bias in InternLM (#1501) | 2023-10-29 16:24:18 -07:00 |
| Zhuohan Li | ba0bfd40e2 | TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) | 2023-10-02 15:36:09 -07:00 |
| Antoni Baum | 3302f0aef3 | rope_theta and max_position_embeddings from config (#1096). Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>, wnma3mz <wnma3mz@gmail.com> | 2023-09-20 13:35:11 -07:00 |
| Jasmond L | ab019eea75 | Add Model Revision Support (#1014). Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>, Zhuohan Li <zhuohan123@gmail.com> | 2023-09-13 15:20:02 -07:00 |
| Zhuohan Li | c957c741d9 | Enable safetensors loading for all models (#974) | 2023-09-07 15:49:52 -07:00 |
| Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00 |
| JFDuan | 0d93f15694 | Accelerate LLaMA model loading (#234) | 2023-08-30 01:00:13 -07:00 |
| WRH | 462ae5220a | [Fix] Unwanted bias in InternLM model (#740) | 2023-08-11 11:40:37 -07:00 |
| Jia Guoqing | 735ecfff61 | Add InternLM model (#528) | 2023-08-08 16:35:06 -07:00 |

**Note on #1622** (TP/quantization/weight loading refactor part 2): this PR refactors the tensor parallelism, quantization, and weight loading code. Summary of the new features it enables:

- **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- The model loading code is much simpler.
- Model parallelism now works for all MQA/GQA models, even when the number of key/value heads is smaller than the tensor parallel size (sketched below).
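The MQA/GQA item in the note above concerns models with fewer key/value heads than tensor-parallel ranks. A common way to handle this, sketched below under the assumption that one count evenly divides the other, is to replicate each KV head across several ranks so that every rank owns at least one; the helper name is hypothetical, not vLLM's actual API:

```python
def kv_head_layout(num_kv_heads: int, tp_size: int) -> tuple[int, int]:
    """Hypothetical helper: return (kv_heads_per_rank, replicas_per_kv_head)."""
    if num_kv_heads >= tp_size:
        # Enough KV heads to shard them evenly across ranks.
        assert num_kv_heads % tp_size == 0
        return num_kv_heads // tp_size, 1
    # Fewer KV heads than ranks: replicate each head so every rank owns one.
    assert tp_size % num_kv_heads == 0
    return 1, tp_size // num_kv_heads

print(kv_head_layout(8, 2))  # (4, 1): plain GQA sharding
print(kv_head_layout(2, 8))  # (1, 4): each KV head replicated on 4 ranks
```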
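For context on #1956 and #1096: both commits make the model read RoPE parameters from the Hugging Face config.json instead of hard-coding them. The sketch below shows the relevant config fields and how a loader might read them; the field names follow the Hugging Face convention, and the values are made up for illustration:

```python
import json

# Fragment of a Hugging Face config.json with the RoPE-related fields that
# #1096 (rope_theta, max_position_embeddings) and #1956 (rope_scaling)
# make the model respect. Values are illustrative only.
cfg = json.loads("""
{
  "max_position_embeddings": 2048,
  "rope_theta": 10000.0,
  "rope_scaling": {"type": "dynamic", "factor": 2.0}
}
""")

rope_theta = cfg.get("rope_theta", 10000.0)
max_pos = cfg.get("max_position_embeddings", 2048)
rope_scaling = cfg.get("rope_scaling")  # None means no context extension
if rope_scaling is not None:
    # HF convention: "type" is e.g. "linear" or "dynamic";
    # "factor" scales the usable context length.
    print(rope_scaling["type"], rope_scaling["factor"])
```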
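The "fused add rmsnorm" of #1667 fuses a residual add with the RMSNorm that follows it into a single kernel, saving a round trip to GPU memory. A minimal PyTorch sketch of the semantics (reference math only, not the actual CUDA kernel):

```python
import torch

def fused_add_rms_norm_ref(
    x: torch.Tensor,         # layer output
    residual: torch.Tensor,  # running residual stream
    weight: torch.Tensor,    # RMSNorm scale, shape (hidden_size,)
    eps: float = 1e-6,
):
    # The fused kernel performs both steps in one pass over the data;
    # this reference version only demonstrates the computation.
    residual = residual + x
    variance = residual.pow(2).mean(dim=-1, keepdim=True)
    normed = residual * torch.rsqrt(variance + eps) * weight
    return normed, residual  # the updated residual feeds the next layer
```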
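Finally, for #974: safetensors is the Hugging Face weight format that loads checkpoints via memory mapping rather than pickle. A minimal loading example (the file path is a placeholder):

```python
from safetensors.torch import load_file

# Loads tensors from a .safetensors checkpoint into a dict of
# name -> torch.Tensor, without executing arbitrary pickle code.
state_dict = load_file("model.safetensors")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```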