15 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Jie Li | ebede26ebf | Make InternLM follow rope_scaling in config.json (#1956). Co-authored-by: lijie8 <lijie8@sensetime.com> | 2023-12-07 08:32:08 -08:00 |
| Woosuk Kwon | 27feead2f8 | Refactor Worker & InputMetadata (#1843) | 2023-11-29 22:16:37 -08:00 |
| Woosuk Kwon | a9e4574261 | Refactor Attention (#1840) | 2023-11-29 15:37:31 -08:00 |
| Simon Mo | 5ffc0d13a2 | Migrate linter from pylint to ruff (#1665) | 2023-11-20 11:58:01 -08:00 |
| ljss | e1054247ba | [Optimization] Implement fused add rmsnorm (#1667) | 2023-11-18 18:18:02 -08:00 |
| Zhuohan Li | 7076fa1c9f | TP/quantization/weight loading refactor part 2 - Refactor quantized linear logic and extend quantization support to all models (#1622; see the note below the table) | 2023-11-15 22:50:41 -08:00 |
| Woosuk Kwon | aa9af07cac | Fix bias in InternLM (#1501) | 2023-10-29 16:24:18 -07:00 |
| Zhuohan Li | ba0bfd40e2 | TP/quantization/weight loading refactor part 1 - Simplify parallel linear logic (#1181) | 2023-10-02 15:36:09 -07:00 |
| Antoni Baum | 3302f0aef3 | rope_theta and max_position_embeddings from config (#1096). Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>, wnma3mz <wnma3mz@gmail.com> | 2023-09-20 13:35:11 -07:00 |
| Jasmond L | ab019eea75 | Add Model Revision Support (#1014). Co-authored-by: Jasmond Loh <Jasmond.Loh@hotmail.com>, Zhuohan Li <zhuohan123@gmail.com> | 2023-09-13 15:20:02 -07:00 |
| Zhuohan Li | c957c741d9 | Enable safetensors loading for all models (#974) | 2023-09-07 15:49:52 -07:00 |
| Zhuohan Li | 002800f081 | Align vLLM's beam search implementation with HF generate (#857) | 2023-09-04 17:29:42 -07:00 |
| JFDuan | 0d93f15694 | Accelerate LLaMA model loading (#234) | 2023-08-30 01:00:13 -07:00 |
| WRH | 462ae5220a | [Fix] Unwanted bias in InternLM model (#740) | 2023-08-11 11:40:37 -07:00 |
| Jia Guoqing | 735ecfff61 | Add InternLM model (#528) | 2023-08-08 16:35:06 -07:00 |

**Note on #1622** (TP/quantization/weight loading refactor part 2): this PR refactors the tensor parallelism, quantization, and weight loading code. Summary of the new features it enables:

- **All models** can be quantized with AWQ and SqueezeLLM, and [soon GPTQ](https://github.com/vllm-project/vllm/pull/1580).
- The model loading code is much simpler.
- Model parallelism now works for all MQA/GQA models, even when the number of key/value heads is smaller than the tensor parallel size (sketched below).
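The MQA/GQA item in the note above concerns models with fewer key/value heads than tensor-parallel ranks. A common way to handle this, sketched below under the assumption that one count evenly divides the other, is to replicate each KV head across several ranks so that every rank owns at least one; the helper name is hypothetical, not vLLM's actual API:

```python
def kv_head_layout(num_kv_heads: int, tp_size: int) -> tuple[int, int]:
    """Hypothetical helper: return (kv_heads_per_rank, replicas_per_kv_head)."""
    if num_kv_heads >= tp_size:
        # Enough KV heads to shard them evenly across ranks.
        assert num_kv_heads % tp_size == 0
        return num_kv_heads // tp_size, 1
    # Fewer KV heads than ranks: replicate each head so every rank owns one.
    assert tp_size % num_kv_heads == 0
    return 1, tp_size // num_kv_heads

print(kv_head_layout(8, 2))  # (4, 1): plain GQA sharding
print(kv_head_layout(2, 8))  # (1, 4): each KV head replicated on 4 ranks
```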
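For context on #1956 and #1096: both commits make the model read RoPE parameters from the Hugging Face config.json instead of hard-coding them. The sketch below shows the relevant config fields and how a loader might read them; the field names follow the Hugging Face convention, and the values are made up for illustration:

```python
import json

# Fragment of a Hugging Face config.json with the RoPE-related fields that
# #1096 (rope_theta, max_position_embeddings) and #1956 (rope_scaling)
# make the model respect. Values are illustrative only.
cfg = json.loads("""
{
  "max_position_embeddings": 2048,
  "rope_theta": 10000.0,
  "rope_scaling": {"type": "dynamic", "factor": 2.0}
}
""")

rope_theta = cfg.get("rope_theta", 10000.0)
max_pos = cfg.get("max_position_embeddings", 2048)
rope_scaling = cfg.get("rope_scaling")  # None means no context extension
if rope_scaling is not None:
    # HF convention: "type" is e.g. "linear" or "dynamic";
    # "factor" scales the usable context length.
    print(rope_scaling["type"], rope_scaling["factor"])
```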
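The "fused add rmsnorm" of #1667 fuses a residual add with the RMSNorm that follows it into a single kernel, saving a round trip to GPU memory. A minimal PyTorch sketch of the semantics (reference math only, not the actual CUDA kernel):

```python
import torch

def fused_add_rms_norm_ref(
    x: torch.Tensor,         # layer output
    residual: torch.Tensor,  # running residual stream
    weight: torch.Tensor,    # RMSNorm scale, shape (hidden_size,)
    eps: float = 1e-6,
):
    # The fused kernel performs both steps in one pass over the data;
    # this reference version only demonstrates the computation.
    residual = residual + x
    variance = residual.pow(2).mean(dim=-1, keepdim=True)
    normed = residual * torch.rsqrt(variance + eps) * weight
    return normed, residual  # the updated residual feeds the next layer
```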
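Finally, for #974: safetensors is the Hugging Face weight format that loads checkpoints via memory mapping rather than pickle. A minimal loading example (the file path is a placeholder):

```python
from safetensors.torch import load_file

# Loads tensors from a .safetensors checkpoint into a dict of
# name -> torch.Tensor, without executing arbitrary pickle code.
state_dict = load_file("model.safetensors")
for name, tensor in state_dict.items():
    print(name, tuple(tensor.shape))
```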