108 Commits

Author SHA1 Message Date
Philipp Moritz
31348dff03
Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880)
* Fix AttributeError: MixtralModel object has no attribute org_vocab_size.

* Make LoRA logic for Mistral and Mixtral the same

---------

Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
2024-02-15 01:00:43 +01:00
Roy
4efbac6d35
Migrate AquilaForCausalLM to LlamaForCausalLM (#2867)
2024-02-14 12:30:24 -08:00
Philipp Moritz
0c48b37c31
Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861)
2024-02-13 18:01:15 -08:00
Philipp Moritz
7eacffd951
Migrate InternLMForCausalLM to LlamaForCausalLM (#2860)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 17:12:05 -08:00
Terry
2a543d6efe
Add LoRA support for Mixtral (#2831)
* add mixtral lora support

* formatting

* fix incorrectly ported logic

* polish tests

* minor fixes and refactoring

* minor fixes

* formatting

* rename and remove redundant logic

* refactoring

* refactoring

* minor fix

* minor refactoring

* fix code smell
2024-02-14 00:55:45 +01:00
Philipp Moritz
317b29de0f
Remove Yi model definition, please use LlamaForCausalLM instead (#2854)
Co-authored-by: Roy <jasonailu87@gmail.com>
2024-02-13 14:22:22 -08:00
Philipp Moritz
ea356004d4
Revert "Refactor llama family models (#2637)" (#2851)
This reverts commit 5c976a7e1a1bec875bf6474824b7dff39e38de18.
2024-02-13 09:24:59 -08:00
Roy
5c976a7e1a
Refactor llama family models (#2637)
2024-02-13 00:09:23 -08:00
Woosuk Kwon
f0d4e14557
Add fused top-K softmax kernel for MoE (#2769)
2024-02-05 17:38:02 -08:00
Fengzhe Zhou
cd9e60c76c
Add Internlm2 (#2666)
2024-02-01 09:27:40 -08:00
Philipp Moritz
d0d93b92b1
Add unit test for Mixtral MoE layer (#2677)
2024-01-31 14:34:17 -08:00
Woosuk Kwon
3dad944485
Add quantized mixtral support (#2673)
2024-01-30 16:34:10 -08:00
Philipp Moritz
ab40644669
Fused MOE for Mixtral (#2542)
Co-authored-by: chen shen <scv119@gmail.com>
2024-01-29 22:43:37 -08:00
wangding zeng
5d60def02c
DeepseekMoE support with Fused MoE kernel (#2453)
Co-authored-by: roy <jasonailu87@gmail.com>
2024-01-29 21:19:48 -08:00
dakotamahan-stability
3a0e1fc070
Support for Stable LM 2 (#2598)
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-26 12:45:19 -08:00
Junyang Lin
2832e7b9f9
fix names and license for Qwen2 (#2589)
2024-01-24 22:37:51 -08:00
Antoni Baum
9b945daaf1
[Experimental] Add multi-LoRA support (#1804)
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Co-authored-by: Avnish Narayan <avnish@anyscale.com>
2024-01-23 15:26:37 -08:00
Junyang Lin
94b5edeb53
Add qwen2 (#2495)
2024-01-22 14:34:21 -08:00
YingchaoX
8a25d3a71a
fix stablelm.py tensor-parallel-size bug (#2482)
2024-01-18 09:39:46 -08:00
Hyunsung Lee
e1957c6ebd
Add StableLM3B model (#2372)
2024-01-16 20:32:40 -08:00
Gary Hui
7878958c0d
Address Phi modeling update 2 (#2428)
2024-01-12 12:16:49 -08:00
Woosuk Kwon
50376faa7b
Rename phi_1_5 -> phi (#2385)
2024-01-11 16:23:43 -08:00
Zhuohan Li
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)
2024-01-03 11:30:22 -08:00
Jong-hun Shin
4934d49274
Support GPT-NeoX Models without attention biases (#2301)
2023-12-30 11:42:04 -05:00
Woosuk Kwon
ba4f826738
[BugFix] Fix weight loading for Mixtral with TP (#2208)
2023-12-19 16:16:11 -08:00
avideci
de60a3fb93
Added DeciLM-7b and DeciLM-7b-instruct (#2062)
2023-12-19 02:29:33 -08:00
Woosuk Kwon
37ca558103
Optimize model execution with CUDA graph (#1926)
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-12-16 21:12:08 -08:00
Roy
eed74a558f
Simplify weight loading logic (#2133)
2023-12-16 12:41:23 -08:00
CHU Tianxiang
0fbfc4b81b
Add GPTQ support (#916)
2023-12-15 03:04:22 -08:00
Antoni Baum
21d93c140d
Optimize Mixtral with expert parallelism (#2090)
2023-12-13 23:55:07 -08:00
Woosuk Kwon
518369d78c
Implement lazy model loader (#2044)
2023-12-12 22:21:45 -08:00
Megha Agarwal
6428f1d051
Support MPT with GQA (#1938)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-12-12 10:16:05 -08:00
Woosuk Kwon
cb3f30c600
Upgrade transformers version to 4.36.0 (#2046)
2023-12-11 18:39:14 -08:00
Woosuk Kwon
31d2ab4aff
Remove python 3.10 requirement (#2040)
2023-12-11 12:26:42 -08:00
Woosuk Kwon
6120e5aaea
Fix import error msg for megablocks (#2038)
2023-12-11 11:40:56 -08:00
Woosuk Kwon
81ce2a4b26
[Minor] Fix type annotation in Mixtral (#2036)
2023-12-11 11:32:39 -08:00
Woosuk Kwon
4ff0203987
Minor fixes for Mixtral (#2015)
2023-12-11 09:16:15 -08:00
Pierre Stock
b5f882cc98
Mixtral 8x7B support (#2011)
Co-authored-by: Pierre Stock <p@mistral.ai>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-12-11 01:09:15 -08:00
Woosuk Kwon
24cde76a15
[Minor] Add comment on skipping rope caches (#2004)
2023-12-10 10:04:12 -08:00
Woosuk Kwon
fe470ae5ad
[Minor] Fix code style for baichuan (#2003)
2023-12-09 19:24:29 -08:00
Jun Gao
3a8c2381f7
Fix for KeyError on Loading LLaMA (#1978)
2023-12-09 15:59:57 -08:00
firebook
2b981012a6
Fix Baichuan2-7B-Chat (#1987)
2023-12-08 09:38:36 -08:00
Jie Li
ebede26ebf
Make InternLM follow rope_scaling in config.json (#1956)
Co-authored-by: lijie8 <lijie8@sensetime.com>
2023-12-07 08:32:08 -08:00
Woosuk Kwon
e5452ddfd6
Normalize head weights for Baichuan 2 (#1876)
2023-11-30 20:03:58 -08:00
Woosuk Kwon
27feead2f8
Refactor Worker & InputMetadata (#1843)
2023-11-29 22:16:37 -08:00
Woosuk Kwon
a9e4574261
Refactor Attention (#1840)
2023-11-29 15:37:31 -08:00
Woosuk Kwon
a7b3e33078
[Fix] Fix RoPE in ChatGLM-32K (#1841)
2023-11-29 13:01:19 -08:00
Woosuk Kwon
b943890484
Fix OPT param names (#1819)
2023-11-28 11:22:44 -08:00
Woosuk Kwon
7c600440f7
Fix model docstrings (#1764)
2023-11-23 23:04:44 -08:00
Woosuk Kwon
cf35d8f3d7
[BugFix] Fix TP support for AWQ (#1731)
2023-11-20 21:42:45 -08:00