21 Commits

commit 41deac4a3d
Author: Nick Hill
Date:   2024-03-24 16:00:16 -07:00

    [BugFix] 1D query fix for MoE models (#3597)

commit f1c0fc3919
Author: Roy
Date:   2024-03-20 23:25:01 +00:00

    Migrate logits computation and gather to model_runner (#3233)

commit 2daf23ab0c
Author: Woosuk Kwon
Date:   2024-03-07 01:45:50 -08:00

    Separate attention backends (#3005)

commit 31348dff03
Author: Philipp Moritz
Date:   2024-02-15 01:00:43 +01:00

    Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880)

    * Fix AttributeError: MixtralModel object has no attribute org_vocab_size.
    * Make LoRA logic for Mistral and Mixtral the same

    Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>

commit 2a543d6efe
Author: Terry
Date:   2024-02-14 00:55:45 +01:00

    Add LoRA support for Mixtral (#2831)

    * add mixtral lora support
    * formatting
    * fix incorrectly ported logic
    * polish tests
    * minor fixes and refactoring
    * minor fixes
    * formatting
    * rename and remove redundant logic
    * refactoring
    * refactoring
    * minor fix
    * minor refactoring
    * fix code smell

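For readers unfamiliar with the technique these two commits implement: LoRA adds a trainable low-rank update on top of a frozen linear layer. A minimal PyTorch sketch of the idea (class and parameter names here are illustrative, not vLLM's actual implementation, which also batches multiple adapters and handles tensor-parallel sharding):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base.requires_grad_(False)  # frozen pretrained weight W
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / rank) * B A x
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling
```
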
commit f0d4e14557
Author: Woosuk Kwon
Date:   2024-02-05 17:38:02 -08:00

    Add fused top-K softmax kernel for MoE (#2769)

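The kernel fuses what is otherwise a chain of separate routing ops. A sketch of the unfused PyTorch equivalent it replaces (function and tensor names are illustrative):

```python
import torch

def topk_softmax_reference(gating_logits: torch.Tensor, top_k: int = 2):
    """Unfused MoE routing: gating_logits is [num_tokens, num_experts].

    A fused kernel produces the same weights and expert ids in a single
    pass instead of materializing the full softmax and sorting it.
    """
    probs = torch.softmax(gating_logits, dim=-1)
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    # Renormalize so each token's selected expert weights sum to 1.
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```
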
commit d0d93b92b1
Author: Philipp Moritz
Date:   2024-01-31 14:34:17 -08:00

    Add unit test for Mixtral MoE layer (#2677)

commit ab40644669
Author: Philipp Moritz
Date:   2024-01-29 22:43:37 -08:00

    Fused MOE for Mixtral (#2542)

    Co-authored-by: chen shen <scv119@gmail.com>

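For context, the computation being fused: each token is processed by its top-k experts and the results are combined using the routing weights. An unfused sketch with a Python loop over experts (names are illustrative); the fused kernel instead groups tokens by expert and runs batched GEMMs:

```python
import torch

def moe_forward_reference(x, experts, topk_weights, topk_ids):
    """Naive MoE forward pass. x: [num_tokens, hidden]; experts: list of
    per-expert FFN callables; topk_weights / topk_ids: [num_tokens, k]
    routing outputs from the gate (see the top-k softmax sketch above).
    """
    out = torch.zeros_like(x)
    for expert_id, expert in enumerate(experts):
        # Tokens (and their top-k slot) routed to this expert.
        token_idx, slot_idx = torch.where(topk_ids == expert_id)
        if token_idx.numel() == 0:
            continue
        weight = topk_weights[token_idx, slot_idx].unsqueeze(-1)
        out[token_idx] += weight * expert(x[token_idx])
    return out
```
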
commit fd4ea8ef5c
Author: Zhuohan Li
Date:   2024-01-03 11:30:22 -08:00

    Use NCCL instead of ray for control-plane communication to remove serialization overhead (#2221)

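The idea, roughly: per-step control-plane metadata had been pickled and shipped through Ray, while broadcasting it as a tensor over the already-initialized NCCL process group avoids that serialization. A hedged sketch under the assumption of fixed-size integer metadata (function name is illustrative):

```python
import torch
import torch.distributed as dist

def broadcast_metadata(meta, rank, num_fields=8):
    """Broadcast small, fixed-size control metadata over NCCL.

    Assumes dist.init_process_group("nccl") has already run. Packing the
    metadata into a GPU tensor avoids pickling Python objects and routing
    them through Ray on every scheduling step.
    """
    buf = torch.zeros(num_fields, dtype=torch.int64, device="cuda")
    if rank == 0:
        buf[: len(meta)] = torch.tensor(meta, dtype=torch.int64, device="cuda")
    dist.broadcast(buf, src=0)  # every rank now holds the same metadata
    return buf.tolist()
```
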
commit ba4f826738
Author: Woosuk Kwon
Date:   2023-12-19 16:16:11 -08:00

    [BugFix] Fix weight loading for Mixtral with TP (#2208)

commit 37ca558103
Author: Woosuk Kwon
Date:   2023-12-16 21:12:08 -08:00

    Optimize model execution with CUDA graph (#1926)

    Co-authored-by: Chen Shen <scv119@gmail.com>
    Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>

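The optimization captures the model's kernel launch sequence once into a CUDA graph, then replays it each step to eliminate per-kernel CPU launch overhead. A simplified sketch using torch.cuda.CUDAGraph (the real code must additionally manage static buffers per batch size):

```python
import torch

def make_graphed_forward(model, static_input):
    """Capture one forward pass into a CUDA graph; return a replay fn.

    Simplified sketch: replay reuses the captured buffer addresses, so
    every input and output must live in fixed ("static") tensors.
    """
    # Warm up on a side stream so lazy init is not recorded (per PyTorch docs).
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(side):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(side)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = model(static_input)

    def forward(new_input: torch.Tensor) -> torch.Tensor:
        static_input.copy_(new_input)  # refill the captured input buffer
        graph.replay()                 # re-launch the recorded kernels
        return static_output

    return forward
```
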
commit eed74a558f
Author: Roy
Date:   2023-12-16 12:41:23 -08:00

    Simplify weight loading logic (#2133)

commit 0fbfc4b81b
Author: CHU Tianxiang
Date:   2023-12-15 03:04:22 -08:00

    Add GPTQ support (#916)

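At inference time GPTQ stores weights as packed low-bit integers plus scales and zero points. A hedged sketch of dequantizing a 4-bit packed weight; the layout assumed here (eight nibbles per int32, per-column scale and zero) is illustrative, and real GPTQ kernels also handle per-group scales and act-order:

```python
import torch

def dequantize_4bit(qweight, scales, zeros, bits=4):
    """qweight: [in_features // 8, out_features] int32, eight 4-bit
    values packed per int32; scales / zeros: [out_features] floats.
    """
    shifts = torch.arange(0, 32, bits, device=qweight.device)  # [8]
    mask = (1 << bits) - 1                                     # 0xF
    # Unpack each int32 into its eight 4-bit fields: [K//8, N, 8].
    nibbles = (qweight.unsqueeze(-1) >> shifts) & mask
    # Restore row order: [K//8, 8, N] -> [K, N].
    w_int = nibbles.permute(0, 2, 1).reshape(-1, qweight.shape[1])
    return (w_int.float() - zeros.float()) * scales.float()
```
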
commit 21d93c140d
Author: Antoni Baum
Date:   2023-12-13 23:55:07 -08:00

    Optimize Mixtral with expert parallelism (#2090)

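Expert parallelism shards the experts themselves across GPUs rather than slicing every weight matrix: each rank runs only the experts it owns, and the partial outputs are summed. A sketch building on the MoE loop above (names are illustrative):

```python
import torch
import torch.distributed as dist

def expert_parallel_forward(x, local_experts, topk_weights, topk_ids):
    """x: [num_tokens, hidden]; local_experts maps expert id -> FFN module
    for only the experts owned by this rank. Each rank computes its own
    experts' contributions for all tokens; all_reduce sums the partials.
    """
    out = torch.zeros_like(x)
    for expert_id, expert in local_experts.items():
        token_idx, slot_idx = torch.where(topk_ids == expert_id)
        if token_idx.numel() == 0:
            continue
        weight = topk_weights[token_idx, slot_idx].unsqueeze(-1)
        out[token_idx] += weight * expert(x[token_idx])
    dist.all_reduce(out)  # sum contributions from every expert shard
    return out
```
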
commit 518369d78c
Author: Woosuk Kwon
Date:   2023-12-12 22:21:45 -08:00

    Implement lazy model loader (#2044)

commit cb3f30c600
Author: Woosuk Kwon
Date:   2023-12-11 18:39:14 -08:00

    Upgrade transformers version to 4.36.0 (#2046)

commit 31d2ab4aff
Author: Woosuk Kwon
Date:   2023-12-11 12:26:42 -08:00

    Remove python 3.10 requirement (#2040)

commit 6120e5aaea
Author: Woosuk Kwon
Date:   2023-12-11 11:40:56 -08:00

    Fix import error msg for megablocks (#2038)

commit 81ce2a4b26
Author: Woosuk Kwon
Date:   2023-12-11 11:32:39 -08:00

    [Minor] Fix type annotation in Mixtral (#2036)

commit 4ff0203987
Author: Woosuk Kwon
Date:   2023-12-11 09:16:15 -08:00

    Minor fixes for Mixtral (#2015)

commit b5f882cc98
Author: Pierre Stock
Date:   2023-12-11 01:09:15 -08:00

    Mixtral 8x7B support (#2011)

    Co-authored-by: Pierre Stock <p@mistral.ai>
    Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>