Nick Hill
41deac4a3d
[BugFix] 1D query fix for MoE models ( #3597 )
2024-03-24 16:00:16 -07:00
Roy
f1c0fc3919
Migrate logits computation and gather to model_runner ( #3233 )
2024-03-20 23:25:01 +00:00
Woosuk Kwon
2daf23ab0c
Separate attention backends ( #3005 )
2024-03-07 01:45:50 -08:00
Philipp Moritz
31348dff03
Align LoRA code between Mistral and Mixtral ( fixes #2875 ) ( #2880 )
...
* Fix AttributeError: MixtralModel object has no attribute org_vocab_size.
* Make LoRA logic for Mistral and Mixtral the same
---------
Co-authored-by: Pernekhan Utemuratov <pernekhan@deepinfra.com>
2024-02-15 01:00:43 +01:00
Terry
2a543d6efe
Add LoRA support for Mixtral ( #2831 )
...
* add mixtral lora support
* formatting
* fix incorrectly ported logic
* polish tests
* minor fixes and refactoring
* minor fixes
* formatting
* rename and remove redundant logic
* refactoring
* refactoring
* minor fix
* minor refactoring
* fix code smell
2024-02-14 00:55:45 +01:00
Woosuk Kwon
f0d4e14557
Add fused top-K softmax kernel for MoE ( #2769 )
2024-02-05 17:38:02 -08:00
Philipp Moritz
d0d93b92b1
Add unit test for Mixtral MoE layer ( #2677 )
2024-01-31 14:34:17 -08:00
Philipp Moritz
ab40644669
Fused MOE for Mixtral ( #2542 )
...
Co-authored-by: chen shen <scv119@gmail.com>
2024-01-29 22:43:37 -08:00
Zhuohan Li
fd4ea8ef5c
Use NCCL instead of ray for control-plane communication to remove serialization overhead ( #2221 )
2024-01-03 11:30:22 -08:00
Woosuk Kwon
ba4f826738
[BugFix] Fix weight loading for Mixtral with TP ( #2208 )
2023-12-19 16:16:11 -08:00
Woosuk Kwon
37ca558103
Optimize model execution with CUDA graph ( #1926 )
...
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
2023-12-16 21:12:08 -08:00
Roy
eed74a558f
Simplify weight loading logic ( #2133 )
2023-12-16 12:41:23 -08:00
CHU Tianxiang
0fbfc4b81b
Add GPTQ support ( #916 )
2023-12-15 03:04:22 -08:00
Antoni Baum
21d93c140d
Optimize Mixtral with expert parallelism ( #2090 )
2023-12-13 23:55:07 -08:00
Woosuk Kwon
518369d78c
Implement lazy model loader ( #2044 )
2023-12-12 22:21:45 -08:00
Woosuk Kwon
cb3f30c600
Upgrade transformers version to 4.36.0 ( #2046 )
2023-12-11 18:39:14 -08:00
Woosuk Kwon
31d2ab4aff
Remove python 3.10 requirement ( #2040 )
2023-12-11 12:26:42 -08:00
Woosuk Kwon
6120e5aaea
Fix import error msg for megablocks ( #2038 )
2023-12-11 11:40:56 -08:00
Woosuk Kwon
81ce2a4b26
[Minor] Fix type annotation in Mixtral ( #2036 )
2023-12-11 11:32:39 -08:00
Woosuk Kwon
4ff0203987
Minor fixes for Mixtral ( #2015 )
2023-12-11 09:16:15 -08:00
Pierre Stock
b5f882cc98
Mixtral 8x7B support ( #2011 )
...
Co-authored-by: Pierre Stock <p@mistral.ai>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2023-12-11 01:09:15 -08:00