65 Commits

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Cody Yu | a62aaf1df5 | [Misc][Refactor] Generalize linear_method to be quant_method (#4373) | 2024-04-26 16:41:14 -04:00 |
| SangBin Cho | a88081bf76 | [CI] Disable non-lazy string operation on logging (#4326) (Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>) | 2024-04-26 00:16:58 -07:00 |
| Antoni Baum | 69e1d2fb69 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| youkaichao | 63e7176f26 | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| Woosuk Kwon | 82c540bebf | [Bugfix] More faithful implementation of Gemma (#3653) | 2024-03-27 09:37:18 -07:00 |
| SangBin Cho | 01bfb22b41 | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| Woosuk Kwon | 925f3332ca | [Core] Refactor Attention Take 2 (#3462) | 2024-03-25 04:39:33 +00:00 |
| Taemin Lee | b7050ca7df | [BugFix] gemma loading after quantization or LoRA. (#3553) | 2024-03-21 13:16:57 -07:00 |
| Roy | f1c0fc3919 | Migrate logits computation and gather to model_runner (#3233) | 2024-03-20 23:25:01 +00:00 |
| Woosuk Kwon | 2daf23ab0c | Separate attention backends (#3005) | 2024-03-07 01:45:50 -08:00 |
| TechxGenus | d3c04b6a39 | Add GPTQ support for Gemma (#3200) | 2024-03-07 08:19:14 +08:00 |
| Woosuk Kwon | 929b4f2973 | Add LoRA support for Gemma (#3050) | 2024-02-28 13:03:28 -08:00 |
| Woosuk Kwon | fd5dcc5c81 | Optimize GeGLU layer in Gemma (#2975) | 2024-02-21 20:17:52 -08:00 |
| Woosuk Kwon | 95529e3253 | Use Llama RMSNorm custom op for Gemma (#2974) | 2024-02-21 18:28:23 -08:00 |
| Xiang Xu | 5253edaacb | Add Gemma model (#2964) | 2024-02-21 09:34:30 -08:00 |