15 Commits

Author SHA1 Message Date
chenqianfzh
2f4117c38e
support bitsandbytes quantization with more models (#9148) 2024-10-08 19:52:19 -06:00
Cyrus Leung
b22b798471
[Model] PP support for embedding models and update docs (#9090)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-10-06 16:35:27 +08:00
Xin Yang
15986f598c
[Model] Support Gemma2 embedding model (#9004) 2024-10-05 06:57:05 +00:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Blueyo0
1bf2dd9df0
[Gemma2] add bitsandbytes support for Gemma2 (#8338) 2024-09-11 21:53:12 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
Zijian Hu
f4fc7337bf
[Bugfix] support tie_word_embeddings for all models (#5724) 2024-08-19 20:00:04 -07:00
Woosuk Kwon
df845b2b46
[Misc] Remove Gemma RoPE (#7638) 2024-08-19 09:29:31 -07:00
Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) 2024-08-13 05:33:41 +00:00
Woosuk Kwon
805a8a75f2
[Misc] Support attention logits soft-capping with flash-attn (#7022) 2024-08-01 13:14:37 -07:00
Michael Goin
f4fd390f5d
[Bugfix] Lower gemma's unloaded_params exception to warning (#7002) 2024-08-01 12:01:07 -07:00
Lily Liu
69ec3ca14c
[Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-07-04 16:35:51 -07:00
Qubitium-ModelCloud
ee93f4f92a
[CORE] Quantized lm-head Framework (#4442)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
2024-07-02 22:25:17 +00:00
Murali Andoorveedu
c5832d2ae9
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
2024-07-02 10:58:08 -07:00
Woosuk Kwon
79c92c7c8a
[Model] Add Gemma 2 (#5908) 2024-06-27 13:33:56 -07:00