1274 Commits

Author SHA1 Message Date
Woosuk Kwon
e67b4f2c2a
Use FP32 in RoPE initialization (#1004)
Co-authored-by: One <imone@tuta.io>
2023-09-11 00:26:35 -07:00
Antoni Baum
a62de9ecfd
Fix wrong dtype in PagedAttentionWithALiBi bias (#996)
Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
2023-09-09 14:58:35 -07:00
Robert Irvine
4b5bcf8906
faster startup of vLLM (#982)
Co-authored-by: Robert Irvine <robert@seamlessml.com>
2023-09-08 14:48:54 +09:00
Woosuk Kwon
320a622ec4
[BugFix] Implement RoPE for GPT-J (#941) 2023-09-06 11:54:33 +09:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Dong-Yong Lee
e11222333f
fix: bug fix when penalties are negative (#913)
Co-authored-by: dongyong-lee <dongyong.lee@navercorp.com>
2023-09-01 00:37:17 +09:00
Aman Gupta Karmani
28873a2799
Improve _prune_hidden_states micro-benchmark (#707) 2023-08-31 13:28:43 +09:00
Aman Gupta Karmani
75471386de
use flash-attn via xformers (#877) 2023-08-29 21:52:13 -07:00
Woosuk Kwon
94d2f59895
Set replacement=True in torch.multinomial (#858) 2023-08-25 12:22:01 +09:00
Woosuk Kwon
2a4ec90854
Fix for breaking changes in xformers 0.0.21 (#834) 2023-08-23 17:44:21 +09:00
Woosuk Kwon
d64bf1646c
Implement approximate GELU kernels (#828) 2023-08-23 07:43:21 +09:00
Abraham-Xu
d1744376ae
Align with huggingface Top K sampling (#753) 2023-08-15 16:44:33 -07:00
Woosuk Kwon
55fe8a81ec
Refactor scheduler (#658) 2023-08-02 16:42:01 -07:00
Zhuohan Li
1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Zhuohan Li
6fc2a38b11
Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Song
bda41c70dd
hotfix attn alibi wo head mapping (#496)
Co-authored-by: oliveryuan <oliveryuan@basemind.com>
2023-07-18 11:31:48 -07:00
Zhuohan Li
96853af5a8
Optimize MQA Kernel (#452) 2023-07-14 20:06:40 -04:00
Andre Slavescu
c894836108
[Model] Add support for GPT-J (#226)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2023-07-08 17:55:16 -07:00
Woosuk Kwon
404422f42e
[Model] Add support for MPT (#334) 2023-07-03 16:47:53 -07:00
Woosuk Kwon
e41f06702c
Add support for BLOOM (#331) 2023-07-03 13:12:35 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Lily Liu
425040d4c1
remove floats == 0 comparison (#285) 2023-06-28 14:11:51 -07:00
Michael Feil
298695b766
GPTBigCode (StarCoder, SantaCoder Support) (#209) 2023-06-23 01:49:27 +08:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00