29 Commits

Author SHA1 Message Date
Woosuk Kwon
7c041ab578 Refactor system architecture (#82) 2023-05-09 15:30:12 -07:00
Woosuk Kwon
c9d5b6d4a8 Replace FlashAttention with xformers (#70) 2023-05-05 02:01:08 -07:00
Zhuohan Li
27f1410d06 New weight loader without np copy (#52) 2023-05-03 15:32:04 +08:00
Zhuohan Li
4858f3bb45 Add an option to launch cacheflow without ray (#51) 2023-04-30 15:42:17 +08:00
Woosuk Kwon
ee88a7e5f3 Add an option to use dummy model weights (#33) 2023-04-08 23:36:12 -07:00
Woosuk Kwon
0f40557af6 Implement block copy kernel to optimize beam search (#32) 2023-04-07 17:45:07 -07:00
Woosuk Kwon
12659a0bd7 Add CUDA graph-based all reduce launcher (#26) 2023-04-05 11:16:57 -07:00
Woosuk Kwon
897cb2ae28 Optimize data movement (#20) 2023-04-02 00:30:17 -07:00
Zhuohan Li
2f49f15585 Support tensor parallel (#2) 2023-03-21 13:45:42 -07:00
Woosuk Kwon
cfae35b861 Add miscellaneous updates (#8) 2023-03-13 13:48:38 -07:00
Woosuk Kwon
1a7eb7da61 Support beam search & parallel generation (#7) 2023-03-10 09:58:21 -08:00
Woosuk Kwon
0deacbce6e Implement single_query_cached_kv_attention kernel (#3) 2023-03-01 15:02:19 -08:00
Woosuk Kwon
1ce1333573 Set default dtype to half 2023-02-23 21:31:39 +00:00
Woosuk Kwon
fdd0f2f472 Minor 2023-02-23 20:23:47 +00:00
Woosuk Kwon
1f6c7ef437 Add controller 2023-02-23 09:32:19 +00:00
Woosuk Kwon
343cea3dbc Add seq_ids to input metadata 2023-02-23 09:25:01 +00:00
Woosuk Kwon
4b1ac23f53 Fix slot mapping 2023-02-23 00:10:07 +00:00
Woosuk Kwon
8290fce47d Add Worker class 2023-02-22 19:01:38 +00:00
Woosuk Kwon
709a69176e Move worker/models -> models 2023-02-22 18:03:48 +00:00
Woosuk Kwon
6f058c7ba8 Implement cache ops 2023-02-16 07:47:03 +00:00
Woosuk Kwon
a1c67e6db8 Minor 2023-02-16 01:42:53 +00:00
Woosuk Kwon
9e68a6827e Fix return type error 2023-02-16 01:33:03 +00:00
Woosuk Kwon
8edcabc737 Add warning 2023-02-16 01:28:17 +00:00
Woosuk Kwon
2f4887de77 Fix KVCache shape 2023-02-16 01:24:45 +00:00
Woosuk Kwon
ee9442518d Fix get_model 2023-02-13 22:51:03 +00:00
Woosuk Kwon
fffa2e1f4b Add model_utils 2023-02-13 09:36:12 +00:00
Woosuk Kwon
bb59a3e730 Fix cache engine 2023-02-13 09:35:48 +00:00
Woosuk Kwon
e7bee2aa81 Add cache engine 2023-02-09 11:28:02 +00:00
Woosuk Kwon
39161c98a0 Add OPT 2023-02-09 11:25:37 +00:00