66 Commits

Author SHA1 Message Date
Antoni Baum
15f5632365
Delay GPU->CPU sync in sampling (#1337) 2023-10-30 09:01:34 -07:00
Woosuk Kwon
c1376e0f82
Change scheduler & input tensor shape (#1381) 2023-10-16 17:48:42 -07:00
Antoni Baum
ee92b58b3a
Move bfloat16 check to worker (#1259) 2023-10-07 22:10:44 -07:00
Woosuk Kwon
2e8e49fce3
[Fix] Remove false assertion (#1222) 2023-09-28 10:52:38 -07:00
Woosuk Kwon
a8e98aee0c
Fix Mistral model (#1220) 2023-09-28 10:44:05 -07:00
Chris Bamford
bb1ba58f06
[Mistral] Mistral-7B-v0.1 support (#1196)
Co-authored-by: timlacroix <t@mistral.ai>
2023-09-28 10:41:03 -07:00
Antoni Baum
cf5cb1e33e
Allocate more shared memory to attention kernel (#1154) 2023-09-26 22:27:13 -07:00
Woosuk Kwon
9f6be8692e
Fix config for Falcon (#1164) 2023-09-23 17:38:43 -07:00
Woosuk Kwon
2ac4d5e2bf
Replace DtypeTensor (#1123) 2023-09-21 00:51:47 -07:00
Zhuohan Li
002800f081
Align vLLM's beam search implementation with HF generate (#857) 2023-09-04 17:29:42 -07:00
Zhuohan Li
1b0bd0fe8a
Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Antoni Baum
9925c17940
Ray placement group support (#397) 2023-07-19 22:49:31 -07:00
Zhuohan Li
d6fa1be3a8
[Quality] Add code formatter and linter (#326) 2023-07-03 11:31:55 -07:00
Zhuohan Li
85de093472
[Fix] Do not pin memory when in WSL (#312) 2023-06-29 15:00:21 -07:00
Zhuohan Li
0b7db411b5
[Bug] Fix the OOM condition for CPU cache (#260) 2023-06-26 11:16:13 -07:00
Woosuk Kwon
0b98ba15c7
Change the name to vLLM (#150) 2023-06-17 03:07:40 -07:00