5 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Woosuk Kwon | b9926f7f66 | Support block size 32 (#35) | 2023-04-09 23:07:18 -07:00 |
| Woosuk Kwon | ee88a7e5f3 | Add an option to use dummy model weights (#33) | 2023-04-08 23:36:12 -07:00 |
| Woosuk Kwon | 12659a0bd7 | Add CUDA graph-based all reduce launcher (#26) | 2023-04-05 11:16:57 -07:00 |
| Woosuk Kwon | 7a7929abe8 | Implement preemption via recomputation & Refactor scheduling logic (#12) | 2023-03-30 14:51:46 -07:00 |
| Zhuohan Li | 721fa3df15 | FastAPI-based working frontend (#10) | 2023-03-29 14:48:56 +08:00 |