1179 Commits

Author SHA1 Message Date
Woosuk Kwon
57690a9c09 Fix bucketing 2024-04-26 07:05:27 +00:00
Woosuk Kwon
b15db234ba Add precompilation step 2024-04-26 05:43:08 +00:00
Woosuk Kwon
d1591f0f1f Add op benchmark scripts 2024-04-26 05:35:19 +00:00
Woosuk Kwon
85d4488458 yapf 2024-04-26 05:31:31 +00:00
Woosuk Kwon
8d072dbfbd yapf 2024-04-26 05:30:25 +00:00
Woosuk Kwon
d830766c0c yapf 2024-04-26 05:30:08 +00:00
Woosuk Kwon
5ae2f81c2b Add warmup + formatting 2024-04-26 05:28:09 +00:00
Woosuk Kwon
4ea41d01a9 yapf 2024-04-26 05:27:38 +00:00
Woosuk Kwon
d16a348477 Add comment 2024-04-26 05:27:27 +00:00
Woosuk Kwon
aa092834bb Format gemma.py 2024-04-26 05:26:38 +00:00
Woosuk Kwon
d2c6a32c0c Fix is_tpu 2024-04-26 05:26:24 +00:00
Woosuk Kwon
21f35c2289 Change version 2024-04-26 05:00:26 +00:00
Woosuk Kwon
2aa9831dd3 Minor 2024-04-25 23:40:44 +00:00
Woosuk Kwon
028f528aad Fix KV cache shape 2024-04-25 23:38:07 +00:00
Woosuk Kwon
fa5bacd5b0 Add warmup 2024-04-25 05:06:41 +00:00
Woosuk Kwon
b62170e4e3 Fix scheduler 2024-04-25 05:06:22 +00:00
Woosuk Kwon
98eda57899 Add timer 2024-04-25 05:06:11 +00:00
Woosuk Kwon
81b8b813f1 Pad to avoid recompilation 2024-04-25 04:43:33 +00:00
Woosuk Kwon
e2c7dedb3a Minor 2024-04-25 03:28:53 +00:00
Woosuk Kwon
5323969fcf Increase #blocks 2024-04-24 08:56:58 +00:00
Woosuk Kwon
f42b4c27d8 Include argmax to jit 2024-04-24 08:56:45 +00:00
Woosuk Kwon
620e7646d3 Fix cache write 2024-04-24 08:56:30 +00:00
Woosuk Kwon
d5fb1c20c1 Fix JAX jit OOM 2024-04-24 07:52:56 +00:00
Woosuk Kwon
092e3d6d6d Remove hardcoded path 2024-04-19 08:18:10 +00:00
Woosuk Kwon
84284302d8 Minor 2024-04-19 08:08:25 +00:00
Woosuk Kwon
743695f586 Fix write_to_kv_cache 2024-04-19 07:51:54 +00:00
Woosuk Kwon
62b870fa07 Use FlashAttention kernel 2024-04-17 20:24:45 +00:00
Woosuk Kwon
7e3a230c38 Fix paged_attn 2024-04-17 20:06:26 +00:00
Woosuk Kwon
186c88c497 explictly return new_kv_caches 2024-04-17 18:42:34 +00:00
Woosuk Kwon
ef762cb110 Write kV 2024-04-17 18:21:39 +00:00
Woosuk Kwon
756c4e78d3 Add write_to_cache ops 2024-04-17 18:20:55 +00:00
Woosuk Kwon
4880de35d2 Add attn_mask 2024-04-17 18:12:20 +00:00
Woosuk Kwon
0fb07c08d0 Minor 2024-04-17 18:08:33 +00:00
Woosuk Kwon
e4377dd698 Add model runner 2024-04-17 18:04:54 +00:00
Woosuk Kwon
5cb213c85e Add flash-attn op 2024-04-17 18:02:28 +00:00
Woosuk Kwon
25bbc21ef6 Minor 2024-04-17 18:02:16 +00:00
Woosuk Kwon
b25fcc06c2 Minor 2024-04-17 18:02:13 +00:00
Woosuk Kwon
6661c030c4 Add paged_attn op 2024-04-17 18:02:00 +00:00
Woosuk Kwon
8888d1c474 Fix logit indices 2024-04-17 18:01:43 +00:00
Woosuk Kwon
cedb67028a Add gemma 2024-04-17 17:00:10 +00:00
Woosuk Kwon
91b47e3f2f JAX-based TPU worker 2024-04-16 17:37:11 +00:00
Woosuk Kwon
6d62e4c6aa Add torch to dependencies 2024-04-16 17:06:35 +00:00
Woosuk Kwon
de82e95787 Minor 2024-04-16 17:04:46 +00:00
Woosuk Kwon
b3b89cf755 Renew TPU executor 2024-04-16 09:42:15 +00:00
Woosuk Kwon
6692a30266 Minor 2024-04-16 09:41:53 +00:00
Woosuk Kwon
eb0a0466a9 Add JAX requirements 2024-04-16 08:05:54 +00:00
Woosuk Kwon
c59c1e7b2c Remove 2024-04-16 08:05:36 +00:00
Woosuk Kwon
d4adf92beb Merge branch 'main' into woosuk-tpu 2024-04-16 07:56:53 +00:00
Noam Gat
05434764cd LM Format Enforcer Guided Decoding Support (#3868) 2024-04-16 05:54:57 +00:00
    Co-authored-by: Simon Mo <simon.mo@hey.com>
SangBin Cho
4e7ee664e2 [Core] Fix engine-use-ray broken (#4105) 2024-04-16 05:24:53 +00:00