1188 Commits

Author SHA1 Message Date
Woosuk Kwon c00ddd6834 Add buffer donation to benchmark 2024-04-30 21:58:47 +00:00
Woosuk Kwon 881b884046 Add block size 2024-04-27 22:35:28 +00:00
Woosuk Kwon 98a3df0f8d Disable memory tracking 2024-04-26 08:56:26 +00:00
Woosuk Kwon 3f6288cc89 Fix for binary cache 2024-04-26 08:56:12 +00:00
Woosuk Kwon 408ff4950c Tune pages_per_compute_block 2024-04-26 08:55:23 +00:00
Woosuk Kwon 278e8a1adc Add tpu 2024-04-26 08:54:52 +00:00
Woosuk Kwon 07be6ed3eb Improve benchmark 2024-04-26 08:54:41 +00:00
Woosuk Kwon f6637dba18 Use persistent cache 2024-04-26 07:09:44 +00:00
Woosuk Kwon 707a5f6473 Move JAX-smi to worker 2024-04-26 07:05:51 +00:00
Woosuk Kwon 57690a9c09 Fix bucketing 2024-04-26 07:05:27 +00:00
Woosuk Kwon b15db234ba Add precompilation step 2024-04-26 05:43:08 +00:00
Woosuk Kwon d1591f0f1f Add op benchmark scripts 2024-04-26 05:35:19 +00:00
Woosuk Kwon 85d4488458 yapf 2024-04-26 05:31:31 +00:00
Woosuk Kwon 8d072dbfbd yapf 2024-04-26 05:30:25 +00:00
Woosuk Kwon d830766c0c yapf 2024-04-26 05:30:08 +00:00
Woosuk Kwon 5ae2f81c2b Add warmup + formatting 2024-04-26 05:28:09 +00:00
Woosuk Kwon 4ea41d01a9 yapf 2024-04-26 05:27:38 +00:00
Woosuk Kwon d16a348477 Add comment 2024-04-26 05:27:27 +00:00
Woosuk Kwon aa092834bb Format gemma.py 2024-04-26 05:26:38 +00:00
Woosuk Kwon d2c6a32c0c Fix is_tpu 2024-04-26 05:26:24 +00:00
Woosuk Kwon 21f35c2289 Change version 2024-04-26 05:00:26 +00:00
Woosuk Kwon 2aa9831dd3 Minor 2024-04-25 23:40:44 +00:00
Woosuk Kwon 028f528aad Fix KV cache shape 2024-04-25 23:38:07 +00:00
Woosuk Kwon fa5bacd5b0 Add warmup 2024-04-25 05:06:41 +00:00
Woosuk Kwon b62170e4e3 Fix scheduler 2024-04-25 05:06:22 +00:00
Woosuk Kwon 98eda57899 Add timer 2024-04-25 05:06:11 +00:00
Woosuk Kwon 81b8b813f1 Pad to avoid recompilation 2024-04-25 04:43:33 +00:00
Woosuk Kwon e2c7dedb3a Minor 2024-04-25 03:28:53 +00:00
Woosuk Kwon 5323969fcf Increase #blocks 2024-04-24 08:56:58 +00:00
Woosuk Kwon f42b4c27d8 Include argmax to jit 2024-04-24 08:56:45 +00:00
Woosuk Kwon 620e7646d3 Fix cache write 2024-04-24 08:56:30 +00:00
Woosuk Kwon d5fb1c20c1 Fix JAX jit OOM 2024-04-24 07:52:56 +00:00
Woosuk Kwon 092e3d6d6d Remove hardcoded path 2024-04-19 08:18:10 +00:00
Woosuk Kwon 84284302d8 Minor 2024-04-19 08:08:25 +00:00
Woosuk Kwon 743695f586 Fix write_to_kv_cache 2024-04-19 07:51:54 +00:00
Woosuk Kwon 62b870fa07 Use FlashAttention kernel 2024-04-17 20:24:45 +00:00
Woosuk Kwon 7e3a230c38 Fix paged_attn 2024-04-17 20:06:26 +00:00
Woosuk Kwon 186c88c497 explicitly return new_kv_caches 2024-04-17 18:42:34 +00:00
Woosuk Kwon ef762cb110 Write KV 2024-04-17 18:21:39 +00:00
Woosuk Kwon 756c4e78d3 Add write_to_cache ops 2024-04-17 18:20:55 +00:00
Woosuk Kwon 4880de35d2 Add attn_mask 2024-04-17 18:12:20 +00:00
Woosuk Kwon 0fb07c08d0 Minor 2024-04-17 18:08:33 +00:00
Woosuk Kwon e4377dd698 Add model runner 2024-04-17 18:04:54 +00:00
Woosuk Kwon 5cb213c85e Add flash-attn op 2024-04-17 18:02:28 +00:00
Woosuk Kwon 25bbc21ef6 Minor 2024-04-17 18:02:16 +00:00
Woosuk Kwon b25fcc06c2 Minor 2024-04-17 18:02:13 +00:00
Woosuk Kwon 6661c030c4 Add paged_attn op 2024-04-17 18:02:00 +00:00
Woosuk Kwon 8888d1c474 Fix logit indices 2024-04-17 18:01:43 +00:00
Woosuk Kwon cedb67028a Add gemma 2024-04-17 17:00:10 +00:00
Woosuk Kwon 91b47e3f2f JAX-based TPU worker 2024-04-16 17:37:11 +00:00