Author | Commit | Message | Date
Woosuk Kwon | c00ddd6834 | Add buffer donation to benchmark | 2024-04-30 21:58:47 +00:00
Woosuk Kwon | 881b884046 | Add block size | 2024-04-27 22:35:28 +00:00
Woosuk Kwon | 98a3df0f8d | Disable memory tracking | 2024-04-26 08:56:26 +00:00
Woosuk Kwon | 3f6288cc89 | Fix for binary cache | 2024-04-26 08:56:12 +00:00
Woosuk Kwon | 408ff4950c | Tune pages_per_compute_block | 2024-04-26 08:55:23 +00:00
Woosuk Kwon | 278e8a1adc | Add tpu | 2024-04-26 08:54:52 +00:00
Woosuk Kwon | 07be6ed3eb | Improve benchmark | 2024-04-26 08:54:41 +00:00
Woosuk Kwon | f6637dba18 | Use persistent cache | 2024-04-26 07:09:44 +00:00
Woosuk Kwon | 707a5f6473 | Move JAX-smi to worker | 2024-04-26 07:05:51 +00:00
Woosuk Kwon | 57690a9c09 | Fix bucketing | 2024-04-26 07:05:27 +00:00
Woosuk Kwon | b15db234ba | Add precompilation step | 2024-04-26 05:43:08 +00:00
Woosuk Kwon | d1591f0f1f | Add op benchmark scripts | 2024-04-26 05:35:19 +00:00
Woosuk Kwon | 85d4488458 | yapf | 2024-04-26 05:31:31 +00:00
Woosuk Kwon | 8d072dbfbd | yapf | 2024-04-26 05:30:25 +00:00
Woosuk Kwon | d830766c0c | yapf | 2024-04-26 05:30:08 +00:00
Woosuk Kwon | 5ae2f81c2b | Add warmup + formatting | 2024-04-26 05:28:09 +00:00
Woosuk Kwon | 4ea41d01a9 | yapf | 2024-04-26 05:27:38 +00:00
Woosuk Kwon | d16a348477 | Add comment | 2024-04-26 05:27:27 +00:00
Woosuk Kwon | aa092834bb | Format gemma.py | 2024-04-26 05:26:38 +00:00
Woosuk Kwon | d2c6a32c0c | Fix is_tpu | 2024-04-26 05:26:24 +00:00
Woosuk Kwon | 21f35c2289 | Change version | 2024-04-26 05:00:26 +00:00
Woosuk Kwon | 2aa9831dd3 | Minor | 2024-04-25 23:40:44 +00:00
Woosuk Kwon | 028f528aad | Fix KV cache shape | 2024-04-25 23:38:07 +00:00
Woosuk Kwon | fa5bacd5b0 | Add warmup | 2024-04-25 05:06:41 +00:00
Woosuk Kwon | b62170e4e3 | Fix scheduler | 2024-04-25 05:06:22 +00:00
Woosuk Kwon | 98eda57899 | Add timer | 2024-04-25 05:06:11 +00:00
Woosuk Kwon | 81b8b813f1 | Pad to avoid recompilation | 2024-04-25 04:43:33 +00:00
Woosuk Kwon | e2c7dedb3a | Minor | 2024-04-25 03:28:53 +00:00
Woosuk Kwon | 5323969fcf | Increase #blocks | 2024-04-24 08:56:58 +00:00
Woosuk Kwon | f42b4c27d8 | Include argmax to jit | 2024-04-24 08:56:45 +00:00
Woosuk Kwon | 620e7646d3 | Fix cache write | 2024-04-24 08:56:30 +00:00
Woosuk Kwon | d5fb1c20c1 | Fix JAX jit OOM | 2024-04-24 07:52:56 +00:00
Woosuk Kwon | 092e3d6d6d | Remove hardcoded path | 2024-04-19 08:18:10 +00:00
Woosuk Kwon | 84284302d8 | Minor | 2024-04-19 08:08:25 +00:00
Woosuk Kwon | 743695f586 | Fix write_to_kv_cache | 2024-04-19 07:51:54 +00:00
Woosuk Kwon | 62b870fa07 | Use FlashAttention kernel | 2024-04-17 20:24:45 +00:00
Woosuk Kwon | 7e3a230c38 | Fix paged_attn | 2024-04-17 20:06:26 +00:00
Woosuk Kwon | 186c88c497 | explicitly return new_kv_caches | 2024-04-17 18:42:34 +00:00
Woosuk Kwon | ef762cb110 | Write KV | 2024-04-17 18:21:39 +00:00
Woosuk Kwon | 756c4e78d3 | Add write_to_cache ops | 2024-04-17 18:20:55 +00:00
Woosuk Kwon | 4880de35d2 | Add attn_mask | 2024-04-17 18:12:20 +00:00
Woosuk Kwon | 0fb07c08d0 | Minor | 2024-04-17 18:08:33 +00:00
Woosuk Kwon | e4377dd698 | Add model runner | 2024-04-17 18:04:54 +00:00
Woosuk Kwon | 5cb213c85e | Add flash-attn op | 2024-04-17 18:02:28 +00:00
Woosuk Kwon | 25bbc21ef6 | Minor | 2024-04-17 18:02:16 +00:00
Woosuk Kwon | b25fcc06c2 | Minor | 2024-04-17 18:02:13 +00:00
Woosuk Kwon | 6661c030c4 | Add paged_attn op | 2024-04-17 18:02:00 +00:00
Woosuk Kwon | 8888d1c474 | Fix logit indices | 2024-04-17 18:01:43 +00:00
Woosuk Kwon | cedb67028a | Add gemma | 2024-04-17 17:00:10 +00:00
Woosuk Kwon | 91b47e3f2f | JAX-based TPU worker | 2024-04-16 17:37:11 +00:00