| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| Woosuk Kwon | 57690a9c09 | Fix bucketing | 2024-04-26 07:05:27 +00:00 |
| Woosuk Kwon | b15db234ba | Add precompilation step | 2024-04-26 05:43:08 +00:00 |
| Woosuk Kwon | d1591f0f1f | Add op benchmark scripts | 2024-04-26 05:35:19 +00:00 |
| Woosuk Kwon | 85d4488458 | yapf | 2024-04-26 05:31:31 +00:00 |
| Woosuk Kwon | 8d072dbfbd | yapf | 2024-04-26 05:30:25 +00:00 |
| Woosuk Kwon | d830766c0c | yapf | 2024-04-26 05:30:08 +00:00 |
| Woosuk Kwon | 5ae2f81c2b | Add warmup + formatting | 2024-04-26 05:28:09 +00:00 |
| Woosuk Kwon | 4ea41d01a9 | yapf | 2024-04-26 05:27:38 +00:00 |
| Woosuk Kwon | d16a348477 | Add comment | 2024-04-26 05:27:27 +00:00 |
| Woosuk Kwon | aa092834bb | Format gemma.py | 2024-04-26 05:26:38 +00:00 |
| Woosuk Kwon | d2c6a32c0c | Fix is_tpu | 2024-04-26 05:26:24 +00:00 |
| Woosuk Kwon | 21f35c2289 | Change version | 2024-04-26 05:00:26 +00:00 |
| Woosuk Kwon | 2aa9831dd3 | Minor | 2024-04-25 23:40:44 +00:00 |
| Woosuk Kwon | 028f528aad | Fix KV cache shape | 2024-04-25 23:38:07 +00:00 |
| Woosuk Kwon | fa5bacd5b0 | Add warmup | 2024-04-25 05:06:41 +00:00 |
| Woosuk Kwon | b62170e4e3 | Fix scheduler | 2024-04-25 05:06:22 +00:00 |
| Woosuk Kwon | 98eda57899 | Add timer | 2024-04-25 05:06:11 +00:00 |
| Woosuk Kwon | 81b8b813f1 | Pad to avoid recompilation | 2024-04-25 04:43:33 +00:00 |
| Woosuk Kwon | e2c7dedb3a | Minor | 2024-04-25 03:28:53 +00:00 |
| Woosuk Kwon | 5323969fcf | Increase #blocks | 2024-04-24 08:56:58 +00:00 |
| Woosuk Kwon | f42b4c27d8 | Include argmax to jit | 2024-04-24 08:56:45 +00:00 |
| Woosuk Kwon | 620e7646d3 | Fix cache write | 2024-04-24 08:56:30 +00:00 |
| Woosuk Kwon | d5fb1c20c1 | Fix JAX jit OOM | 2024-04-24 07:52:56 +00:00 |
| Woosuk Kwon | 092e3d6d6d | Remove hardcoded path | 2024-04-19 08:18:10 +00:00 |
| Woosuk Kwon | 84284302d8 | Minor | 2024-04-19 08:08:25 +00:00 |
| Woosuk Kwon | 743695f586 | Fix write_to_kv_cache | 2024-04-19 07:51:54 +00:00 |
| Woosuk Kwon | 62b870fa07 | Use FlashAttention kernel | 2024-04-17 20:24:45 +00:00 |
| Woosuk Kwon | 7e3a230c38 | Fix paged_attn | 2024-04-17 20:06:26 +00:00 |
| Woosuk Kwon | 186c88c497 | explicitly return new_kv_caches | 2024-04-17 18:42:34 +00:00 |
| Woosuk Kwon | ef762cb110 | Write KV | 2024-04-17 18:21:39 +00:00 |
| Woosuk Kwon | 756c4e78d3 | Add write_to_cache ops | 2024-04-17 18:20:55 +00:00 |
| Woosuk Kwon | 4880de35d2 | Add attn_mask | 2024-04-17 18:12:20 +00:00 |
| Woosuk Kwon | 0fb07c08d0 | Minor | 2024-04-17 18:08:33 +00:00 |
| Woosuk Kwon | e4377dd698 | Add model runner | 2024-04-17 18:04:54 +00:00 |
| Woosuk Kwon | 5cb213c85e | Add flash-attn op | 2024-04-17 18:02:28 +00:00 |
| Woosuk Kwon | 25bbc21ef6 | Minor | 2024-04-17 18:02:16 +00:00 |
| Woosuk Kwon | b25fcc06c2 | Minor | 2024-04-17 18:02:13 +00:00 |
| Woosuk Kwon | 6661c030c4 | Add paged_attn op | 2024-04-17 18:02:00 +00:00 |
| Woosuk Kwon | 8888d1c474 | Fix logit indices | 2024-04-17 18:01:43 +00:00 |
| Woosuk Kwon | cedb67028a | Add gemma | 2024-04-17 17:00:10 +00:00 |
| Woosuk Kwon | 91b47e3f2f | JAX-based TPU worker | 2024-04-16 17:37:11 +00:00 |
| Woosuk Kwon | 6d62e4c6aa | Add torch to dependencies | 2024-04-16 17:06:35 +00:00 |
| Woosuk Kwon | de82e95787 | Minor | 2024-04-16 17:04:46 +00:00 |
| Woosuk Kwon | b3b89cf755 | Renew TPU executor | 2024-04-16 09:42:15 +00:00 |
| Woosuk Kwon | 6692a30266 | Minor | 2024-04-16 09:41:53 +00:00 |
| Woosuk Kwon | eb0a0466a9 | Add JAX requirements | 2024-04-16 08:05:54 +00:00 |
| Woosuk Kwon | c59c1e7b2c | Remove | 2024-04-16 08:05:36 +00:00 |
| Woosuk Kwon | d4adf92beb | Merge branch 'main' into woosuk-tpu | 2024-04-16 07:56:53 +00:00 |
| Noam Gat | 05434764cd | LM Format Enforcer Guided Decoding Support (#3868) Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-04-16 05:54:57 +00:00 |
| SangBin Cho | 4e7ee664e2 | [Core] Fix engine-use-ray broken (#4105) | 2024-04-16 05:24:53 +00:00 |