| Woosuk Kwon | 2aa9831dd3 | Minor | 2024-04-25 23:40:44 +00:00 |
| Woosuk Kwon | 028f528aad | Fix KV cache shape | 2024-04-25 23:38:07 +00:00 |
| Woosuk Kwon | fa5bacd5b0 | Add warmup | 2024-04-25 05:06:41 +00:00 |
| Woosuk Kwon | b62170e4e3 | Fix scheduler | 2024-04-25 05:06:22 +00:00 |
| Woosuk Kwon | 98eda57899 | Add timer | 2024-04-25 05:06:11 +00:00 |
| Woosuk Kwon | 81b8b813f1 | Pad to avoid recompilation | 2024-04-25 04:43:33 +00:00 |
| Woosuk Kwon | e2c7dedb3a | Minor | 2024-04-25 03:28:53 +00:00 |
| Woosuk Kwon | 5323969fcf | Increase #blocks | 2024-04-24 08:56:58 +00:00 |
| Woosuk Kwon | f42b4c27d8 | Include argmax to jit | 2024-04-24 08:56:45 +00:00 |
| Woosuk Kwon | 620e7646d3 | Fix cache write | 2024-04-24 08:56:30 +00:00 |
| Woosuk Kwon | d5fb1c20c1 | Fix JAX jit OOM | 2024-04-24 07:52:56 +00:00 |
| Woosuk Kwon | 092e3d6d6d | Remove hardcoded path | 2024-04-19 08:18:10 +00:00 |
| Woosuk Kwon | 84284302d8 | Minor | 2024-04-19 08:08:25 +00:00 |
| Woosuk Kwon | 743695f586 | Fix write_to_kv_cache | 2024-04-19 07:51:54 +00:00 |
| Woosuk Kwon | 62b870fa07 | Use FlashAttention kernel | 2024-04-17 20:24:45 +00:00 |
| Woosuk Kwon | 7e3a230c38 | Fix paged_attn | 2024-04-17 20:06:26 +00:00 |
| Woosuk Kwon | 186c88c497 | explictly return new_kv_caches | 2024-04-17 18:42:34 +00:00 |
| Woosuk Kwon | ef762cb110 | Write kV | 2024-04-17 18:21:39 +00:00 |
| Woosuk Kwon | 756c4e78d3 | Add write_to_cache ops | 2024-04-17 18:20:55 +00:00 |
| Woosuk Kwon | 4880de35d2 | Add attn_mask | 2024-04-17 18:12:20 +00:00 |
| Woosuk Kwon | 0fb07c08d0 | Minor | 2024-04-17 18:08:33 +00:00 |
| Woosuk Kwon | e4377dd698 | Add model runner | 2024-04-17 18:04:54 +00:00 |
| Woosuk Kwon | 5cb213c85e | Add flash-attn op | 2024-04-17 18:02:28 +00:00 |
| Woosuk Kwon | 25bbc21ef6 | Minor | 2024-04-17 18:02:16 +00:00 |
| Woosuk Kwon | b25fcc06c2 | Minor | 2024-04-17 18:02:13 +00:00 |
| Woosuk Kwon | 6661c030c4 | Add paged_attn op | 2024-04-17 18:02:00 +00:00 |
| Woosuk Kwon | 8888d1c474 | Fix logit indices | 2024-04-17 18:01:43 +00:00 |
| Woosuk Kwon | cedb67028a | Add gemma | 2024-04-17 17:00:10 +00:00 |
| Woosuk Kwon | 91b47e3f2f | JAX-based TPU worker | 2024-04-16 17:37:11 +00:00 |
| Woosuk Kwon | 6d62e4c6aa | Add torch to dependencies | 2024-04-16 17:06:35 +00:00 |
| Woosuk Kwon | de82e95787 | Minor | 2024-04-16 17:04:46 +00:00 |
| Woosuk Kwon | b3b89cf755 | Renew TPU executor | 2024-04-16 09:42:15 +00:00 |
| Woosuk Kwon | 6692a30266 | Minor | 2024-04-16 09:41:53 +00:00 |
| Woosuk Kwon | eb0a0466a9 | Add JAX requirements | 2024-04-16 08:05:54 +00:00 |
| Woosuk Kwon | c59c1e7b2c | Remove | 2024-04-16 08:05:36 +00:00 |
| Woosuk Kwon | d4adf92beb | Merge branch 'main' into woosuk-tpu | 2024-04-16 07:56:53 +00:00 |
| Noam Gat | 05434764cd | LM Format Enforcer Guided Decoding Support (#3868) Co-authored-by: Simon Mo <simon.mo@hey.com> | 2024-04-16 05:54:57 +00:00 |
| SangBin Cho | 4e7ee664e2 | [Core] Fix engine-use-ray broken (#4105) | 2024-04-16 05:24:53 +00:00 |
| SangBin Cho | 37e84a403d | [Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092) | 2024-04-15 14:47:31 -07:00 |
| Ricky Xu | 4695397dcf | [Bugfix] Fix ray workers profiling with nsight (#4095) | 2024-04-15 14:24:45 -07:00 |
| Sanger Steel | d619ae2d19 | [Doc] Add better clarity for tensorizer usage (#4090) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> | 2024-04-15 13:28:25 -07:00 |
| Nick Hill | eb46fbfda2 | [Core] Simplifications to executor classes (#4071) | 2024-04-15 13:05:09 -07:00 |
| Li, Jiang | 0003e9154b | [Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088) | 2024-04-15 08:35:55 -07:00 |
| Zhuohan Li | e11e200736 | [Bugfix] Fix filelock version requirement (#4075) | 2024-04-14 21:50:08 -07:00 |
| Roy | 8db1bf32f8 | [Misc] Upgrade triton to 2.2.0 (#4061) | 2024-04-14 17:43:54 -07:00 |
| Simon Mo | aceb17cf2d | [Docs] document that mixtral 8x22b is supported (#4073) | 2024-04-14 14:35:55 -07:00 |
| Nick Hill | 563c54f760 | [BugFix] Fix tensorizer extra in setup.py (#4072) | 2024-04-14 14:12:42 -07:00 |
| youkaichao | 2cd6b4f362 | [Core] avoid too many cuda context by caching p2p test (#4021) | 2024-04-13 23:40:21 -07:00 |
| Sanger Steel | 711a000255 | [Frontend] [Core] feat: Add model loading using tensorizer (#3476) | 2024-04-13 17:13:01 -07:00 |
| Jee Li | 989ae2538d | [Kernel] Add punica dimension for Baichuan-13B (#4053) | 2024-04-13 07:55:05 -07:00 |