1167 Commits

Author SHA1 Message Date
Woosuk Kwon
2aa9831dd3 Minor 2024-04-25 23:40:44 +00:00
Woosuk Kwon
028f528aad Fix KV cache shape 2024-04-25 23:38:07 +00:00
Woosuk Kwon
fa5bacd5b0 Add warmup 2024-04-25 05:06:41 +00:00
Woosuk Kwon
b62170e4e3 Fix scheduler 2024-04-25 05:06:22 +00:00
Woosuk Kwon
98eda57899 Add timer 2024-04-25 05:06:11 +00:00
Woosuk Kwon
81b8b813f1 Pad to avoid recompilation 2024-04-25 04:43:33 +00:00
Woosuk Kwon
e2c7dedb3a Minor 2024-04-25 03:28:53 +00:00
Woosuk Kwon
5323969fcf Increase #blocks 2024-04-24 08:56:58 +00:00
Woosuk Kwon
f42b4c27d8 Include argmax to jit 2024-04-24 08:56:45 +00:00
Woosuk Kwon
620e7646d3 Fix cache write 2024-04-24 08:56:30 +00:00
Woosuk Kwon
d5fb1c20c1 Fix JAX jit OOM 2024-04-24 07:52:56 +00:00
Woosuk Kwon
092e3d6d6d Remove hardcoded path 2024-04-19 08:18:10 +00:00
Woosuk Kwon
84284302d8 Minor 2024-04-19 08:08:25 +00:00
Woosuk Kwon
743695f586 Fix write_to_kv_cache 2024-04-19 07:51:54 +00:00
Woosuk Kwon
62b870fa07 Use FlashAttention kernel 2024-04-17 20:24:45 +00:00
Woosuk Kwon
7e3a230c38 Fix paged_attn 2024-04-17 20:06:26 +00:00
Woosuk Kwon
186c88c497 explictly return new_kv_caches 2024-04-17 18:42:34 +00:00
Woosuk Kwon
ef762cb110 Write kV 2024-04-17 18:21:39 +00:00
Woosuk Kwon
756c4e78d3 Add write_to_cache ops 2024-04-17 18:20:55 +00:00
Woosuk Kwon
4880de35d2 Add attn_mask 2024-04-17 18:12:20 +00:00
Woosuk Kwon
0fb07c08d0 Minor 2024-04-17 18:08:33 +00:00
Woosuk Kwon
e4377dd698 Add model runner 2024-04-17 18:04:54 +00:00
Woosuk Kwon
5cb213c85e Add flash-attn op 2024-04-17 18:02:28 +00:00
Woosuk Kwon
25bbc21ef6 Minor 2024-04-17 18:02:16 +00:00
Woosuk Kwon
b25fcc06c2 Minor 2024-04-17 18:02:13 +00:00
Woosuk Kwon
6661c030c4 Add paged_attn op 2024-04-17 18:02:00 +00:00
Woosuk Kwon
8888d1c474 Fix logit indices 2024-04-17 18:01:43 +00:00
Woosuk Kwon
cedb67028a Add gemma 2024-04-17 17:00:10 +00:00
Woosuk Kwon
91b47e3f2f JAX-based TPU worker 2024-04-16 17:37:11 +00:00
Woosuk Kwon
6d62e4c6aa Add torch to dependencies 2024-04-16 17:06:35 +00:00
Woosuk Kwon
de82e95787 Minor 2024-04-16 17:04:46 +00:00
Woosuk Kwon
b3b89cf755 Renew TPU executor 2024-04-16 09:42:15 +00:00
Woosuk Kwon
6692a30266 Minor 2024-04-16 09:41:53 +00:00
Woosuk Kwon
eb0a0466a9 Add JAX requirements 2024-04-16 08:05:54 +00:00
Woosuk Kwon
c59c1e7b2c Remove 2024-04-16 08:05:36 +00:00
Woosuk Kwon
d4adf92beb Merge branch 'main' into woosuk-tpu 2024-04-16 07:56:53 +00:00
Noam Gat
05434764cd LM Format Enforcer Guided Decoding Support (#3868) 2024-04-16 05:54:57 +00:00
Co-authored-by: Simon Mo <simon.mo@hey.com>
SangBin Cho
4e7ee664e2 [Core] Fix engine-use-ray broken (#4105) 2024-04-16 05:24:53 +00:00
SangBin Cho
37e84a403d [Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092) 2024-04-15 14:47:31 -07:00
Ricky Xu
4695397dcf [Bugfix] Fix ray workers profiling with nsight (#4095) 2024-04-15 14:24:45 -07:00
Sanger Steel
d619ae2d19 [Doc] Add better clarity for tensorizer usage (#4090) 2024-04-15 13:28:25 -07:00
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Nick Hill
eb46fbfda2 [Core] Simplifications to executor classes (#4071) 2024-04-15 13:05:09 -07:00
Li, Jiang
0003e9154b [Misc][Minor] Fix CPU block num log in CPUExecutor. (#4088) 2024-04-15 08:35:55 -07:00
Zhuohan Li
e11e200736 [Bugfix] Fix filelock version requirement (#4075) 2024-04-14 21:50:08 -07:00
Roy
8db1bf32f8 [Misc] Upgrade triton to 2.2.0 (#4061) 2024-04-14 17:43:54 -07:00
Simon Mo
aceb17cf2d [Docs] document that mixtral 8x22b is supported (#4073) 2024-04-14 14:35:55 -07:00
Nick Hill
563c54f760 [BugFix] Fix tensorizer extra in setup.py (#4072) 2024-04-14 14:12:42 -07:00
youkaichao
2cd6b4f362 [Core] avoid too many cuda context by caching p2p test (#4021) 2024-04-13 23:40:21 -07:00
Sanger Steel
711a000255 [Frontend] [Core] feat: Add model loading using tensorizer (#3476) 2024-04-13 17:13:01 -07:00
Jee Li
989ae2538d [Kernel] Add punica dimension for Baichuan-13B (#4053) 2024-04-13 07:55:05 -07:00