| Author | Commit | Message | Date |
|---|---|---|---|
| Allen Wang | c6cf9295e1 | [Bugfix] Sets is_first_step_output for TPUModelRunner (#9202) | 2024-10-11 13:28:10 -07:00 |
| Tyler Michael Smith | 7342a7d7f8 | [Model] Support Mamba (#6484) | 2024-10-11 15:40:06 +00:00 |
| youkaichao | cbc2ef5529 | [misc] hide best_of from engine (#9261); Co-authored-by: Brendan Wong <bjwpokemon@gmail.com> | 2024-10-10 21:30:44 -07:00 |
| youkaichao | 18b296fdb2 | [core] remove beam search from the core (#9105) | 2024-10-07 05:47:04 +00:00 |
| youkaichao | a9b15c606f | [torch.compile] use empty tensor instead of None for profiling (#8875) | 2024-09-27 08:11:32 -07:00 |
| Woosuk Kwon | 50e9ec41fc | [TPU] Implement multi-step scheduling (#8489) | 2024-09-14 16:58:31 -07:00 |
| youkaichao | ce2702a923 | [tpu][misc] fix typo (#8260) | 2024-09-06 22:40:46 -07:00 |
| Woosuk Kwon | 0af3abe3d3 | [TPU][Bugfix] Fix next_token_ids shape (#8128) | 2024-09-03 13:29:24 -07:00 |
| Woosuk Kwon | 80c7b089b1 | [TPU] Async output processing for TPU (#8011) | 2024-08-29 19:35:29 -07:00 |
| afeldman-nm | 428dd1445e | [Core] Logprobs support in Multi-step (#7652) | 2024-08-29 19:19:08 -07:00 |
| youkaichao | ce6bf3a2cf | [torch.compile] avoid Dynamo guard evaluation overhead (#7898); Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2024-08-28 16:10:12 -07:00 |
| Woosuk Kwon | 43735bf5e1 | [TPU] Remove redundant input tensor cloning (#7660) | 2024-08-19 15:55:04 -07:00 |
| Woosuk Kwon | 0c2fa50b84 | [TPU] Use mark_dynamic only for dummy run (#7634) | 2024-08-18 00:18:53 -07:00 |
| Roger Wang | bbf55c4805 | [VLM] Refactor MultiModalConfig initialization and profiling (#7530) | 2024-08-17 13:30:55 -07:00 |
| Woosuk Kwon | 90bab18f24 | [TPU] Use mark_dynamic to reduce compilation time (#7340) | 2024-08-10 18:12:22 -07:00 |
| Woosuk Kwon | 6e063ea35b | [TPU] Fix greedy decoding (#6933) | 2024-07-30 02:06:29 -07:00 |
| Woosuk Kwon | fad5576c58 | [TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) | 2024-07-27 10:28:33 -07:00 |
| Woosuk Kwon | 52f07e3dec | [Hardware][TPU] Implement tensor parallelism with Ray (#5871) | 2024-07-26 20:54:27 -07:00 |
| Woosuk Kwon | 4634c8728b | [TPU] Refactor TPU worker & model runner (#6506) | 2024-07-18 01:34:16 -07:00 |
| Woosuk Kwon | e09ce759aa | [TPU] Remove multi-modal args in TPU backend (#6504) | 2024-07-17 04:02:53 -07:00 |
| Woosuk Kwon | c467dff24f | [Hardware][TPU] Support MoE with Pallas GMM kernel (#6457) | 2024-07-16 09:56:28 -07:00 |
| Woosuk Kwon | 5d5b4c5fe5 | [Bugfix][TPU] Add missing None to model input (#6245) | 2024-07-09 00:21:37 -07:00 |
| xwjiang2010 | d9e98f42e4 | [vlm] Remove vision language config. (#6089); Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>; Co-authored-by: Roger Wang <ywang@roblox.com> | 2024-07-03 22:14:16 +00:00 |
| Cyrus Leung | 9831aec49f | [Core] Dynamic image size support for VLMs (#5276); Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>; Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>; Co-authored-by: ywang96 <ywang@roblox.com>; Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>; Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com> | 2024-07-02 20:34:00 -07:00 |
| Woosuk Kwon | 7f83f40dee | [Bugfix][TPU] Fix pad slot id (#5977) | 2024-06-28 18:55:17 -07:00 |
| Cody Yu | b2c620230a | [Spec Decode] Introduce DraftModelRunner (#5799) | 2024-06-28 09:17:51 -07:00 |
| Woosuk Kwon | cbc53b6b8d | [Hardware][TPU] Support parallel sampling & Swapping (#5855) | 2024-06-26 11:07:49 -07:00 |
| Woosuk Kwon | f178e56c68 | [Hardware][TPU] Raise errors for unsupported sampling params (#5850) | 2024-06-25 16:58:23 -07:00 |
| Woosuk Kwon | bc34937d68 | [Hardware][TPU] Refactor TPU backend (#5831) | 2024-06-25 15:25:52 -07:00 |
| Woosuk Kwon | 1a8bfd92d5 | [Hardware] Initial TPU integration (#5292) | 2024-06-12 11:53:03 -07:00 |