30 Commits

Author SHA1 Message Date
Allen Wang
c6cf9295e1
[Bugfix] Sets is_first_step_output for TPUModelRunner (#9202) 2024-10-11 13:28:10 -07:00
Tyler Michael Smith
7342a7d7f8
[Model] Support Mamba (#6484) 2024-10-11 15:40:06 +00:00
youkaichao
cbc2ef5529
[misc] hide best_of from engine (#9261)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-10-10 21:30:44 -07:00
youkaichao
18b296fdb2
[core] remove beam search from the core (#9105) 2024-10-07 05:47:04 +00:00
youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling (#8875) 2024-09-27 08:11:32 -07:00
Woosuk Kwon
50e9ec41fc
[TPU] Implement multi-step scheduling (#8489) 2024-09-14 16:58:31 -07:00
youkaichao
ce2702a923
[tpu][misc] fix typo (#8260) 2024-09-06 22:40:46 -07:00
Woosuk Kwon
0af3abe3d3
[TPU][Bugfix] Fix next_token_ids shape (#8128) 2024-09-03 13:29:24 -07:00
Woosuk Kwon
80c7b089b1
[TPU] Async output processing for TPU (#8011) 2024-08-29 19:35:29 -07:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652) 2024-08-29 19:19:08 -07:00
youkaichao
ce6bf3a2cf
[torch.compile] avoid Dynamo guard evaluation overhead (#7898)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-08-28 16:10:12 -07:00
Woosuk Kwon
43735bf5e1
[TPU] Remove redundant input tensor cloning (#7660) 2024-08-19 15:55:04 -07:00
Woosuk Kwon
0c2fa50b84
[TPU] Use mark_dynamic only for dummy run (#7634) 2024-08-18 00:18:53 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530) 2024-08-17 13:30:55 -07:00
Woosuk Kwon
90bab18f24
[TPU] Use mark_dynamic to reduce compilation time (#7340) 2024-08-10 18:12:22 -07:00
Woosuk Kwon
6e063ea35b
[TPU] Fix greedy decoding (#6933) 2024-07-30 02:06:29 -07:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) 2024-07-27 10:28:33 -07:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871) 2024-07-26 20:54:27 -07:00
Woosuk Kwon
4634c8728b
[TPU] Refactor TPU worker & model runner (#6506) 2024-07-18 01:34:16 -07:00
Woosuk Kwon
e09ce759aa
[TPU] Remove multi-modal args in TPU backend (#6504) 2024-07-17 04:02:53 -07:00
Woosuk Kwon
c467dff24f
[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457) 2024-07-16 09:56:28 -07:00
Woosuk Kwon
5d5b4c5fe5
[Bugfix][TPU] Add missing None to model input (#6245) 2024-07-09 00:21:37 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Cyrus Leung
9831aec49f
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2024-07-02 20:34:00 -07:00
Woosuk Kwon
7f83f40dee
[Bugfix][TPU] Fix pad slot id (#5977) 2024-06-28 18:55:17 -07:00
Cody Yu
b2c620230a
[Spec Decode] Introduce DraftModelRunner (#5799) 2024-06-28 09:17:51 -07:00
Woosuk Kwon
cbc53b6b8d
[Hardware][TPU] Support parallel sampling & Swapping (#5855) 2024-06-26 11:07:49 -07:00
Woosuk Kwon
f178e56c68
[Hardware][TPU] Raise errors for unsupported sampling params (#5850) 2024-06-25 16:58:23 -07:00
Woosuk Kwon
bc34937d68
[Hardware][TPU] Refactor TPU backend (#5831) 2024-06-25 15:25:52 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292) 2024-06-12 11:53:03 -07:00