25 Commits

Author SHA1 Message Date
Woosuk Kwon
2f77b6cfec
[TPU] Implement prefix caching for TPUs (#10307)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-20 13:54:15 -08:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner (#9938)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
Woosuk Kwon
211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)
2024-10-30 09:41:38 +00:00
youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling (#8875)
2024-09-27 08:11:32 -07:00
Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)
2024-09-03 18:35:33 -07:00
youkaichao
a7f65c2be9
[torch.compile] remove reset (#7975)
2024-08-28 17:32:26 -07:00
youkaichao
64cc644425
[core][torch.compile] discard the compile for profiling (#7796)
2024-08-26 21:33:58 -07:00
Woosuk Kwon
ce143353c6
[TPU] Skip creating empty tensor (#7630)
2024-08-17 14:22:46 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
2024-08-17 13:30:55 -07:00
Woosuk Kwon
951fdd66d3
[TPU] Set per-rank XLA cache (#7533)
2024-08-14 14:47:51 -07:00
Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416)
2024-08-12 14:14:14 -07:00
Woosuk Kwon
533d1932d2
[Bugfix][TPU] Set readonly=True for non-root devices (#6980)
2024-07-31 00:19:28 -07:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)
2024-07-27 10:28:33 -07:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871)
2024-07-26 20:54:27 -07:00
Woosuk Kwon
4634c8728b
[TPU] Refactor TPU worker & model runner (#6506)
2024-07-18 01:34:16 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115)
2024-07-03 16:40:31 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Woosuk Kwon
54814fd85b
[Bugfix][TPU] Fix TPU sampler output (#5978)
2024-06-28 18:14:16 -07:00
Woosuk Kwon
f136da15e1
[Hardware][TPU] Optimize KV cache swapping (#5878)
2024-06-27 21:12:13 -07:00
Woosuk Kwon
f5c8628fdc
[Bugfix][TPU] Fix CPU cache allocation (#5869)
2024-06-26 13:42:40 -07:00
Woosuk Kwon
cbc53b6b8d
[Hardware][TPU] Support parallel sampling & Swapping (#5855)
2024-06-26 11:07:49 -07:00
Woosuk Kwon
3439c5a8e3
[Bugfix][TPU] Fix KV cache size calculation (#5860)
2024-06-26 00:58:23 -07:00
Woosuk Kwon
bc34937d68
[Hardware][TPU] Refactor TPU backend (#5831)
2024-06-25 15:25:52 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292)
2024-06-12 11:53:03 -07:00