Woosuk Kwon
2f77b6cfec
[TPU] Implement prefix caching for TPUs (#10307)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-20 13:54:15 -08:00

youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner (#9938)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00

Woosuk Kwon
211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)
2024-10-30 09:41:38 +00:00

youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling (#8875)
2024-09-27 08:11:32 -07:00

Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)
2024-09-03 18:35:33 -07:00

youkaichao
a7f65c2be9
[torch.compile] remove reset (#7975)
2024-08-28 17:32:26 -07:00

youkaichao
64cc644425
[core][torch.compile] discard the compile for profiling (#7796)
2024-08-26 21:33:58 -07:00

Woosuk Kwon
ce143353c6
[TPU] Skip creating empty tensor (#7630)
2024-08-17 14:22:46 -07:00

Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
2024-08-17 13:30:55 -07:00

Woosuk Kwon
951fdd66d3
[TPU] Set per-rank XLA cache (#7533)
2024-08-14 14:47:51 -07:00

Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416)
2024-08-12 14:14:14 -07:00

Woosuk Kwon
533d1932d2
[Bugfix][TPU] Set readonly=True for non-root devices (#6980)
2024-07-31 00:19:28 -07:00

Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)
2024-07-27 10:28:33 -07:00

Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871)
2024-07-26 20:54:27 -07:00

Woosuk Kwon
4634c8728b
[TPU] Refactor TPU worker & model runner (#6506)
2024-07-18 01:34:16 -07:00

Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00

youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115)
2024-07-03 16:40:31 -07:00

xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00

Woosuk Kwon
54814fd85b
[Bugfix][TPU] Fix TPU sampler output (#5978)
2024-06-28 18:14:16 -07:00

Woosuk Kwon
f136da15e1
[Hardware][TPU] Optimize KV cache swapping (#5878)
2024-06-27 21:12:13 -07:00

Woosuk Kwon
f5c8628fdc
[Bugfix][TPU] Fix CPU cache allocation (#5869)
2024-06-26 13:42:40 -07:00

Woosuk Kwon
cbc53b6b8d
[Hardware][TPU] Support parallel sampling & Swapping (#5855)
2024-06-26 11:07:49 -07:00

Woosuk Kwon
3439c5a8e3
[Bugfix][TPU] Fix KV cache size calculation (#5860)
2024-06-26 00:58:23 -07:00

Woosuk Kwon
bc34937d68
[Hardware][TPU] Refactor TPU backend (#5831)
2024-06-25 15:25:52 -07:00

Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292)
2024-06-12 11:53:03 -07:00