xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2025-12-17 04:45:01 +08:00

Author	SHA1	Message	Date
Woosuk Kwon	2f77b6cfec	[TPU] Implement prefix caching for TPUs (#10307) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-11-20 13:54:15 -08:00
youkaichao	e893795443	[2/N] executor pass the complete config to worker/modelrunner (#9938) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2024-11-02 07:35:05 -07:00
Woosuk Kwon	211fe91aa8	[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)	2024-10-30 09:41:38 +00:00
youkaichao	a9b15c606f	[torch.compile] use empty tensor instead of None for profiling (#8875)	2024-09-27 08:11:32 -07:00
Woosuk Kwon	61f4a93d14	[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)	2024-09-03 18:35:33 -07:00
youkaichao	a7f65c2be9	[torch.compile] remove reset (#7975)	2024-08-28 17:32:26 -07:00
youkaichao	64cc644425	[core][torch.compile] discard the compile for profiling (#7796)	2024-08-26 21:33:58 -07:00
Woosuk Kwon	ce143353c6	[TPU] Skip creating empty tensor (#7630)	2024-08-17 14:22:46 -07:00
Roger Wang	bbf55c4805	[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530)	2024-08-17 13:30:55 -07:00
Woosuk Kwon	951fdd66d3	[TPU] Set per-rank XLA cache (#7533)	2024-08-14 14:47:51 -07:00
Cyrus Leung	4ddc4743d7	[Core] Consolidate `GB` constant and enable float GB arguments (#7416)	2024-08-12 14:14:14 -07:00
Woosuk Kwon	533d1932d2	[Bugfix][TPU] Set readonly=True for non-root devices (#6980)	2024-07-31 00:19:28 -07:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)	2024-07-27 10:28:33 -07:00
Woosuk Kwon	52f07e3dec	[Hardware][TPU] Implement tensor parallelism with Ray (#5871)	2024-07-26 20:54:27 -07:00
Woosuk Kwon	4634c8728b	[TPU] Refactor TPU worker & model runner (#6506)	2024-07-18 01:34:16 -07:00
Cyrus Leung	d97011512e	[CI/Build] vLLM cache directory for images (#6444)	2024-07-15 23:12:25 -07:00
youkaichao	3de6e6a30e	[core][distributed] support n layers % pp size != 0 (#6115)	2024-07-03 16:40:31 -07:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
Woosuk Kwon	54814fd85b	[Bugfix][TPU] Fix TPU sampler output (#5978)	2024-06-28 18:14:16 -07:00
Woosuk Kwon	f136da15e1	[Hardware][TPU] Optimize KV cache swapping (#5878)	2024-06-27 21:12:13 -07:00
Woosuk Kwon	f5c8628fdc	[Bugfix][TPU] Fix CPU cache allocation (#5869)	2024-06-26 13:42:40 -07:00
Woosuk Kwon	cbc53b6b8d	[Hardware][TPU] Support parallel sampling & Swapping (#5855)	2024-06-26 11:07:49 -07:00
Woosuk Kwon	3439c5a8e3	[Bugfix][TPU] Fix KV cache size calculation (#5860)	2024-06-26 00:58:23 -07:00
Woosuk Kwon	bc34937d68	[Hardware][TPU] Refactor TPU backend (#5831)	2024-06-25 15:25:52 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292)	2024-06-12 11:53:03 -07:00

25 Commits