bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00
Akshat Tripathi
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend (#14238)
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Chengji Yao <chengjiyao@google.com>
Co-authored-by: Chengji Yao <chengjiyao@google.com>
2025-05-07 16:28:47 -04:00
idouba
72c5b97231
Update tpu_worker.py's typo (#17288)
2025-04-28 04:01:15 -07:00
Chengji Yao
e74ff409e0
[TPU] support disabling xla compilation cache (#15567)
...
Signed-off-by: Chengji Yao <chengjiyao@google.com>
2025-03-27 00:09:28 +00:00
yarongmu-google
dd344e0342
[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … (#14844)
...
Signed-off-by: Yarong Mu <ymu@google.com>
2025-03-15 00:41:15 +00:00
Siyuan Liu
beebf4742a
[TPU][Profiler] Support start_profile/stop_profile in TPU worker (#13988)
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-03-04 14:40:06 -05:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling (#13831)
2025-02-25 23:43:01 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.

The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
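For context, the headers added in #12628 are one-line tags at the top of each python source file, and the accompanying pre-commit hook verifies the tag is present. Below is a minimal sketch of such a check; `check_spdx.py` is a hypothetical helper, not the project's actual hook, and it assumes the expected tag is Apache-2.0, vLLM's license.

```python
# check_spdx.py -- hypothetical sketch of the kind of check the pre-commit
# hook from #12628 could run; not the project's actual implementation.
# Assumes the expected tag is Apache-2.0 (vLLM's license).
import sys

SPDX_LINE = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(path: str) -> bool:
    """Return True if the first meaningful line of the file is the SPDX tag."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            stripped = line.strip()
            # Skip blank lines and shebangs that may precede the header.
            if not stripped or stripped.startswith("#!"):
                continue
            return stripped == SPDX_LINE
    return False


if __name__ == "__main__":
    # pre-commit passes the staged file names as command-line arguments.
    missing = [p for p in sys.argv[1:] if not has_spdx_header(p)]
    for p in missing:
        print(f"missing SPDX header: {p}")
    sys.exit(1 if missing else 0)
```

Registered as a local pre-commit hook over `*.py` files, a script like this fails the commit whenever a new file lacks the tag.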
Chen Zhang
0f8cafe2d1
[Kernel] unified_attention for Attention.forward (#11967)
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-13 19:28:53 +08:00
Woosuk Kwon
2f77b6cfec
[TPU] Implement prefix caching for TPUs (#10307)
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2024-11-20 13:54:15 -08:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner (#9938)
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2024-11-02 07:35:05 -07:00
Woosuk Kwon
211fe91aa8
[TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438)
2024-10-30 09:41:38 +00:00
youkaichao
a9b15c606f
[torch.compile] use empty tensor instead of None for profiling (#8875)
2024-09-27 08:11:32 -07:00
Woosuk Kwon
61f4a93d14
[TPU][Bugfix] Use XLA rank for persistent cache path (#8137)
2024-09-03 18:35:33 -07:00
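This commit, together with #7533 and #6980 further down the log, converges on giving each rank its own on-disk XLA compilation cache keyed by its XLA rank, so concurrent ranks do not race on a shared cache. Below is a hedged sketch of that idea using torch_xla's public cache API; the cache path is a hypothetical example and this is not vLLM's exact code.

```python
# Hedged sketch of a per-rank persistent XLA compilation cache in the
# spirit of #8137/#7533/#6980; not vLLM's exact code. The cache root
# path here is a hypothetical example.
import os

import torch_xla.runtime as xr

cache_root = os.path.expanduser("~/.cache/vllm/xla_cache")
rank = xr.global_ordinal()  # this process's XLA rank

# Giving each rank its own directory avoids concurrent writers racing on
# a shared cache; with a single shared path, non-root ranks would instead
# need to open the cache with readonly=True.
xr.initialize_cache(os.path.join(cache_root, str(rank)), readonly=False)
```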
youkaichao
a7f65c2be9
[torch.compile] remove reset (#7975)
2024-08-28 17:32:26 -07:00
youkaichao
64cc644425
[core][torch.compile] discard the compile for profiling (#7796)
2024-08-26 21:33:58 -07:00
Woosuk Kwon
ce143353c6
[TPU] Skip creating empty tensor (#7630)
2024-08-17 14:22:46 -07:00
Roger Wang
bbf55c4805
[VLM] Refactor MultiModalConfig initialization and profiling (#7530)
2024-08-17 13:30:55 -07:00
Woosuk Kwon
951fdd66d3
[TPU] Set per-rank XLA cache (#7533)
2024-08-14 14:47:51 -07:00
Cyrus Leung
4ddc4743d7
[Core] Consolidate GB constant and enable float GB arguments (#7416)
2024-08-12 14:14:14 -07:00
Woosuk Kwon
533d1932d2
[Bugfix][TPU] Set readonly=True for non-root devices (#6980)
2024-07-31 00:19:28 -07:00
Woosuk Kwon
fad5576c58
[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)
2024-07-27 10:28:33 -07:00
Woosuk Kwon
52f07e3dec
[Hardware][TPU] Implement tensor parallelism with Ray (#5871)
2024-07-26 20:54:27 -07:00
Woosuk Kwon
4634c8728b
[TPU] Refactor TPU worker & model runner (#6506)
2024-07-18 01:34:16 -07:00
Cyrus Leung
d97011512e
[CI/Build] vLLM cache directory for images (#6444)
2024-07-15 23:12:25 -07:00
youkaichao
3de6e6a30e
[core][distributed] support n layers % pp size != 0 (#6115)
2024-07-03 16:40:31 -07:00
xwjiang2010
d9e98f42e4
[vlm] Remove vision language config. (#6089)
...
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-07-03 22:14:16 +00:00
Woosuk Kwon
54814fd85b
[Bugfix][TPU] Fix TPU sampler output (#5978)
2024-06-28 18:14:16 -07:00
Woosuk Kwon
f136da15e1
[Hardware][TPU] Optimize KV cache swapping (#5878)
2024-06-27 21:12:13 -07:00
Woosuk Kwon
f5c8628fdc
[Bugfix][TPU] Fix CPU cache allocation (#5869)
2024-06-26 13:42:40 -07:00
Woosuk Kwon
cbc53b6b8d
[Hardware][TPU] Support parallel sampling & Swapping (#5855)
2024-06-26 11:07:49 -07:00
Woosuk Kwon
3439c5a8e3
[Bugfix][TPU] Fix KV cache size calculation (#5860)
2024-06-26 00:58:23 -07:00
Woosuk Kwon
bc34937d68
[Hardware][TPU] Refactor TPU backend (#5831)
2024-06-25 15:25:52 -07:00
Woosuk Kwon
1a8bfd92d5
[Hardware] Initial TPU integration (#5292)
2024-06-12 11:53:03 -07:00