xinyun/vllm - vllm - 丝路新云 (Silk Road New Cloud) code repository

Mirror of https://git.datalinker.icu/vllm-project/vllm.git, last synced 2026-04-12 17:17:04 +08:00

Author	SHA1	Message	Date
Allen Wang	c6cf9295e1	[Bugfix] Sets `is_first_step_output` for TPUModelRunner (#9202)	2024-10-11 13:28:10 -07:00
Tyler Michael Smith	7342a7d7f8	[Model] Support Mamba (#6484)	2024-10-11 15:40:06 +00:00
youkaichao	cbc2ef5529	[misc] hide best_of from engine (#9261) Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>	2024-10-10 21:30:44 -07:00
youkaichao	18b296fdb2	[core] remove beam search from the core (#9105)	2024-10-07 05:47:04 +00:00
youkaichao	a9b15c606f	[torch.compile] use empty tensor instead of None for profiling (#8875)	2024-09-27 08:11:32 -07:00
Woosuk Kwon	50e9ec41fc	[TPU] Implement multi-step scheduling (#8489)	2024-09-14 16:58:31 -07:00
youkaichao	ce2702a923	[tpu][misc] fix typo (#8260)	2024-09-06 22:40:46 -07:00
Woosuk Kwon	0af3abe3d3	[TPU][Bugfix] Fix next_token_ids shape (#8128)	2024-09-03 13:29:24 -07:00
Woosuk Kwon	80c7b089b1	[TPU] Async output processing for TPU (#8011)	2024-08-29 19:35:29 -07:00
afeldman-nm	428dd1445e	[Core] Logprobs support in Multi-step (#7652)	2024-08-29 19:19:08 -07:00
youkaichao	ce6bf3a2cf	[torch.compile] avoid Dynamo guard evaluation overhead (#7898) Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-08-28 16:10:12 -07:00
Woosuk Kwon	43735bf5e1	[TPU] Remove redundant input tensor cloning (#7660)	2024-08-19 15:55:04 -07:00
Woosuk Kwon	0c2fa50b84	[TPU] Use mark_dynamic only for dummy run (#7634)	2024-08-18 00:18:53 -07:00
Roger Wang	bbf55c4805	[VLM] Refactor `MultiModalConfig` initialization and profiling (#7530)	2024-08-17 13:30:55 -07:00
Woosuk Kwon	90bab18f24	[TPU] Use mark_dynamic to reduce compilation time (#7340)	2024-08-10 18:12:22 -07:00
Woosuk Kwon	6e063ea35b	[TPU] Fix greedy decoding (#6933)	2024-07-30 02:06:29 -07:00
Woosuk Kwon	fad5576c58	[TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856)	2024-07-27 10:28:33 -07:00
Woosuk Kwon	52f07e3dec	[Hardware][TPU] Implement tensor parallelism with Ray (#5871)	2024-07-26 20:54:27 -07:00
Woosuk Kwon	4634c8728b	[TPU] Refactor TPU worker & model runner (#6506)	2024-07-18 01:34:16 -07:00
Woosuk Kwon	e09ce759aa	[TPU] Remove multi-modal args in TPU backend (#6504)	2024-07-17 04:02:53 -07:00
Woosuk Kwon	c467dff24f	[Hardware][TPU] Support MoE with Pallas GMM kernel (#6457)	2024-07-16 09:56:28 -07:00
Woosuk Kwon	5d5b4c5fe5	[Bugfix][TPU] Add missing None to model input (#6245)	2024-07-09 00:21:37 -07:00
xwjiang2010	d9e98f42e4	[vlm] Remove vision language config. (#6089) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Roger Wang <ywang@roblox.com>	2024-07-03 22:14:16 +00:00
Cyrus Leung	9831aec49f	[Core] Dynamic image size support for VLMs (#5276) Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com> Co-authored-by: ywang96 <ywang@roblox.com> Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-07-02 20:34:00 -07:00
Woosuk Kwon	7f83f40dee	[Bugfix][TPU] Fix pad slot id (#5977)	2024-06-28 18:55:17 -07:00
Cody Yu	b2c620230a	[Spec Decode] Introduce DraftModelRunner (#5799)	2024-06-28 09:17:51 -07:00
Woosuk Kwon	cbc53b6b8d	[Hardware][TPU] Support parallel sampling & Swapping (#5855)	2024-06-26 11:07:49 -07:00
Woosuk Kwon	f178e56c68	[Hardware][TPU] Raise errors for unsupported sampling params (#5850)	2024-06-25 16:58:23 -07:00
Woosuk Kwon	bc34937d68	[Hardware][TPU] Refactor TPU backend (#5831)	2024-06-25 15:25:52 -07:00
Woosuk Kwon	1a8bfd92d5	[Hardware] Initial TPU integration (#5292)	2024-06-12 11:53:03 -07:00

30 Commits