xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-03 01:27:10 +08:00

Author	SHA1	Message	Date
Michael Goin	6d94420246	[Doc] Update supported_hardware.rst (#7276 )	2024-08-07 14:21:50 -07:00
Nick Hill	fc1493a01e	[FrontEnd] Make `merge_async_iterators` `is_cancelled` arg optional (#7282 )	2024-08-07 13:35:14 -07:00
Lucas Wilkinson	311f743831	[Bugfix] Fix gptq failure on T4s (#7264 )	2024-08-07 20:05:37 +00:00
Kevin H. Luu	469b3bc538	[ci] Make building wheels per commit optional (#7278 ) Signed-off-by: kevin <kevin@anyscale.com>	2024-08-07 11:34:25 -07:00
Michael Goin	5223199e03	[Bugfix][FP8] Fix dynamic FP8 Marlin quantization (#7219 )	2024-08-07 11:23:12 -07:00
Maximilien de Bayser	fde47d3bc2	[BugFix] Fix frontend multiprocessing hang (#7217 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-08-07 18:09:36 +00:00
Stas Bekman	0e12cd67a8	[Doc] add online speculative decoding example (#7243 )	2024-08-07 09:58:02 -07:00
Ilya Lavrenov	80cbe10c59	[OpenVINO] migrate to latest dependencies versions (#7251 )	2024-08-07 09:49:10 -07:00
Isotr0py	b764547616	[Bugfix] Fix input processor for InternVL2 model (#7164 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-07 09:32:07 -07:00
Rafael Vasquez	ab0f5e2823	Fixes typo in function name (#7275 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-08-07 09:29:27 -07:00
Robert Shaw	564985729a	[ BugFix ] Move `zmq` frontend to IPC instead of TCP (#7222 )	2024-08-07 16:24:56 +00:00
Dipika Sikka	0f7052bc7e	[Misc] Refactor linear layer weight loading; introduce `BasevLLMParameter` and `weight_loader_v2` (#5874 )	2024-08-07 09:17:58 -07:00
youkaichao	639159b2a6	[distributed][misc] add specialized method for cuda platform (#7249 )	2024-08-07 08:54:52 -07:00
Cyrus Leung	66d617e343	[Frontend] Gracefully handle missing chat template and fix CI failure (#7238 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-07 09:12:05 +00:00
Atilla Akkuş	7b261092de	[BUGFIX]: top_k is expected to be an integer. (#7227 )	2024-08-07 00:32:16 -07:00
Roger Wang	2385c8f374	[Doc] Mock new dependencies for documentation (#7245 )	2024-08-07 06:43:03 +00:00
Nick Hill	9a3f49ae07	[BugFix] Overhaul async request cancellation (#7111 )	2024-08-07 13:21:41 +08:00
Michael Goin	f9a5600649	[Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225 )	2024-08-06 18:34:26 -07:00
afeldman-nm	fd95e026e0	[Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942 ) Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Nick Hill <nickhill@us.ibm.com>	2024-08-06 16:51:47 -04:00
xiaobochen123	660470e5a3	[Core] Optimize evictor-v2 performance (#7193 )	2024-08-06 12:34:25 -07:00
Luka Govedič	8d59dbb000	[Kernel] Add per-tensor and per-token AZP epilogues (#5941 ) Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2024-08-06 18:17:08 +00:00
Lily Liu	5c60c8c423	[SpecDecode] [Minor] Fix spec decode sampler tests (#7183 )	2024-08-06 10:40:32 -07:00
Katarzyna Papis	00afc78590	[Bugfix] add gguf dependency (#7198 ) Co-authored-by: katarzyna.papis <kpapis@kpapis-u20.sclab.intel.com>	2024-08-06 10:08:35 -07:00
Robert Shaw	541c1852d3	[ BugFix ] Fix ZMQ when `VLLM_PORT` is set (#7205 )	2024-08-06 09:26:26 -07:00
Dipika Sikka	a3bbbfa1d8	[BugFix] Fix DeepSeek remote code (#7178 )	2024-08-06 08:16:53 -07:00
Cyrus Leung	1f26efbb3a	[Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153 ) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-08-06 16:55:31 +08:00
Jee Jee Li	9118217f58	[LoRA] Relax LoRA condition (#7146 )	2024-08-06 01:57:25 +00:00
Simon Mo	e3c664bfcb	[Build] Add initial conditional testing spec (#6841 )	2024-08-05 17:39:22 -07:00
Isotr0py	360bd67cf0	[Core] Support loading GGUF model (#5191 ) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-05 17:54:23 -06:00
Cody Yu	ef527be06c	[MISC] Use non-blocking transfer in prepare_input (#7172 )	2024-08-05 23:41:27 +00:00
Jacob Schein	89b8db6bb2	[Bugfix] Specify device when loading LoRA and embedding tensors (#7129 ) Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>	2024-08-05 16:35:47 -07:00
Thomas Parnell	789937af2e	[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-05 23:29:43 +00:00
youkaichao	dfb1a15dcb	[ci][frontend] deduplicate tests (#7101 )	2024-08-05 15:59:22 -07:00
Simon Mo	4db5176d97	bump version to v0.5.4 (#7139 ) v0.5.4	2024-08-05 14:39:48 -07:00
Tyler Michael Smith	4cf1dc39be	[Bugfix][CI/Build] Fix CUTLASS FetchContent (#7171 )	2024-08-05 14:22:57 -07:00
Tyler Michael Smith	6e4852ce28	[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001 )	2024-08-05 16:00:01 -04:00
Tyler Michael Smith	8571ac4672	[Kernel] Update CUTLASS to 3.5.1 (#7085 )	2024-08-05 15:13:43 -04:00
Rui Qiao	997cf78308	[Misc] Fix typo in GroupCoordinator.recv() (#7167 ) Signed-off-by: Rui Qiao <ruisearch42@gmail.com>	2024-08-05 11:10:16 -07:00
Aditya Paliwal	57f560aa23	[BugFix] Use args.trust_remote_code (#7121 )	2024-08-05 09:26:14 -07:00
Nick Hill	003f8ee128	[BugFix] Use IP4 localhost form for zmq bind (#7163 )	2024-08-05 08:41:03 -07:00
Bongwon Jang	e9630458c7	[SpecDecode] Support FlashInfer in DraftModelRunner (#6926 )	2024-08-05 08:05:05 -07:00
Cade Daniel	82a1b1a82b	[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963 )	2024-08-05 08:46:44 +00:00
Jungho Christopher Cho	c0d8f1636c	[Model] SiglipVisionModel ported from transformers (#6942 ) Co-authored-by: Roger Wang <ywang@roblox.com>	2024-08-05 06:22:12 +00:00
Cyrus Leung	cc08fc7225	[Frontend] Reapply "Factor out code for running uvicorn" (#7095 )	2024-08-04 20:40:51 -07:00
Alphi	7b86e7c9cd	[Model] Add multi-image support for minicpmv (#7122 ) Co-authored-by: hezhihui <hzh7269@modelbest.cn> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-05 09:23:17 +08:00
Jee Jee Li	f80ab3521c	Clean up remaining Punica C information (#7027 )	2024-08-04 15:37:08 -07:00
youkaichao	16a1cc9bb2	[misc][distributed] improve libcudart.so finding (#7127 )	2024-08-04 11:31:51 -07:00
Thomas Parnell	b1c9aa3daa	[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-08-04 07:13:18 -07:00
Jee Jee Li	179a6a36f2	[Model]Refactor MiniCPMV (#7020 ) Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-08-04 08:12:41 +00:00
youkaichao	83c644fe7e	[core][misc] simply output processing with shortcut code path (#7117 )	2024-08-04 00:22:19 -07:00


2238 Commits