xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-31 23:28:44 +08:00

Author	SHA1	Message	Date
Woosuk Kwon	b81364a7cd	[V0 Deprecation] Remove V0 sampling metadata (#25345 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai> Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-03 13:35:53 -07:00
Lukas Geiger	de533ab2a1	[Models] Improve iteration over layers (#19497 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-08-29 09:26:34 +08:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Isotr0py	f07a673eb2	[Misc] Allow `AutoWeightsLoader` to skip loading weights with specific substr in name (#18358 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-19 20:20:12 -07:00
Harry Mellor	26d0419309	Update deprecated type hinting in `models` (#18132 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 22:06:50 -07:00
Woosuk Kwon	b411418ff0	[Chore] Remove Sampler from Model Code (#17084 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-24 02:49:33 -07:00
rongfu.leng	242a637aea	[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 (#16103 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-04-06 05:52:01 -07:00
Tyler Michael Smith	4f5b059f14	Clean up unused padding_idx variables across many model definitions (#13240 ) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-03-04 21:27:00 +00:00
Harry Mellor	cdc1fa12eb	Remove unused kwargs from model definitions (#13555 )	2025-02-24 17:13:52 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Isotr0py	d14e98d924	[Model] Support GGUF models newly added in `transformers` 4.46.0 (#9685 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-13 00:13:44 +00:00
Dipika Sikka	3989a79824	[Bugfix] Update starcoder2 to remap k/v scale names for kv_cache quantization (#11148 )	2024-12-13 05:07:20 +00:00
youkaichao	eebad39f26	[torch.compile] support all attention backends (#10558 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-22 14:04:42 -08:00
Isotr0py	c4e464333e	[Misc] Add uninitialized params tracking for `AutoWeightsLoader` (#10327 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-18 09:07:46 +08:00
Roger Wang	643ecf7b11	[V1] Refactor model executable interface for all text-only language models (#10374 ) Signed-off-by: Roger Wang <ywang@roblox.com>	2024-11-17 05:18:46 +00:00
youkaichao	f89d18ff74	[6/N] pass whole config to inner model (#10205 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-11 06:41:46 +00:00
youkaichao	1a95f10ee7	[5/N] pass the whole config to model (#9983 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 14:17:28 +08:00
Joe Runde	d58268c56a	[V1] Make v1 more testable (#9888 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>	2024-11-06 11:57:35 -08:00
Michael Goin	399c798608	Remove ScaledActivation for AWQ (#10057 ) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-11-06 14:27:06 +00:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Yongzao	d27cfbf791	[torch.compile] Adding torch compile annotations to some models (#9641 ) Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2024-10-24 09:31:42 -07:00
Murali Andoorveedu	0f6d7a9a34	[Models] Add remaining model PP support (#7168 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai> Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-10-04 10:56:58 +08:00
afeldman-nm	428dd1445e	[Core] Logprobs support in Multi-step (#7652 )	2024-08-29 19:19:08 -07:00
Cyrus Leung	7025b11d94	[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410 )	2024-08-13 05:33:41 +00:00
Qubitium-ModelCloud	ee93f4f92a	[CORE] Quantized lm-head Framework (#4442 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ZX <zx@lbx.dev>	2024-07-02 22:25:17 +00:00
Murali Andoorveedu	c5832d2ae9	[Core] Pipeline Parallel Support (#4412 ) Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>	2024-07-02 10:58:08 -07:00
Zhuohan Li	1102bef219	[Bugfix / Core] Prefix Caching Guards (merged with main) (#4846 ) Co-authored-by: rsnm2 <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>	2024-05-27 15:18:17 -07:00
Cody Yu	a3a73ab069	[Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893 ) The 2nd PR for #4532. This PR supports loading FP8 kv-cache scaling factors from a FP8 checkpoint (with .kv_scale parameter).	2024-05-22 13:28:20 -07:00
Woosuk Kwon	0fca3cdcf2	[Misc] Enhance attention selector (#4751 )	2024-05-13 10:47:25 -07:00
Robert Shaw	4ea1f9678d	[BugFix] Resolved Issues For LinearMethod --> QuantConfig (#4418 )	2024-04-27 18:35:33 +00:00
Cody Yu	a62aaf1df5	[Misc][Refactor] Generalize linear_method to be quant_method (#4373 )	2024-04-26 16:41:14 -04:00
Antoni Baum	69e1d2fb69	[Core] Refactor model loading code (#4097 )	2024-04-16 11:34:39 -07:00
youkaichao	63e7176f26	[Core][Refactor] move parallel_utils into vllm/distributed (#3950 ) [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)	2024-04-10 15:33:30 -07:00
SangBin Cho	01bfb22b41	[CI] Try introducing isort. (#3495 )	2024-03-25 07:59:47 -07:00
Woosuk Kwon	925f3332ca	[Core] Refactor Attention Take 2 (#3462 )	2024-03-25 04:39:33 +00:00
少年	b0dfa91dd7	[Model] Add starcoder2 awq support (#3569 )	2024-03-24 21:07:36 -07:00
Woosuk Kwon	c188ecb080	[Misc] Bump up transformers to v4.39.0 & Remove StarCoder2Config (#3551 ) Co-authored-by: Roy <jasonailu87@gmail.com> Co-authored-by: Roger Meier <r.meier@siemens.com>	2024-03-21 07:58:12 -07:00
Roy	f1c0fc3919	Migrate `logits` computation and gather to `model_runner` (#3233 )	2024-03-20 23:25:01 +00:00
Zhuohan Li	2f8844ba08	Re-enable the 80 char line width limit (#3305 )	2024-03-10 19:49:14 -07:00
Woosuk Kwon	2daf23ab0c	Separate attention backends (#3005 )	2024-03-07 01:45:50 -08:00
Seonghyeon	bfdcfa6a05	Support starcoder2 architecture (#3089 )	2024-02-29 00:51:48 -08:00

41 Commits