xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-19 12:17:15 +08:00

Author	SHA1	Message	Date
Nick Hill	9076325677	[BugFix] Don't scan entire cache dir when loading model (#13302 )	2025-02-14 21:33:31 -08:00
Michael Goin	f0b2da72a8	Expand MLA to support most types of quantization (#13181 )	2025-02-13 22:19:22 -08:00
youkaichao	b2496bb07f	[core] fix sleep mode and pytorch checkpoint compatibility (#13001 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-02-10 13:03:43 +08:00
Jun Duan	011e612d92	[Misc] Log time consumption on weight downloading (#12926 )	2025-02-08 09:16:42 +00:00
Harry Mellor	fcf2e3d7fc	[Bugfix] Fix OpenVINO model runner (#12750 )	2025-02-04 22:42:46 -08:00
Kyle Sayers	7ff7a638b6	[Model][Quant] Fix GLM, Fix fused module mappings for quantization (#12634 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-02-05 05:32:06 +00:00
Jee Jee Li	96b23621c1	[Misc] Add BNB quantization for Whisper (#12381 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-02-04 16:27:36 +08:00
Arthur	a1a2aaadb9	[Model]: Add `transformers` backend support (#11330 ) # Adds support for `transformers` as a backend Following https://github.com/huggingface/transformers/pull/35235, a bunch of models should already be supported, we are ramping up support for more models. Thanks @Isotr0py for the TP support, and @hmellor for his help as well! This includes: - `trust_remote_code=True` support: any model on the hub, if it implements attention the correct way can be natively supported!! - tensor parallel support --------- Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <41363108+Isotr0py@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-02-03 21:30:38 +08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Lucas Wilkinson	baeded2569	[Attention] Deepseek v3 MLA support with FP8 compute (#12601 ) This PR implements the Deepseek V3 support by performing matrix absorption on the fp8 weights --------- Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com>	2025-01-31 21:52:51 -08:00
Lucas Wilkinson	cabaf4eff3	[Attention] MLA decode optimizations (#12528 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Zhuohan Li <zhuohan123@gmail.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Alexander Matveev <59768536+alexm-neuralmagic@users.noreply.github.com> Co-authored-by: simon-mo <xmo@berkeley.edu>	2025-01-30 23:49:37 -08:00
Pavani Majety	b02fd288b2	[Hardware][NV] Fix Modelopt model loading for k-v-scales for Llama models. (#11787 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com> Co-authored-by: mgoin <michael@neuralmagic.com>	2025-01-29 01:46:12 -08:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Bowen Wang	2bc3fbba0c	[FlashInfer] Upgrade to 0.2.0 (#11194 ) Signed-off-by: Bowen Wang <abmfy@icloud.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-01-27 18:19:24 +00:00
Russell Bryant	d3d6bb13fb	Set weights_only=True when using torch.load() (#12366 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 02:17:30 +00:00
Gregory Shtrasberg	e97f802b2d	[FP8][Kernel] Dynamic kv cache scaling factors computation (#11906 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Micah Williamson <micah.williamson@amd.com>	2025-01-23 18:04:03 +00:00
Jee Jee Li	84bee4bd5c	[Misc] Improve the readability of BNB error messages (#12320 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-22 16:56:54 +00:00
Cyrus Leung	59a0192fb9	[Core] Interface for accessing model from `VllmRunner` (#10353 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-01-20 15:00:59 +08:00
Martin Gleize	bbe5f9de7d	[Model] Support for fairseq2 Llama (#11442 ) Signed-off-by: Martin Gleize <mgleize@meta.com> Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>	2025-01-19 10:40:40 -08:00
Isotr0py	edaae198e7	[Misc] Add BNB support to GLM4-V model (#12184 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-01-19 19:49:22 +08:00
Jee Jee Li	a3a3ee4e6f	[Misc] Merge bitsandbytes_stacked_params_mapping and packed_modules_mapping (#11924 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-15 07:49:49 +08:00
youkaichao	d53575a5f0	[ci] fix gh200 tests (#11919 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-10 16:25:17 +08:00
Cyrus Leung	d848800e88	[Misc] Move `print_*_once` from utils to logger (#11298 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2025-01-09 12:48:12 +08:00
Harry Mellor	aba8d6ee00	[Doc] Move examples into categories (#11840 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 13:09:53 +00:00
Isotr0py	dde1fa18c9	[Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-27 19:45:13 +00:00
Jee Jee Li	0240402c46	[Misc]Add BNB quantization for MolmoForCausalLM (#11551 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-27 18:48:24 +00:00
Cyrus Leung	eec906d811	[Misc] Add placeholder module (#11501 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-26 13:12:51 +00:00
Cyrus Leung	3f3e92e1f2	[Model] Automatic conversion of classification and reward models (#11469 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-24 18:22:22 +00:00
omer-dayan	995f56236b	[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192 ) Signed-off-by: OmerD <omer@run.ai>	2024-12-20 16:46:24 +00:00
Cyrus Leung	8f10d5e393	[Misc] Split up pooling tasks (#10820 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-11 01:28:00 -08:00
Cyrus Leung	bf0e382e16	[Model] Composite weight loading for multimodal Qwen2 (#10944 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-07 07:22:52 -07:00
Jee Jee Li	1f958a7d52	[Bugfix] Fix BNB loader target_modules (#10720 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-05 13:20:26 +08:00
Isotr0py	4c05edb33a	[Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-12-02 23:06:09 +00:00
Cyrus Leung	133707123e	[Model] Replace embedding models with pooling adapter (#10769 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2024-12-01 08:02:54 +08:00
Jee Jee Li	15cc2a9f1a	[Misc]Further reduce BNB static variable (#10597 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-26 22:54:12 -08:00
youkaichao	05d1f8c9c6	[misc] move functions to config.py (#10624 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-25 09:27:30 +00:00
Jee Jee Li	17d8fc1806	[bugfix] Fix example/tensorize_vllm_model tests (#10595 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-23 17:22:33 -08:00
Isotr0py	b6374e09b0	[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-22 15:01:56 +08:00
Russell Bryant	fd9f124971	[Doc] fix link for page that was renamed (#10455 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-19 09:48:30 -08:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107 ) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
Isotr0py	c4e464333e	[Misc] Add uninitialized params tracking for `AutoWeightsLoader` (#10327 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-18 09:07:46 +08:00
youkaichao	4fd9375028	[2/N][torch.compile] make compilation cfg part of vllm cfg (#10383 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-16 18:02:14 -08:00
youkaichao	3a763ba0c3	[core][misc] keep compatibility for old-style classes (#10356 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-15 13:55:51 +00:00
youkaichao	504ac53d18	[misc] error early for old-style class (#10304 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-13 18:55:39 -08:00
HoangCongDuc	ac49b59d8b	[Bugfix] bitsandbytes models fail to run pipeline parallel (#10200 ) Signed-off-by: Hoang Cong Duc <hoangcongducltt@gmail.com>	2024-11-13 09:56:39 -07:00
youkaichao	1a95f10ee7	[5/N] pass the whole config to model (#9983 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-09 14:17:28 +08:00
Aaron Pham	21063c11c7	[CI/Build] drop support for Python 3.8 EOL (#8464 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz>	2024-11-06 07:11:55 +00:00
Jee Jee Li	b9c64c0ca7	[Misc] Modify BNB parameter name (#9997 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-05 14:40:08 -05:00
Jee Jee Li	fb2716d641	[Misc]Reduce BNB static variable (#9987 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-04 17:04:40 +00:00
youkaichao	8d72bb20fa	[4/N] make quant config first-class citizen (#9978 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2024-11-04 08:51:31 -08:00

143 Commits