xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-30 01:01:21 +08:00

Author	SHA1	Message	Date
Varun Sundar Rabindranath	8a8b30eac1	[Bugfix] LoRA V0 - Fix case where `max_num_seqs` is between cudagraph capture sizes (#15308 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-22 02:03:32 -07:00
Jee Jee Li	db7c8ca910	[Misc] Embedding model support LoRA (#14935 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-18 12:07:00 +00:00
Varun Sundar Rabindranath	400d483e87	[Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-18 09:47:53 +00:00
Varun Sundar Rabindranath	0b1cfa6180	[Kernel] LoRA - Enable CUDAGraphs for V1 (#14626 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-13 20:42:04 -07:00
Varun Sundar Rabindranath	5ff0d32580	[V1] LoRA - Add triton kernels for V1 (#13096 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-03-10 17:27:53 -04:00
Jee Jee Li	ddd1ef66ec	[Bugfix] Fix JambaForCausalLM LoRA (#14370 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-06 22:05:47 -08:00
Isotr0py	e17e4488bd	[LoRA] Remove linear hack outside transformers backend (#14177 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-05 15:06:28 +00:00
Jee Jee Li	cc5e8f6db8	[Model] Add LoRA support for TransformersModel (#13770 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-03-02 09:17:34 +08:00
Jee Jee Li	5157338ed9	[Misc] Improve LoRA spelling (#13831 )	2025-02-25 23:43:01 -08:00
cjackal	51010a1807	[Misc] set single whitespace between log sentences (#13771 ) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2025-02-25 10:26:12 +08:00
Jee Jee Li	105b8ce4c0	[Misc] Reduce LoRA-related static variable (#13166 )	2025-02-22 00:21:30 -08:00
Varun Sundar Rabindranath	b69692a2d8	[Kernel] LoRA - Refactor sgmv kernels (#13110 )	2025-02-20 07:28:06 -05:00
Yuan Tang	e2603fefb8	[Bugfix] Ensure LoRA path from the request can be included in err msg (#13450 ) Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-18 16:19:15 +08:00
Qubitium-ModelCloud	36a08630e8	[CORE] [QUANT] Support for GPTQModel's `dynamic` quantization per module override/control (#7086 )	2025-02-12 09:19:43 -08:00
Jee Jee Li	82cabf53a3	[Misc] Delete unused LoRA modules (#13151 )	2025-02-12 08:58:24 -08:00
Sanju C Sudhakaran	2880e21e3d	[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi (#12812 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2025-02-08 17:15:30 +08:00
Varun Sundar Rabindranath	467a96a541	[V1] LoRA Support (#10957 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-02-06 09:32:51 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) Changes: add SPDX license headers to python source files; check for SPDX headers using pre-commit. Commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 (Author: Russell Bryant <rbryant@redhat.com>, Date: Fri Jan 31 14:18:24 2025 -0500), "Add SPDX license headers to python source files": This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: https://spdx.dev/learn/handling-license-info/ Commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea (Author: Russell Bryant <rbryant@redhat.com>, Date: Fri Jan 31 14:36:32 2025 -0500), "Check for SPDX headers using pre-commit". Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Russell Bryant	e497f33491	[Core] Silence unnecessary deprecation warnings (#12620 ) I noticed during testing that I was getting a lot of these deprecation warnings about `lora_local_path`: `DeprecationWarning: The 'lora_local_path' attribute is deprecated and will be removed in a future version. Please use 'lora_path' instead.` The check used for emitting this warning was always True, even when the parameter was not actually specified, because the field name is always present in `__struct_fields__`. We should be checking for a non-None value instead. Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 15:35:50 +08:00
Harry Mellor	823ab79633	Update `pre-commit` hooks (#12475 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-27 17:23:08 -07:00
Russell Bryant	d3d6bb13fb	Set weights_only=True when using torch.load() (#12366 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-01-24 02:17:30 +00:00
Jee Jee Li	07934cc237	[Misc][LoRA] Improve the readability of LoRA error messages (#12102 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-17 19:32:28 +08:00
youkaichao	bf53e0c70b	Support torchrun and SPMD-style offline inference (#12071 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-01-16 19:58:53 +08:00
Varun Sundar Rabindranath	ebd8c669ef	[Bugfix] Fix _get_lora_device for HQQ marlin (#12090 ) Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com> Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>	2025-01-15 19:59:42 +00:00
Shanshan Shen	a7d59688fb	[Platform] Move get_punica_wrapper() function to Platform (#11516 ) Signed-off-by: Shanshan Shen <467638484@qq.com>	2025-01-13 13:12:10 +00:00
Akshat Tripathi	8bddb73512	[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100 ) Signed-off-by: Akshat Tripathi <akshat@krai.ai> Signed-off-by: Oleg Mosalov <oleg@krai.ai> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Oleg Mosalov <oleg@krai.ai> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-01-12 13:01:52 +00:00
Joe Runde	ac2f3f7fee	[Bugfix] Validate lora adapters to avoid crashing server (#11727 ) Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-01-10 15:56:36 +08:00
Cyrus Leung	d848800e88	[Misc] Move `print_*_once` from utils to logger (#11298 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com> Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>	2025-01-09 12:48:12 +08:00
Jee Jee Li	b278557935	[Kernel][LoRA]Punica prefill kernels fusion (#11234 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: Abatom <abzhonghua@gmail.com> Co-authored-by: Zhonghua Deng <abatom@163.com>	2025-01-07 04:01:39 +00:00
Lucas Tucker	9c749713f6	[mypy] Forward pass function type hints in lora (#11740 ) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org>	2025-01-06 07:59:36 +00:00
ZincCat	61fed92c7e	[Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708 ) Signed-off-by: ZincCat <zincchloride@outlook.com>	2025-01-03 21:02:34 +00:00
John Giorgi	82c49d3260	[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-30 22:15:58 -08:00
Jee Jee Li	aa25985bd1	[Misc][LoRA] Fix LoRA weight mapper (#11495 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-26 15:52:48 +08:00
Jee Jee Li	b1b1038fbd	[Bugfix] Fix Qwen2-VL LoRA weight loading (#11430 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-24 09:56:10 +00:00
Jason T. Greene	f1d1bf6288	[Bugfix] Fix fully sharded LoRAs with Mixtral (#11390 ) Signed-off-by: Jason Greene <jason.greene@redhat.com>	2024-12-22 23:25:10 +08:00
Jee Jee Li	3cb5769883	[Misc] Minor improvements to the readability of PunicaWrapperBase (#11200 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-14 16:38:27 +00:00
Sanju C Sudhakaran	8195824206	[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565 ) Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>	2024-12-12 08:09:28 +00:00
Jee Jee Li	d05f88679b	[Misc][LoRA] Add PEFTHelper for LoRA (#11003 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-10 11:12:01 +00:00
Jee Jee Li	ca871491ed	[Misc][LoRA] Abstract PunicaWrapper (#10955 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-09 12:54:44 -08:00
Isotr0py	b26b4cd03c	[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-12-07 18:33:49 +08:00
Jee Jee Li	571da8fc43	[Misc][LoRA] Clean up the function interface of Punica (#10917 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-05 13:22:28 +00:00
Jee Jee Li	a4cf256159	[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-03 12:10:29 +08:00
Jee Jee Li	b45f0d7946	[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-12-02 17:53:36 +00:00
Jee Jee Li	1700c543a5	[Bugfix] Fix LoRA weight sharding (#10450 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2024-11-23 17:23:17 -08:00
Jee Jee Li	2385b60d83	[Kernel] Register punica ops directly (#10522 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-21 09:18:11 -08:00
Angus Wang	c2170a5b39	[Kernel] Explicitly specify other value in tl.load calls (#9014 ) Signed-off-by: Angus Wang <wangjadehao@gmail.com>	2024-11-18 11:39:40 -08:00
Jee Jee Li	1d65ec7eeb	[Bugfix] Fix fully sharded LoRA bug (#10352 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-15 10:34:58 +00:00
Umesh	8a06428c70	[LoRA] Adds support for bias in LoRA (#5733 ) Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com> Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>	2024-11-12 11:08:40 -08:00
Jee Jee Li	7f5edb5900	[Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-12 11:10:15 +08:00
Jee Jee Li	36e4acd02a	[LoRA][Kernel] Remove the unused libentry module (#10214 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-11 09:43:23 +00:00
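
The message of commit e497f33491 above describes a check that always fired because the deprecated field name is always listed in `__struct_fields__`. A minimal sketch of that difference follows. It assumes a plain-Python stand-in: `LoRARequestSketch`, `warn_buggy`, `warn_fixed`, and `count_warnings` are hypothetical names for illustration, not vLLM's actual code.

```python
import warnings
from typing import Callable, Optional


class LoRARequestSketch:
    """Hypothetical stand-in for a struct-like request class (not vLLM's class)."""

    # Assumption: as the commit message states, every declared field name
    # is listed here whether or not the caller supplied a value for it.
    __struct_fields__ = ("lora_name", "lora_path", "lora_local_path")

    def __init__(self, lora_name: str, lora_path: str = "",
                 lora_local_path: Optional[str] = None) -> None:
        self.lora_name = lora_name
        self.lora_path = lora_path
        self.lora_local_path = lora_local_path


_MESSAGE = ("The 'lora_local_path' attribute is deprecated and will be "
            "removed in a future version. Please use 'lora_path' instead.")


def warn_buggy(req: LoRARequestSketch) -> None:
    # Membership test on the field list is always true, so this always warns.
    if "lora_local_path" in req.__struct_fields__:
        warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)


def warn_fixed(req: LoRARequestSketch) -> None:
    # Warn only when the deprecated field actually carries a value.
    if req.lora_local_path is not None:
        warnings.warn(_MESSAGE, DeprecationWarning, stacklevel=2)


def count_warnings(check: Callable[[LoRARequestSketch], None],
                   req: LoRARequestSketch) -> int:
    """Run `check` on `req` and return how many warnings it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        check(req)
    return len(caught)
```

With a request that only sets `lora_path`, `count_warnings(warn_buggy, req)` reports a warning while `count_warnings(warn_fixed, req)` reports none. Only a request that passes `lora_local_path` explicitly triggers the fixed check.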

120 Commits