Varun Sundar Rabindranath
8a8b30eac1
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes ( #15308 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-22 02:03:32 -07:00
Jee Jee Li
db7c8ca910
[Misc] Embedding model support LoRA ( #14935 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-18 12:07:00 +00:00
Varun Sundar Rabindranath
400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels ( #14685 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-18 09:47:53 +00:00
Varun Sundar Rabindranath
0b1cfa6180
[Kernel] LoRA - Enable CUDAGraphs for V1 ( #14626 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-13 20:42:04 -07:00
Varun Sundar Rabindranath
5ff0d32580
[V1] LoRA - Add triton kernels for V1 ( #13096 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-03-10 17:27:53 -04:00
Jee Jee Li
ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA ( #14370 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-06 22:05:47 -08:00
Isotr0py
e17e4488bd
[LoRA] Remove linear hack outside transformers backend ( #14177 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-03-05 15:06:28 +00:00
Jee Jee Li
cc5e8f6db8
[Model] Add LoRA support for TransformersModel ( #13770 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-03-02 09:17:34 +08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
cjackal
51010a1807
[Misc] set single whitespace between log sentences ( #13771 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
2025-02-25 10:26:12 +08:00
Jee Jee Li
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
Varun Sundar Rabindranath
b69692a2d8
[Kernel] LoRA - Refactor sgmv kernels ( #13110 )
2025-02-20 07:28:06 -05:00
Yuan Tang
e2603fefb8
[Bugfix] Ensure LoRA path from the request can be included in err msg ( #13450 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-18 16:19:15 +08:00
Qubitium-ModelCloud
36a08630e8
[CORE] [QUANT] Support for GPTQModel's dynamic quantization per module override/control ( #7086 )
2025-02-12 09:19:43 -08:00
Jee Jee Li
82cabf53a3
[Misc] Delete unused LoRA modules ( #13151 )
2025-02-12 08:58:24 -08:00
Sanju C Sudhakaran
2880e21e3d
[Hardware][Intel-Gaudi] Enable long-contexts + LoRA support for Intel Gaudi ( #12812 )
...
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2025-02-08 17:15:30 +08:00
Varun Sundar Rabindranath
467a96a541
[V1] LoRA Support ( #10957 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-02-06 09:32:51 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files ( #12628 )
...
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help
ensure
we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Russell Bryant
e497f33491
[Core] Silence unnecessary deprecation warnings ( #12620 )
...
I noticed during testing that I was getting a lot of these deprecation
warnings about `local_lora_path`:
```
DeprecationWarning: The 'lora_local_path' attribute is deprecated
and will be removed in a future version.
Please use 'lora_path' instead.
```
The check used for emitting this warning was always True, even when the
parameter was not actually specified. It will always be in
`__struct_fields__`. We should be checking for a non-None value,
instead.
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 15:35:50 +08:00
Harry Mellor
823ab79633
Update pre-commit hooks ( #12475 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-27 17:23:08 -07:00
Russell Bryant
d3d6bb13fb
Set weights_only=True when using torch.load() ( #12366 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-24 02:17:30 +00:00
Jee Jee Li
07934cc237
[Misc][LoRA] Improve the readability of LoRA error messages ( #12102 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-17 19:32:28 +08:00
youkaichao
bf53e0c70b
Support torchrun and SPMD-style offline inference ( #12071 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-16 19:58:53 +08:00
Varun Sundar Rabindranath
ebd8c669ef
[Bugfix] Fix _get_lora_device for HQQ marlin ( #12090 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2025-01-15 19:59:42 +00:00
Shanshan Shen
a7d59688fb
[Platform] Move get_punica_wrapper() function to Platform ( #11516 )
...
Signed-off-by: Shanshan Shen <467638484@qq.com>
2025-01-13 13:12:10 +00:00
Akshat Tripathi
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend ( #11100 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00
Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server ( #11727 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00
Cyrus Leung
d848800e88
[Misc] Move print_*_once from utils to logger ( #11298 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2025-01-09 12:48:12 +08:00
Jee Jee Li
b278557935
[Kernel][LoRA]Punica prefill kernels fusion ( #11234 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>
2025-01-07 04:01:39 +00:00
Lucas Tucker
9c749713f6
[mypy] Forward pass function type hints in lora ( #11740 )
...
Signed-off-by: lucast2021 <lucast2021@headroyce.org>
Co-authored-by: lucast2021 <lucast2021@headroyce.org>
2025-01-06 07:59:36 +00:00
ZincCat
61fed92c7e
[Bugfix] Fix ColumnParallelLinearWithLoRA slice ( #11708 )
...
Signed-off-by: ZincCat <zincchloride@outlook.com>
2025-01-03 21:02:34 +00:00
John Giorgi
82c49d3260
[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) ( #6909 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-30 22:15:58 -08:00
Jee Jee Li
aa25985bd1
[Misc][LoRA] Fix LoRA weight mapper ( #11495 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-26 15:52:48 +08:00
Jee Jee Li
b1b1038fbd
[Bugfix] Fix Qwen2-VL LoRA weight loading ( #11430 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-24 09:56:10 +00:00
Jason T. Greene
f1d1bf6288
[Bugfix] Fix fully sharded LoRAs with Mixtral ( #11390 )
...
Signed-off-by: Jason Greene <jason.greene@redhat.com>
2024-12-22 23:25:10 +08:00
Jee Jee Li
3cb5769883
[Misc] Minor improvements to the readability of PunicaWrapperBase ( #11200 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-14 16:38:27 +00:00
Sanju C Sudhakaran
8195824206
[Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) ( #10565 )
...
Signed-off-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
2024-12-12 08:09:28 +00:00
Jee Jee Li
d05f88679b
[Misc][LoRA] Add PEFTHelper for LoRA ( #11003 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-10 11:12:01 +00:00
Jee Jee Li
ca871491ed
[Misc][LoRA] Abstract PunicaWrapper ( #10955 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-09 12:54:44 -08:00
Isotr0py
b26b4cd03c
[Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation ( #10958 )
...
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-12-07 18:33:49 +08:00
Jee Jee Li
571da8fc43
[Misc][LoRA] Clean up the function interface of Punica ( #10917 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-05 13:22:28 +00:00
Jee Jee Li
a4cf256159
[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug ( #10844 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-03 12:10:29 +08:00
Jee Jee Li
b45f0d7946
[Misc][LoRA] Move the implementation of lora bias to punica.py ( #10829 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-02 17:53:36 +00:00
Jee Jee Li
1700c543a5
[Bugfix] Fix LoRA weight sharding ( #10450 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2024-11-23 17:23:17 -08:00
Jee Jee Li
2385b60d83
[Kernel] Register punica ops directly ( #10522 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-21 09:18:11 -08:00
Angus Wang
c2170a5b39
[Kernel] Explicitly specify other value in tl.load calls ( #9014 )
...
Signed-off-by: Angus Wang <wangjadehao@gmail.com>
2024-11-18 11:39:40 -08:00
Jee Jee Li
1d65ec7eeb
[Bugfix] Fix fully sharded LoRA bug ( #10352 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-15 10:34:58 +00:00
Umesh
8a06428c70
[LoRA] Adds support for bias in LoRA ( #5733 )
...
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
2024-11-12 11:08:40 -08:00
Jee Jee Li
7f5edb5900
[Misc][LoRA] Replace hardcoded cuda device with configurable argument ( #10223 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-12 11:10:15 +08:00
Jee Jee Li
36e4acd02a
[LoRA][Kernel] Remove the unused libentry module ( #10214 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2024-11-11 09:43:23 +00:00