xinyun/vllm - vllm - 丝路新云-代码仓

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-17 04:17:10 +08:00

Author	SHA1	Message	Date
Jee Jee Li	3d49776bbb	[Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199 )	2024-09-29 06:59:45 +00:00
Jee Jee Li	9b0e3ec970	[Kernel][LoRA] Add assertion for punica sgmv kernels (#7585 )	2024-09-23 18:57:42 +00:00
Jiaxin Shan	260d40b5ea	[Core] Support Lora lineage and base model metadata management (#6315 )	2024-09-20 06:20:56 +00:00
Dipika Sikka	23f322297f	[Misc] Remove `SqueezeLLM` (#8220 )	2024-09-06 16:29:03 -06:00
Jiaxin Shan	db3bf7c991	[Core] Support load and unload LoRA in api server (#6566 ) Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2024-09-05 18:10:33 -07:00
bnellnm	3cdfe1f38b	[Bugfix] Make torch registration of punica ops optional (#7970 )	2024-08-28 16:11:49 -06:00
Kunshang Ji	6e4658c7aa	[Intel GPU] fix xpu not support punica kernel (which use torch.library.custom_op) (#7685 )	2024-08-20 12:01:09 -07:00
SangBin Cho	ff7ec82c4d	[Core] Optimize SPMD architecture with delta + serialization optimization (#7109 )	2024-08-18 17:57:20 -07:00
bnellnm	9f69856356	[Kernel] register punica functions as torch ops (#7591 )	2024-08-16 13:59:38 -07:00
William Lin	57b7be0e1c	[Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971 )	2024-08-09 05:42:45 +00:00
Murali Andoorveedu	6dffa4b0a6	[Bugfix] Fix LoRA with PP (#7292 )	2024-08-08 00:02:27 -07:00
Jee Jee Li	9118217f58	[LoRA] Relax LoRA condition (#7146 )	2024-08-06 01:57:25 +00:00
Jacob Schein	89b8db6bb2	[Bugfix] Specify device when loading LoRA and embedding tensors (#7129 ) Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>	2024-08-05 16:35:47 -07:00
Jee Jee Li	f80ab3521c	Clean up remaining Punica C information (#7027 )	2024-08-04 15:37:08 -07:00
Jee Jee Li	99d7cabd7b	[LoRA] ReplicatedLinear support LoRA (#7081 )	2024-08-02 22:40:19 -07:00
Jee Jee Li	7ecee34321	[Kernel][RFC] Refactor the punica kernel based on Triton (#5036 )	2024-07-31 17:12:24 -07:00
Woosuk Kwon	d09b94ca58	[TPU] Support collective communications in XLA devices (#6813 )	2024-07-27 01:45:57 +00:00
Jiaxin Shan	42c7f66a38	[Core] Support dynamically loading Lora adapter from HuggingFace (#6234 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-07-22 15:42:40 -07:00
Swapnil Parekh	4d6ada947c	[CORE] Adding support for insertion of soft-tuned prompts (#4645 ) Co-authored-by: Swapnil Parekh <swapnilp@ibm.com> Co-authored-by: Joe G <joseph.granados@h2o.ai> Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-07-09 13:26:36 -07:00
youkaichao	482045ee77	[hardware][misc] introduce platform abstraction (#6080 )	2024-07-02 20:12:22 -07:00
Qubitium-ModelCloud	ee93f4f92a	[CORE] Quantized lm-head Framework (#4442 ) Co-authored-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: ZX <zx@lbx.dev>	2024-07-02 22:25:17 +00:00
youkaichao	614aa51203	[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007 )	2024-06-30 20:07:34 -07:00
SangBin Cho	f5e73c9f1b	[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909 ) Co-authored-by: sang <sangcho@anyscale.com>	2024-06-30 17:11:15 +00:00
Woosuk Kwon	79c92c7c8a	[Model] Add Gemma 2 (#5908 )	2024-06-27 13:33:56 -07:00
Cyrus Leung	96354d6a29	[Model] Add base class for LoRA-supported models (#5018 )	2024-06-27 16:03:04 +08:00
rohithkrn	f5dda63eb5	[LoRA] Add support for pinning lora adapters in the LRU cache (#5603 )	2024-06-21 15:42:46 -07:00
Jee Li	67005a07bc	[Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-06-21 04:46:28 +00:00
Cyrus Leung	0e9164b40a	[mypy] Enable type checking for test directory (#5017 )	2024-06-15 04:45:31 +00:00
Cyrus Leung	0bfa1c4f13	[Misc] Improve error message when LoRA parsing fails (#5194 )	2024-06-10 19:38:49 +08:00
bnellnm	5467ac3196	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047 )	2024-06-09 16:23:30 -04:00
Antoni Baum	ccdc490dda	[Core] Change LoRA embedding sharding to support loading methods (#5038 )	2024-06-06 19:07:57 -07:00
Zhuohan Li	8279078e21	[Bugfix] Remove deprecated @abstractproperty (#5174 )	2024-06-01 22:40:25 +00:00
raywanb	97b030005c	[Model] LoRA gptbigcode implementation (#3949 )	2024-05-22 13:58:59 -07:00
SangBin Cho	c74c913bfb	[misc] remove comments that were supposed to be removed (#4977 )	2024-05-22 09:02:58 -04:00
SangBin Cho	2e9a2227ec	[Lora] Support long context lora (#4787 ) Currently we need to call rotary embedding kernel for each LoRA, which makes it hard to serve multiple long context length LoRA. Add batched rotary embedding kernel and pipe it through. It replaces the rotary embedding layer to the one that is aware of multiple cos-sin-cache per scaling factors. Follow up of https://github.com/vllm-project/vllm/pull/3095/files	2024-05-18 16:05:23 +09:00
Antoni Baum	ad932a221d	[Core] Faster startup for LoRA enabled models (#4634 )	2024-05-08 10:33:18 -07:00
Austin Veselka	10760da800	[Bugfix] Fixed error in slice_lora_b for MergedQKVParallelLinearWithLora (#4609 )	2024-05-07 10:59:07 -07:00
Austin Veselka	eefeb16464	[Kernel] Full Tensor Parallelism for LoRA Layers (#3524 ) Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>	2024-04-27 00:03:48 -07:00
Cody Yu	a62aaf1df5	[Misc][Refactor] Generalize linear_method to be quant_method (#4373 )	2024-04-26 16:41:14 -04:00
SangBin Cho	a88081bf76	[CI] Disable non-lazy string operation on logging (#4326 ) Co-authored-by: Danny Guinther <dguinther@neuralmagic.com>	2024-04-26 00:16:58 -07:00
SangBin Cho	b5b4a398a7	[Mypy] Typing lora folder (#4337 )	2024-04-25 19:13:50 +00:00
SangBin Cho	0ae11f78ab	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161 )	2024-04-22 21:32:44 -07:00
Jee Li	d17c8477f1	[Bugfix] Fix LoRA loading check (#4138 ) Co-authored-by: simon-mo <simon.mo@hey.com>	2024-04-19 00:59:54 -07:00
SangBin Cho	533d2a1f39	[Typing] Mypy typing part 2 (#4043 ) Co-authored-by: SangBin Cho <sangcho@sangcho-LT93GQWG9C.local>	2024-04-17 17:28:43 -07:00
Jee Li	b8aacac31a	[Bugfix] Fix LoRA bug (#4032 )	2024-04-12 16:56:37 -07:00
Jee Li	1096717ae9	[Core] Support LoRA on quantized models (#4012 )	2024-04-11 21:02:44 -07:00
Antoni Baum	1e96c3341a	Add extra punica sizes to support bigger vocabs (#4015 )	2024-04-11 22:18:57 +00:00
Antoni Baum	a10d3056da	[Core] Set `linear_weights` directly on the layer (#3977 )	2024-04-11 16:35:51 -04:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884 )	2024-04-10 17:56:48 -07:00
youkaichao	63e7176f26	[Core][Refactor] move parallel_utils into vllm/distributed (#3950 ) [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)	2024-04-10 15:33:30 -07:00

1 2

64 Commits