xinyun/vllm - vllm - 丝路新云 Code Repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2025-12-13 22:45:50 +08:00

Author	SHA1	Message	Date
GiantCroc	c154d89306	[Doc] fix arg docstring in linear layers (#18410) Signed-off-by: giantcroc <1204449533@qq.com>	2025-05-21 06:45:57 -07:00
Andrzej Kotłowski	38fe728d60	[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844) Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>	2025-05-14 09:39:51 +00:00
Simon Mo	dcbac4cb4b	[Model] Qwen3 Dense FP8 Compat Fixes (#17318) Signed-off-by: simon-mo <xmo@berkeley.edu>	2025-04-28 14:12:01 -07:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Charlie Fu	188b7f9b8c	[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-21 20:46:22 -07:00
Isotr0py	40b4284fe3	[Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (#15328) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 10:02:23 -07:00
Pavani Majety	debd6bbf09	[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-03-12 05:13:11 +00:00
Isotr0py	e392d85831	[Core] Refactor `QKVCrossParallelLinear` implementation to support BNB 4-bit quantization (#14545) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-11 20:12:52 -07:00
Nicolò Lucchesi	69ff99fdcd	[Core] Optimizing cross-attention `QKVParallelLinear` computation (#12325) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal> Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>	2025-03-06 09:37:26 +00:00
Isotr0py	e17e4488bd	[LoRA] Remove linear hack outside transformers backend (#14177) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-05 15:06:28 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167)	2025-02-27 02:08:35 -08:00
Michael Goin	09972e716c	[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119)	2025-02-12 09:19:53 -08:00
Szymon Ożóg	2b25b7d2e1	Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023)	2025-02-11 08:38:48 -08:00
Harry Mellor	249824c3bf	Refactor `Linear` handling in `TransformersModel` (#12727) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-05 04:31:12 +00:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Martin Gleize	bbe5f9de7d	[Model] Support for fairseq2 Llama (#11442) Signed-off-by: Martin Gleize <mgleize@meta.com> Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>	2025-01-19 10:40:40 -08:00
kewang-xlnx	de0526f668	[Misc][Quark] Upstream Quark format to VLLM (#10765) Signed-off-by: kewang-xlnx <kewang@xilinx.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-15 11:05:15 -05:00
Isotr0py	d14e98d924	[Model] Support GGUF models newly added in `transformers` 4.46.0 (#9685) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-13 00:13:44 +00:00
Lucas Tucker	9c749713f6	[mypy] Forward pass function type hints in lora (#11740) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org>	2025-01-06 07:59:36 +00:00
Michael Goin	2072924d14	[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-26 15:33:30 -08:00
Isotr0py	b6374e09b0	[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-22 15:01:56 +08:00
ElizaWszola	b00b33d77e	[Model][Quantization] HQQ support through Marlin kernel expansion (#9766) Signed-off-by: ElizaWszola <eliza@neuralmagic.com>	2024-11-19 13:31:12 -08:00
Jee Jee Li	7eb719df13	[Bugfix]Fix Phi-3 BNB online quantization (#10417) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2024-11-19 03:21:42 +00:00
Yan Ma	6b2d25efc7	[Hardware][XPU] AWQ/GPTQ support for xpu backend (#10107) Signed-off-by: yan ma <yan.ma@intel.com>	2024-11-18 11:18:05 -07:00
Li, Jiang	ca77dd7a44	[Hardware][CPU] Support AWQ for CPU backend (#7515)	2024-10-09 10:28:08 -06:00
chenqianfzh	2f4117c38e	support bitsandbytes quantization with more models (#9148)	2024-10-08 19:52:19 -06:00
Isotr0py	f19da64871	[Core] Refactor GGUF parameters packing and forwarding (#8859)	2024-10-07 10:01:46 +00:00
chenqianfzh	9855b99502	[Feature][kernel] tensor parallelism with bitsandbytes quantization (#8434)	2024-09-17 08:09:12 -07:00
Pavani Majety	efcf946a15	[Hardware][NV] Add support for ModelOpt static scaling checkpoints. (#6112)	2024-09-11 00:38:40 -04:00
Dipika Sikka	e16fa99a6a	[Misc] Update fbgemmfp8 to use `vLLMParameters` (#7972) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-03 20:12:41 -06:00
Dipika Sikka	2188a60c7e	[Misc] Update `GPTQ` to use `vLLMParameters` (#7976)	2024-09-03 17:21:44 -04:00
chenqianfzh	4664ceaad6	support bitsandbytes 8-bit and FP4 quantized models (#7445)	2024-08-29 19:09:08 -04:00
Dipika Sikka	86a677de42	[misc] update tpu int8 to use new vLLM Parameters (#7973)	2024-08-29 16:46:55 -04:00
Dipika Sikka	015e6cc252	[Misc] Update compressed tensors lifecycle to remove `prefix` from `create_weights` (#7825)	2024-08-26 18:09:34 -06:00
Dipika Sikka	dd9857f5fa	[Misc] Update `gptq_marlin_24` to use vLLMParameters (#7762) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-26 17:44:54 -04:00
Dipika Sikka	665304092d	[Misc] Update `qqq` to use vLLMParameters (#7805)	2024-08-26 13:16:15 -06:00
Dipika Sikka	f1df5dbfd6	[Misc] Update `marlin` to use vLLMParameters (#7803)	2024-08-23 14:30:52 -04:00
Dipika Sikka	955b5191c9	[Misc] update fp8 to use `vLLMParameter` (#7437)	2024-08-22 08:36:18 -04:00
Isotr0py	12e1c65bc9	[Model] Add AWQ quantization support for InternVL2 model (#7187)	2024-08-20 23:18:57 -07:00
Isotr0py	7601cb044d	[Core] Support tensor parallelism for GGUF quantization (#7520)	2024-08-19 17:30:14 -04:00
Dipika Sikka	b1e5afc3e7	[Misc] Update `awq` and `awq_marlin` to use `vLLMParameters` (#7422)	2024-08-13 17:08:20 -04:00
Dipika Sikka	fb377d7e74	[Misc] Update `gptq_marlin` to use new vLLMParameters (#7281)	2024-08-13 14:30:11 -04:00
Dipika Sikka	5c6c54d67a	[Bugfix] Fix `PerTensorScaleParameter` weight loading for fused models (#7376)	2024-08-09 21:23:46 +00:00
Dipika Sikka	0f7052bc7e	[Misc] Refactor linear layer weight loading; introduce `BasevLLMParameter` and `weight_loader_v2` (#5874)	2024-08-07 09:17:58 -07:00
Isotr0py	360bd67cf0	[Core] Support loading GGUF model (#5191) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-08-05 17:54:23 -06:00
QQSong	062a1d0fab	Fix ReplicatedLinear weight loading (#6793)	2024-07-25 19:24:58 -07:00
Robert Shaw	683e3cb9c4	[ Misc ] `fbgemm` checkpoints (#6559)	2024-07-20 09:36:57 -07:00
Thomas Parnell	a5314e8698	[Model] RowParallelLinear: pass bias to quant_method.apply (#6327) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2024-07-19 07:15:22 -06:00
Robert Shaw	dbe5588554	[ Misc ] non-uniform quantization via `compressed-tensors` for `Llama` (#6515)	2024-07-18 22:39:18 -04:00
Michael Goin	978aed5300	[Kernel][Attention] Separate `Attention.kv_scale` into `k_scale` and `v_scale` (#6081)	2024-07-16 15:31:32 -07:00

77 Commits