xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-29 14:37:15 +08:00

Author	SHA1	Message	Date
Zhuohan Li	d29483b58a	[Minor] Remove unnecessary error message (#27115 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>	2025-10-17 20:02:12 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	7c12763b24	Fix some typing issues found by `mypy==1.18.2` (#26596 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 18:21:25 +00:00
Isotr0py	d1ddf340c8	[V0 deprecation] Remove `QKVCrossParallelLinear` implementation (#26475 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-09 10:52:27 +00:00
Harry Mellor	4e256cadc2	Remove all references to `yapf` as it's no longer used (#26251 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 09:18:11 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Aleksandr Malyshev	53a30845be	Llamas 3.1 405B fp4 changes upstreaming from 355_wip (#25135 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com>	2025-09-25 19:16:53 -06:00
Kyle Sayers	de94289a98	[Core] Support weight_loader_v2 for `UnquantizedLinearMethod` (#23036 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>	2025-09-23 18:30:26 -06:00
Michael Goin	fbd6523ac0	Refactor dense FP8 tensor/channel/block utils and add CT FP8 block (#21404 )	2025-09-18 08:53:45 -04:00
Rafael Marcelino Koike	b834b4cbf1	[USAGE] Improve error handling for weight initialization in Unquantized… (#20321 ) Signed-off-by: Rafael Marcelino Koike <rafael.koike@oracle.com> Signed-off-by: Rafael Koike <koike.rafael@gmail.com>	2025-09-15 16:45:49 +00:00
Didier Durand	41ae4a1eab	[Doc]: fix typos in various files (#24798 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-09-13 00:43:33 -07:00
Isotr0py	00a4e56d8d	[Bugfix] Fix broken deepseek fp8 TP weights loading (#24367 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-09-06 09:23:12 -07:00
Isotr0py	53b19ccdd5	[Core] Allow disabling TP sharding for parallel Linear layer (#23024 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-09-05 22:53:58 -07:00
Li, Jiang	57b1ce94f7	[CPU] Refactor CPU unquantized linear (#24150 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-09-04 14:28:45 +08:00
Kyuyeun Kim	9480ae24e3	[Bugfix] Fix packed_factor missing attribute error (#23902 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2025-09-02 10:56:31 -07:00
Kyle Sayers	22feac8e95	[Transform] [Quantization] Add transforms to compressed tensors (#22486 )	2025-08-28 02:43:48 -04:00
Hyogeun Oh (오효근)	730d0ac8b9	[Docs] Fix warnings in `mkdocs build` (#23649 ) Signed-off-by: Zerohertz <ohg3417@gmail.com> Signed-off-by: Hyogeun Oh (오효근) <ohg3417@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-26 18:19:23 +00:00
Jee Jee Li	170e8ea9ea	[Misc] Unified linear print info (#23516 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-24 20:13:51 -07:00
Daifeng Li	fa78de9dc3	Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527 ) Signed-off-by: feng <fengli1702@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-22 20:53:21 -06:00
Li, Jiang	7be5d113d8	[CPU] Refactor CPU W8A8 scaled_mm (#23071 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-08-21 09:34:24 +08:00
Michael Goin	0cdbf5e61c	[Kernel/Quant] Remove the original marlin format and qqq (#23204 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-20 15:13:36 -04:00
TJian	1298c67795	[FEAT] [Performance] Enable DP for ViT in Qwen2.5VL (#22742 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-08-19 15:25:57 +00:00
Michael Goin	4fc722eca4	[Kernel/Quant] Remove AQLM (#22943 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-08-16 19:38:21 +00:00
wangxiyuan	0b1bdac6af	[Platform] Custom ops support for FusedMoe (#22509 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-08-13 04:12:00 -07:00
Mickaël Seznec	4fb56914c5	[perf] Add fused MLA QKV + strided layernorm (#21116 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-22 07:07:44 -07:00
Kevin_Xiong	c9ba8104ed	[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group (#21024 ) Signed-off-by: KevinXiong-C <kevin_xiong1997@outlook.com>	2025-07-16 19:36:36 -07:00
Li, Jiang	6cc1e7d96d	[CPU] Update custom ops for the CPU backend (#20255 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-07-01 07:25:03 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
GiantCroc	c154d89306	[Doc] fix arg docstring in linear layers (#18410 ) Signed-off-by: giantcroc <1204449533@qq.com>	2025-05-21 06:45:57 -07:00
Andrzej Kotłowski	38fe728d60	[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile (#17844 ) Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai>	2025-05-14 09:39:51 +00:00
Simon Mo	dcbac4cb4b	[Model] Qwen3 Dense FP8 Compat Fixes (#17318 ) Signed-off-by: simon-mo <xmo@berkeley.edu>	2025-04-28 14:12:01 -07:00
Lei Wang	8d32dc603d	[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036 ) Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com> Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com>	2025-04-22 09:01:36 +01:00
Charlie Fu	188b7f9b8c	[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm (#15830 ) Signed-off-by: charlifu <charlifu@amd.com> Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>	2025-04-21 20:46:22 -07:00
Isotr0py	40b4284fe3	[Bugfix] Handle `process_weights_after_loading` for `QKVCrossParallelLinear` (#15328 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-04-08 10:02:23 -07:00
Pavani Majety	debd6bbf09	[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-03-12 05:13:11 +00:00
Isotr0py	e392d85831	[Core] Refactor `QKVCrossParallelLinear` implementation to support BNB 4-bit quantization (#14545 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-11 20:12:52 -07:00
Nicolò Lucchesi	69ff99fdcd	[Core] Optimizing cross-attention `QKVParallelLinear` computation (#12325 ) Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal> Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>	2025-03-06 09:37:26 +00:00
Isotr0py	e17e4488bd	[LoRA] Remove linear hack outside transformers backend (#14177 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-03-05 15:06:28 +00:00
Szymon Ożóg	7f0be2aa24	[Model] Deepseek GGUF support (#13167 )	2025-02-27 02:08:35 -08:00
Michael Goin	09972e716c	[Bugfix] Allow fallback to AWQ from AWQMarlin at per-layer granularity (#13119 )	2025-02-12 09:19:53 -08:00
Szymon Ożóg	2b25b7d2e1	Fix initializing GGUF weights for ColumnParallelLinear when using tensor parallel > 1 (#13023 )	2025-02-11 08:38:48 -08:00
Harry Mellor	249824c3bf	Refactor `Linear` handling in `TransformersModel` (#12727 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-02-05 04:31:12 +00:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628 ) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Martin Gleize	bbe5f9de7d	[Model] Support for fairseq2 Llama (#11442 ) Signed-off-by: Martin Gleize <mgleize@meta.com> Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>	2025-01-19 10:40:40 -08:00
kewang-xlnx	de0526f668	[Misc][Quark] Upstream Quark format to VLLM (#10765 ) Signed-off-by: kewang-xlnx <kewang@xilinx.com> Signed-off-by: kewang2 <kewang2@amd.com> Co-authored-by: kewang2 <kewang2@amd.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2025-01-15 11:05:15 -05:00
Isotr0py	d14e98d924	[Model] Support GGUF models newly added in `transformers` 4.46.0 (#9685 ) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-01-13 00:13:44 +00:00
Lucas Tucker	9c749713f6	[mypy] Forward pass function type hints in lora (#11740 ) Signed-off-by: lucast2021 <lucast2021@headroyce.org> Co-authored-by: lucast2021 <lucast2021@headroyce.org>	2025-01-06 07:59:36 +00:00
Michael Goin	2072924d14	[Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: simon-mo <xmo@berkeley.edu> Co-authored-by: simon-mo <simon.mo@hey.com> Co-authored-by: simon-mo <xmo@berkeley.edu> Co-authored-by: HandH1998 <1335248067@qq.com>	2024-12-26 15:33:30 -08:00
Isotr0py	b6374e09b0	[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2024-11-22 15:01:56 +08:00

106 Commits