xinyun/vllm - vllm - 丝路新云 code repository

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-04-06 11:37:03 +08:00

Author	SHA1	Message	Date
vllmellm	f5e6cd9695	prefer QuantKey over ScaledMMLinearQuantStrategy Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-04 12:11:13 +00:00
vllmellm	dd5a70ec71	update unit tests to use ScaledMMLinearKernels Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-11-01 16:28:03 +00:00
vllmellm	e845035f4c	bug fix Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-10-31 16:38:26 +00:00
vllmellm	5fbe76bc0a	format; update fbgemm path Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-10-31 15:08:19 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Luka Govedič	31d5c1797f	[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830) Signed-off-by: Luka Govedic <lgovedic@redhat.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-07-11 04:56:28 +00:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Harry Mellor	6223dd8114	Update deprecated type hinting in `model_executor/layers` (#18056) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-13 04:17:23 -07:00
Michael Goin	f065de4e88	Fix FBGEMM integration (#18002) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-12 23:02:07 +00:00
Harry Mellor	13698db634	Improve configs - `ModelConfig` (#17130) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-04-30 10:38:22 +08:00
Gregory Shtrasberg	4d0ec37267	[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-03-28 02:58:16 +00:00
Jeff Daily	a1c8f3796c	dynamic distpatch of fp8 kernels (#14245) Signed-off-by: Jeff Daily <jeff.daily@amd.com>	2025-03-11 10:54:56 -04:00
Luka Govedič	e1744502c2	[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390) Signed-off-by: luka <luka@neuralmagic.com>	2025-03-07 05:20:16 +00:00
Tyler Michael Smith	b3942e157e	[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-02-18 00:32:48 +00:00
Kyle Sayers	12913d17ba	[Quant] Add `SupportsQuant` to phi3 and clip (#13104)	2025-02-15 19:28:33 -08:00
Russell Bryant	e489ad7a21	[Misc] Add SPDX-License-Identifier headers to python source files (#12628) - Add SPDX license headers to python source files - Check for SPDX headers using pre-commit commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745 Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:18:24 2025 -0500 Add SPDX license headers to python source files This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance. The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job. More information can be found on the SPDX site: - https://spdx.dev/learn/handling-license-info/ Signed-off-by: Russell Bryant <rbryant@redhat.com> commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea Author: Russell Bryant <rbryant@redhat.com> Date: Fri Jan 31 14:36:32 2025 -0500 Check for SPDX headers using pre-commit Signed-off-by: Russell Bryant <rbryant@redhat.com> --------- Signed-off-by: Russell Bryant <rbryant@redhat.com>	2025-02-02 11:58:18 -08:00
Michael Goin	399c798608	Remove ScaledActivation for AWQ (#10057) Signed-off-by: mgoin <michael@neuralmagic.com>	2024-11-06 14:27:06 +00:00
wangshuai09	4e2d95e372	[Hardware][ROCM] using current_platform.is_rocm (#9642) Signed-off-by: wangshuai09 <391746016@qq.com>	2024-10-28 04:07:00 +00:00
Gregory Shtrasberg	b3195bc9e4	[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380) Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com> Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-18 10:41:08 -07:00
Cyrus Leung	6ffa3f314c	[CI/Build] Avoid CUDA initialization (#8534)	2024-09-18 10:38:11 +00:00
Dipika Sikka	e16fa99a6a	[Misc] Update fbgemmfp8 to use `vLLMParameters` (#7972) Co-authored-by: Michael Goin <michael@neuralmagic.com>	2024-09-03 20:12:41 -06:00
Elsa Granger	3eeb148f46	[Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871)	2024-07-28 11:13:49 -04:00
Michael Goin	0eb0757bef	[Misc] Add ignored layers for `fp8` quantization (#6657)	2024-07-23 14:04:04 -04:00
Cheng Li	c5e8330997	[Bugfix] Fix null `modules_to_not_convert` in FBGEMM Fp8 quantization (#6665)	2024-07-22 19:25:05 -07:00
Robert Shaw	9364f74eee	[ Kernel ] Enable `fp8-marlin` for `fbgemm-fp8` models (#6606)	2024-07-20 18:50:10 +00:00
Robert Shaw	683e3cb9c4	[ Misc ] `fbgemm` checkpoints (#6559)	2024-07-20 09:36:57 -07:00

27 Commits