27 Commits

Author SHA1 Message Date
vllmellm
f5e6cd9695 prefer QuantKey over ScaledMMLinearQuantStrategy
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-04 12:11:13 +00:00
vllmellm
dd5a70ec71 update unit tests to use ScaledMMLinearKernels
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-01 16:28:03 +00:00
vllmellm
e845035f4c bug fix
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 16:38:26 +00:00
vllmellm
5fbe76bc0a format; update fbgemm path
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-10-31 15:08:19 +00:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Luka Govedič
31d5c1797f
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-07-11 04:56:28 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Harry Mellor
6223dd8114
Update deprecated type hinting in model_executor/layers (#18056)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-13 04:17:23 -07:00
Michael Goin
f065de4e88
Fix FBGEMM integration (#18002)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-05-12 23:02:07 +00:00
Harry Mellor
13698db634
Improve configs - ModelConfig (#17130)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-04-30 10:38:22 +08:00
Gregory Shtrasberg
4d0ec37267
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 (#14578)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-03-28 02:58:16 +00:00
Jeff Daily
a1c8f3796c
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
Luka Govedič
e1744502c2
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
2025-03-07 05:20:16 +00:00
Tyler Michael Smith
b3942e157e
[Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-18 00:32:48 +00:00
Kyle Sayers
12913d17ba
[Quant] Add SupportsQuant to phi3 and clip (#13104) 2025-02-15 19:28:33 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**

commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:18:24 2025 -0500

    Add SPDX license headers to python source files
    
This commit adds SPDX license headers to python source files as
recommended to
the project by the Linux Foundation. These headers provide a concise way
that is
both human and machine readable for communicating license information
for each
source file. It helps avoid any ambiguity about the license of the code
and can
    also be easily used by tools to help manage license compliance.
    
The Linux Foundation runs license scans against the codebase to help
ensure
    we are in compliance with the licenses of the code we use, including
dependencies. Having these headers in place helps that tool do its job.
    
    More information can be found on the SPDX site:
    
    - https://spdx.dev/learn/handling-license-info/
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date:   Fri Jan 31 14:36:32 2025 -0500

    Check for SPDX headers using pre-commit
    
    Signed-off-by: Russell Bryant <rbryant@redhat.com>

---------

Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
Michael Goin
399c798608
Remove ScaledActivation for AWQ (#10057)
Signed-off-by: mgoin <michael@neuralmagic.com>
2024-11-06 14:27:06 +00:00
wangshuai09
4e2d95e372
[Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
2024-10-28 04:07:00 +00:00
Gregory Shtrasberg
b3195bc9e4
[AMD][ROCm]Quantization methods on ROCm; Fix _scaled_mm call (#8380)
Co-authored-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-18 10:41:08 -07:00
Cyrus Leung
6ffa3f314c
[CI/Build] Avoid CUDA initialization (#8534) 2024-09-18 10:38:11 +00:00
Dipika Sikka
e16fa99a6a
[Misc] Update fbgemmfp8 to use vLLMParameters (#7972)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-09-03 20:12:41 -06:00
Elsa Granger
3eeb148f46
[Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871) 2024-07-28 11:13:49 -04:00
Michael Goin
0eb0757bef
[Misc] Add ignored layers for fp8 quantization (#6657) 2024-07-23 14:04:04 -04:00
Cheng Li
c5e8330997
[Bugfix] Fix null modules_to_not_convert in FBGEMM Fp8 quantization (#6665) 2024-07-22 19:25:05 -07:00
Robert Shaw
9364f74eee
[ Kernel ] Enable fp8-marlin for fbgemm-fp8 models (#6606) 2024-07-20 18:50:10 +00:00
Robert Shaw
683e3cb9c4
[ Misc ] fbgemm checkpoints (#6559) 2024-07-20 09:36:57 -07:00