11 Commits

Author SHA1 Message Date
elvischenv
adc3ddb430
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
elvischenv
16a45b3a28
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956) 2025-05-14 13:11:54 -07:00
Jeff Daily
a1c8f3796c
dynamic distpatch of fp8 kernels (#14245)
Signed-off-by: Jeff Daily <jeff.daily@amd.com>
2025-03-11 10:54:56 -04:00
Luka Govedič
30870b4f66
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
2024-12-13 03:19:23 +00:00
bnellnm
5467ac3196
[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) 2024-06-09 16:23:30 -04:00
Michael Goin
5f6d10c14c
[CI/Build] Enforce style for C++ and CUDA code with clang-format (#4722) 2024-05-22 07:18:41 +00:00
wangding zeng
5d60def02c
DeepseekMoE support with Fused MoE kernel (#2453)
Co-authored-by: roy <jasonailu87@gmail.com>
2024-01-29 21:19:48 -08:00
zhaoyang-star
9090bf02e7
Support FP8-E5M2 KV Cache (#2279)
Co-authored-by: zhaoyang <zhao.yang16@zte.com.cn>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
2024-01-28 16:43:54 -08:00
Mingcan Xiang
614856da25
Avoid multiple redefinition (#1817) 2023-12-14 09:35:58 -08:00
Woosuk Kwon
8ce9c50d40
Avoid compiling kernels for double data type (#933) 2023-09-02 14:59:47 +09:00