Michael Goin
|
5d5146eee3
|
[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-02 20:32:38 -07:00 |
|
Wentao Ye
|
711e912946
|
[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-09-19 16:23:19 -06:00 |
|
elvischenv
|
e6585ddb45
|
[Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-17 16:37:23 -07:00 |
|
elvischenv
|
adc3ddb430
|
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 14:25:45 -07:00 |
|
elvischenv
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
Roberto L. Castro
|
789562c28c
|
Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) (#21309)
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
|
2025-08-03 00:54:22 -07:00 |
|
Michael Goin
|
d47661f0cd
|
[Kernel] Basic tuned configs for NVFP4 CUTLASS dense GEMM (#20646)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 10:05:33 -06:00 |
|
Tyler Michael Smith
|
3be8d312a2
|
[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-01 18:05:47 -07:00 |
|
Tyler Michael Smith
|
e8c3bd2cd1
|
[Bugfix] Fix some narrowing conversion warnings (#20141)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-06-27 09:01:28 -07:00 |
|
jiahanc
|
294fc1e2c9
|
[Hardware][NVIDIA][kernel] Fp4 MOE quant kernel optimization (#19500)
|
2025-06-14 09:34:28 -07:00 |
|
Pavani Majety
|
0c0fdae84f
|
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)
|
2025-05-09 16:24:41 -07:00 |
|
Kaixi Hou
|
ed7a29d9f8
|
[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032)
Signed-off-by: kaixih <kaixih@nvidia.com>
|
2025-04-27 06:29:21 -07:00 |
|
Pavani Majety
|
debd6bbf09
|
[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-03-12 05:13:11 +00:00 |
|
Roger Wang
|
82e0d601fc
|
[CI/Build] Fix pre-commit errors from #13571 (#13709)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-02-22 16:50:38 -08:00 |
|
Kaixi Hou
|
e109e598c7
|
[NVIDIA] Support nvfp4 cutlass gemm (#13571)
|
2025-02-22 05:24:05 -08:00 |
|
Kaixi Hou
|
27a09dc52c
|
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant (#13632)
|
2025-02-20 22:01:48 -08:00 |
|
Kaixi Hou
|
4fc5c23bb6
|
[NVIDIA] Support nvfp4 quantization (#12784)
|
2025-02-12 19:51:51 -08:00 |
|