9 Commits

Author SHA1 Message Date
Ilya Markov
e13945f9dd
[Perf] Further tunings for SM100 FP8 CUTLASS kernel (#19566) 2025-06-14 17:25:10 -07:00
Michael Goin
53a5a0ce30
[Perf] Tunings for SM100 FP8 CUTLASS kernel (#18778)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-06-04 10:46:28 -07:00
Lain
5f2cd251d2
Sm100 blockwise fp8 swap ab (#18564) 2025-06-04 07:48:45 -07:00
Lain
e23564cb70
use ceil_div in cutlass block scaling shape check (#17918) 2025-05-16 03:02:58 -07:00
Shu Wang
376786fac1
Add cutlass support for blackwell fp8 blockwise gemm (#14383)
Signed-off-by: Shu Wang <shuw@nvidia.com>
2025-05-08 15:09:55 -07:00
kushanam
f89978ad7c
add cutlass support for blackwell fp8 gemm (#13798) 2025-03-04 07:55:07 -08:00
leoneo
839b27c6cc
[Kernel]Add streamK for block-quantized CUTLASS kernels (#12978) 2025-02-20 22:14:24 -08:00
Tyler Michael Smith
c1e37bf71b
[Kernel][Bugfix] Refactor and Fix CUTLASS 2:4 Sparse Kernels (#13198)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-02-14 00:01:14 +00:00
Lucas Wilkinson
9798b2fb00
[Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868) 2025-01-30 18:33:00 -08:00