6 Commits

Author SHA1 Message Date
Jhao-Ting Chen
5a5506c661 enable DeepGEMM swapAB from FlashInfer for M<32 linear gemms
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-24 11:19:39 -08:00
Kate Cheng
3d429d63a6 Enable linear deepgemm_swapAB
Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>
2025-12-24 11:19:39 -08:00
Michael Goin
f9a4087182 Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-11 11:46:04 -05:00
Michael Goin
c3aea10dc8 [Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-11 15:43:14 -07:00
Michael Goin
b7adf94c4a Tuned H100/H200 triton fp8 block configs for fused_qkv_a_proj (#23939)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-29 10:28:35 -07:00
Michael Goin
a781e84ec2 [Perf] Tune configs for triton block fp8 gemm H100/H200 (#23748)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-28 11:12:53 +08:00