Add get_fp8_min_max() helper in quant_utils.py to centralize the
FP8 min/max value logic for ROCm fnuz dtype handling.
On ROCm with torch.float8_e4m3fnuz, using PyTorch's default finfo.max
(240.0) causes accuracy issues with dynamic quantization. The correct
maximum for the fnuz dtype is 224.0.
This change:
- Adds get_fp8_min_max(dtype) helper returning (fp8_min, fp8_max) tuple
- Updates input_quant_fp8.py to use the helper
- Updates fp8_utils.py per_token_group_quant_fp8() to use the helper
- Updates deep_gemm.py per_block_cast_to_fp8() to use the helper
- Updates tests/kernels/quant_utils.py to use the helper
Fixes #30360
Signed-off-by: c0de128 <kevin.mckay@outlook.com>