c0de128 961a5ab423 [Bugfix][Hardware][AMD] Consolidate FP8 min/max values into helper function
Add get_fp8_min_max() helper in quant_utils.py to centralize the
FP8 min/max value logic for ROCm fnuz dtype handling.

On ROCm with torch.float8_e4m3fnuz, using PyTorch's default finfo.max
(240.0) causes accuracy issues with dynamic quantization. The correct
max for the fnuz dtype is 224.0.

This change:
- Adds get_fp8_min_max(dtype) helper returning (fp8_min, fp8_max) tuple
- Updates input_quant_fp8.py to use the helper
- Updates fp8_utils.py per_token_group_quant_fp8() to use the helper
- Updates deep_gemm.py per_block_cast_to_fp8() to use the helper
- Updates tests/kernels/quant_utils.py to use the helper

Fixes #30360

Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2025-12-24 13:20:25 -06:00