[BugFix] Skip the Q component for QKVParallelLinear in the case of QKVCrossParallelLinear since its width is 0 (#22369)

Signed-off-by: sstamenk <sstamenk@amd.com>
2026-01-06 17:44:01 +08:00 · 2025-08-15 19:17:31 +02:00
commit 6b04039a72
parent 1c859a1387
1 changed file with 3 additions and 0 deletions
--- a/vllm/model_executor/layers/quantization/utils/w8a8_utils.py
+++ b/vllm/model_executor/layers/quantization/utils/w8a8_utils.py
@@ -121,6 +121,9 @@ def requantize_with_max_scale(
    if unfused_module_in_checkpoint:
        start = 0
        for idx, logical_width in enumerate(logical_widths):
+            # Skip any component with zero width.
+            if logical_width == 0:
+                continue
            end = start + logical_width
            weight_dq = per_tensor_dequantize(weight[start:end, :],
                                              weight_scale[idx])
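
Note on the skip: because `end = start + logical_width`, a zero-width component leaves `start` unchanged, so `continue` does not shift the row ranges of the components that follow, and `idx` still lines up with the matching entry of `weight_scale`. The following is a minimal, torch-free sketch of just that index bookkeeping. The helper name `nonzero_shard_ranges` and the example widths are hypothetical and are not part of vLLM; the sketch assumes, as in the loop above, that `start` advances to `end` after each processed component.

```python
def nonzero_shard_ranges(logical_widths):
    """Yield (idx, start, end) for every component with non-zero width.

    Hypothetical helper for illustration only; it mirrors the loop
    structure in the hunk above but performs no dequantization.
    """
    start = 0
    for idx, logical_width in enumerate(logical_widths):
        # Skip any component with zero width. start stays where it is,
        # since a zero-width component would give end == start anyway.
        if logical_width == 0:
            continue
        end = start + logical_width
        yield idx, start, end
        start = end


# Assumed example: Q has width 0 (the cross-attention case named in the
# commit title), while K and V each span 4 rows.
print(list(nonzero_shard_ranges([0, 4, 4])))  # [(1, 0, 4), (2, 4, 8)]
```

The yielded `idx` values are the original positions (1 and 2), so each remaining slice would still be paired with its own per-component scale.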