[FIXBUG] Qwen3VL hallucinations without Contiguous on Torch.SDPA (#27744)

Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-12-15 13:35:48 +08:00 · 2025-10-29 17:55:35 +01:00
commit 7568a282b9
parent 1da3309ace
1 changed file with 8 additions and 0 deletions
--- a/vllm/model_executor/models/qwen2_5_vl.py
+++ b/vllm/model_executor/models/qwen2_5_vl.py
@@ -428,6 +428,14 @@ class Qwen2_5_VisionAttention(nn.Module):
            )
        elif self.attn_backend == _Backend.TORCH_SDPA:
            # Execute attention entry by entry for speed & less VRAM.
            from vllm.platforms import current_platform
            # Never remove the next contiguous logic
            # Without it, hallucinations occur with the backend
            if current_platform.is_rocm():
                q = q.contiguous()
                k = k.contiguous()
                v = v.contiguous()
            outputs = []
            for i in range(1, len(cu_seqlens)):
                start_idx = cu_seqlens[i - 1]
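
The patch forces q, k and v into a contiguous layout before the per-sequence SDPA loop on ROCm. As a rough illustration of what "contiguous" means here, the sketch below models tensor strides in plain Python. The helper names, the sizes, and the idea that q is a view sliced out of a fused qkv projection are assumptions made for the example; the commit itself does not say why the tensors arrive non-contiguous.

```python
def row_major_strides(shape):
    """Strides (in elements) of a densely packed row-major tensor."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when the strides match the dense row-major layout.

    Dimensions of size 1 are ignored because their stride is never used.
    """
    expected = row_major_strides(shape)
    return all(
        size == 1 or stride == want
        for size, stride, want in zip(shape, strides, expected)
    )


# Hypothetical sizes: 6 tokens, 2 heads, head_dim 4.
seq, heads, head_dim = 6, 2, 4

# Assumed fused projection of shape (seq, 3 * heads * head_dim), densely packed.
fused_shape = (seq, 3 * heads * head_dim)
fused_strides = row_major_strides(fused_shape)

# A q view sliced out of the fused tensor keeps the fused row stride,
# so each step along the token axis also skips over the k and v columns.
q_shape = (seq, heads, head_dim)
q_view_strides = (fused_strides[0], head_dim, 1)

# Layout that a dense copy (what .contiguous() produces) would have.
q_dense_strides = row_major_strides(q_shape)

print(is_contiguous(q_shape, q_view_strides))
print(is_contiguous(q_shape, q_dense_strides))
```

In this model the sliced view reports as non-contiguous while the dense copy reports as contiguous. A kernel that assumes dense strides would read the wrong elements from the view, which is one plausible way for attention outputs to be silently corrupted.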