Use --attention-backend flag instead of VLLM_ATTENTION_BACKEND env var

Per reviewer feedback, the VLLM_ATTENTION_BACKEND environment variable is being deprecated in favor of the --attention-backend CLI flag. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> Signed-off-by: yurekami <yurekami@users.noreply.github.com>
2026-07-16 20:47:23 +08:00 · 2025-12-24 15:16:10 +09:00 · 2025-12-24 15:16:10 +09:00 · 79e0db60ee
commit 79e0db60ee
parent 3c8358c328
1 changed files with 6 additions and 6 deletions
--- a/vllm/v1/worker/cp_utils.py
+++ b/vllm/v1/worker/cp_utils.py
@ -34,8 +34,8 @@ def check_attention_cp_compatibility(vllm_config: VllmConfig) -> None:
                    f"current attention backend "
                    f"'{layer_impl.__class__.__name__}'.\n\n"
                    f"To resolve this issue, try one of the following:\n"
-                    f"  1. Use a different attention backend by setting:\n"
-                    f"     export VLLM_ATTENTION_BACKEND=<backend>\n"
+                    f"  1. Use a different attention backend by specifying:\n"
+                    f"     --attention-backend <backend>\n"
                    f"  2. Set cp_kv_cache_interleave_size to 1\n"
                    f"  3. Disable speculative decoding"
                )
@ -48,8 +48,8 @@ def check_attention_cp_compatibility(vllm_config: VllmConfig) -> None:
                    f"'{layer_impl.__class__.__name__}' does not support this "
                    f"feature.\n\n"
                    f"To resolve this issue, try one of the following:\n"
-                    f"  1. Use a compatible attention backend by setting:\n"
-                    f"     export VLLM_ATTENTION_BACKEND=<backend>\n"
+                    f"  1. Use a compatible attention backend by specifying:\n"
+                    f"     --attention-backend <backend>\n"
                    f"     Compatible backends: FLASH_ATTN, FLASHINFER, "
                    f"TRITON_MLA, FLASH_MLA, FLASH_ATTN_MLA, CUTLASS_MLA\n"
                    f"  2. Disable DCP by removing the "
@ -66,8 +66,8 @@ def check_attention_cp_compatibility(vllm_config: VllmConfig) -> None:
                    f"'{layer_impl.__class__.__name__}' does not support this "
                    f"feature.\n\n"
                    f"To resolve this issue, try one of the following:\n"
-                    f"  1. Use a compatible attention backend by setting:\n"
-                    f"     export VLLM_ATTENTION_BACKEND=<backend>\n"
+                    f"  1. Use a compatible attention backend by specifying:\n"
+                    f"     --attention-backend <backend>\n"
                    f"  2. Disable PCP by removing the "
                    f"--prefill-context-parallel-size flag\n\n"
                    f"For more information, see:\n"