Remove VLLM_SKIP_WARMUP tip (#29331)

Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-03-16 15:37:13 +08:00 · 2025-11-24 17:16:05 -05:00 · 2025-11-24 17:16:05 -05:00 · 4dd42db566
commit 4dd42db566
parent 84371daf75
1 changed file with 0 additions and 3 deletions
--- a/docs/features/quantization/inc.md
+++ b/docs/features/quantization/inc.md
@@ -22,9 +22,6 @@ export QUANT_CONFIG=/path/to/quant/config/inc/meta-llama-3.1-405b-instruct/maxab
 vllm serve meta-llama/Llama-3.1-405B-Instruct --quantization inc --kv-cache-dtype fp8_inc --tensor_paralel_size 8
 ```

-!!! tip
-    If you are just prototyping or testing your model with FP8, you can use the `VLLM_SKIP_WARMUP=true` environment variable to disable the warmup stage, which can take a long time. However, we do not recommend disabling this feature in production environments as it causes a significant performance drop.
-
 !!! tip
    When using FP8 models, you may experience timeouts caused by the long compilation time of FP8 operations. To mitigate this problem, you can use the below environment variables:
    `VLLM_ENGINE_ITERATION_TIMEOUT_S` - to adjust the vLLM server timeout. You can set the value in seconds, e.g., 600 equals 10 minutes.