From b92ab3deda6aca2c0f05aba841f5edb05578af94 Mon Sep 17 00:00:00 2001
From: HDCharles <39544797+HDCharles@users.noreply.github.com>
Date: Tue, 14 Oct 2025 16:39:59 -0400
Subject: [PATCH] Notice for deprecation of AutoAWQ (#26820)

Signed-off-by: HDCharles <39544797+HDCharles@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
---
 docs/features/quantization/auto_awq.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/docs/features/quantization/auto_awq.md b/docs/features/quantization/auto_awq.md
index fc998387d29aa..182127bc91cc8 100644
--- a/docs/features/quantization/auto_awq.md
+++ b/docs/features/quantization/auto_awq.md
@@ -1,5 +1,9 @@
 # AutoAWQ
 
+> ⚠️ **Warning:**
+    The `AutoAWQ` library is deprecated. This functionality has been adopted by the vLLM project in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/tree/main/examples/awq).
+    For the recommended quantization workflow, please see the AWQ examples in [`llm-compressor`](https://github.com/vllm-project/llm-compressor/tree/main/examples/awq). For more details on the deprecation, refer to the original [AutoAWQ repository](https://github.com/casper-hansen/AutoAWQ).
+
 To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github.com/casper-hansen/AutoAWQ).
 Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint.
 The main benefits are lower latency and memory usage.
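For context, the page touched by this patch goes on to walk through the 4-bit quantization flow that the final sentence above introduces. Below is a minimal sketch of that flow using the public `AutoAWQForCausalLM` API from AutoAWQ and vLLM's `LLM(..., quantization="awq")` entry point; the model ID, output path, and quantization config values are illustrative placeholders, and per the new warning, new projects should follow the linked `llm-compressor` AWQ examples instead.

```python
# Sketch: quantize a model to 4-bit AWQ with AutoAWQ, then serve it with vLLM.
# Model ID, output path, and config values below are placeholders.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-Instruct-v0.2"  # example base model
quant_path = "mistral-instruct-v0.2-awq"           # where to write the quantized checkpoint
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the BF16/FP16 model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_cache=False)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration and quantize the weights to INT4
model.quantize(tokenizer, quant_config=quant_config)

# Persist the quantized checkpoint alongside its tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# The quantized model can then be loaded and served with vLLM
from vllm import LLM

llm = LLM(model=quant_path, quantization="awq")
outputs = llm.generate(["What is AWQ quantization?"])
print(outputs[0].outputs[0].text)
```

The INT4 checkpoint written here is what delivers the smaller memory footprint and lower latency the doc refers to; the same checkpoint format is what vLLM's `quantization="awq"` path expects at load time.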