[Doc] Update quantization supported hardware table (#7595)

2026-07-23 02:57:10 +08:00 · 2024-08-16 16:59:27 -04:00 · 2024-08-16 16:59:27 -04:00 · d4f0f17b02
commit d4f0f17b02
parent b3f4e17935
1 changed files with 127 additions and 14 deletions
--- a/docs/source/quantization/supported_hardware.rst
+++ b/docs/source/quantization/supported_hardware.rst
@ -5,25 +5,138 @@ Supported Hardware for Quantization Kernels

 The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

-==============  ======  =======  =======  =====  ======  =======  =========  =======  ==============  ==========
-Implementation  Volta   Turing   Ampere   Ada    Hopper  AMD GPU  Intel GPU  x86 CPU  AWS Inferentia  Google TPU
-==============  ======  =======  =======  =====  ======  =======  =========  =======  ==============  ==========
-AQLM            ✅      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-AWQ             ❌      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-DeepSpeedFP     ✅      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-FP8             ❌      ❌       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-Marlin          ❌      ❌       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-GPTQ            ✅      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-SqueezeLLM      ✅      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-bitsandbytes    ✅      ✅       ✅       ✅     ✅      ❌        ❌         ❌       ❌              ❌
-==============  ======  =======  =======  =====  ======  =======  =========  =======  ==============  ==========
+.. list-table::
+   :header-rows: 1
+   :widths: 20 8 8 8 8 8 8 8 8 8 8
+
+   * - Implementation
+     - Volta
+     - Turing
+     - Ampere
+     - Ada
+     - Hopper
+     - AMD GPU
+     - Intel GPU
+     - x86 CPU
+     - AWS Inferentia
+     - Google TPU
+   * - AWQ
+     - ✗
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - GPTQ
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - Marlin (GPTQ/AWQ/FP8)
+     - ✗
+     - ✗
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - INT8 (W8A8)
+     - ✗
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - FP8 (W8A8)
+     - ✗
+     - ✗
+     - ✗
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - AQLM
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - bitsandbytes
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - DeepSpeedFP
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - GGUF
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+   * - SqueezeLLM
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✅︎
+     - ✗
+     - ✗
+     - ✗
+     - ✗
+     - ✗

 Notes:
 ^^^^^^

 - Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
- "✅" indicates that the quantization method is supported on the specified hardware.
- "❌" indicates that the quantization method is not supported on the specified hardware.
+- "✅︎" indicates that the quantization method is supported on the specified hardware.
+- "✗" indicates that the quantization method is not supported on the specified hardware.

 Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.