mirror of
https://git.datalinker.icu/vllm-project/vllm.git
synced 2025-12-09 18:46:25 +08:00
parent
d6e634f3d7
commit
e20233d361
@ -5,20 +5,18 @@ Supported Hardware for Quantization Kernels
|
|||||||
|
|
||||||
The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
|
The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:
|
||||||
|
|
||||||
===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
||||||
Implementation Volta Turing Ampere Ada Hopper AMD GPU Intel GPU x86 CPU AWS Inferentia Google TPU
|
Implementation Volta Turing Ampere Ada Hopper AMD GPU Intel GPU x86 CPU AWS Inferentia Google TPU
|
||||||
===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
||||||
AWQ ❌ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
AQLM ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
GPTQ ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
AWQ ❌ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
Marlin (GPTQ/AWQ/FP8) ❌ ❌ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
DeepSpeedFP ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
INT8 (W8A8) ❌ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
FP8 ❌ ❌ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
FP8 (W8A8) ❌ ❌ ❌ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
Marlin ❌ ❌ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
AQLM ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
GPTQ ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
bitsandbytes ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
SqueezeLLM ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
DeepSpeedFP ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
bitsandbytes ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
||||||
GGUF ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
============== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
||||||
SqueezeLLM ✅ ✅ ✅ ✅ ✅ ❌ ❌ ❌ ❌ ❌
|
|
||||||
===================== ====== ======= ======= ===== ====== ======= ========= ======= ============== ==========
|
|
||||||
|
|
||||||
Notes:
|
Notes:
|
||||||
^^^^^^
|
^^^^^^
|
||||||
@ -29,4 +27,4 @@ Notes:
|
|||||||
|
|
||||||
Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.
|
Please note that this compatibility chart may be subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.
|
||||||
|
|
||||||
For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory <https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization>`_ or consult with the vLLM development team.
|
For the most up-to-date information on hardware support and quantization methods, please check the `quantization directory <https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/layers/quantization>`_ or consult with the vLLM development team.
|
||||||
Loading…
x
Reference in New Issue
Block a user