# Supported Hardware

The table below shows the compatibility of various quantization implementations with different hardware platforms in vLLM:

| Implementation        | Volta | Turing | Ampere | Ada | Hopper | AMD GPU | Intel GPU | x86 CPU | AWS Neuron | Google TPU |
|-----------------------|-------|--------|--------|-----|--------|---------|-----------|---------|------------|------------|
| AWQ                   |       |        |        |     |        |         |           |         |            |            |
| GPTQ                  |       |        |        |     |        |         |           |         |            |            |
| Marlin (GPTQ/AWQ/FP8) |       |        |        |     |        |         |           |         |            |            |
| INT8 (W8A8)           |       |        |        |     |        |         |           |         |            |            |
| FP8 (W8A8)            |       |        |        |     |        |         |           |         |            |            |
| BitBLAS (GPTQ)        |       |        |        |     |        |         |           |         |            |            |
| AQLM                  |       |        |        |     |        |         |           |         |            |            |
| bitsandbytes          |       |        |        |     |        |         |           |         |            |            |
| DeepSpeedFP           |       |        |        |     |        |         |           |         |            |            |
| GGUF                  |       |        |        |     |        |         |           |         |            |            |
- Volta refers to SM 7.0, Turing to SM 7.5, Ampere to SM 8.0/8.6, Ada to SM 8.9, and Hopper to SM 9.0.
- ✅ indicates that the quantization method is supported on the specified hardware.
- ❌ indicates that the quantization method is not supported on the specified hardware.
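The SM-to-architecture mapping in the first bullet can be sketched as a small lookup. This helper is purely illustrative (`sm_to_arch` is not part of vLLM's API); it just encodes the legend above:

```python
def sm_to_arch(major: int, minor: int) -> str:
    """Map a CUDA compute capability (SM) to the architecture
    names used in the table header, per the legend above."""
    table = {
        (7, 0): "Volta",
        (7, 5): "Turing",
        (8, 0): "Ampere",
        (8, 6): "Ampere",  # both SM 8.0 and 8.6 are Ampere
        (8, 9): "Ada",
        (9, 0): "Hopper",
    }
    return table.get((major, minor), "unknown")
```

For example, `sm_to_arch(8, 9)` returns `"Ada"`; capabilities outside the table (e.g. SM 6.1) fall through to `"unknown"`.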

!!! note
    This compatibility chart is subject to change as vLLM continues to evolve and expand its support for different hardware platforms and quantization methods.

For the most up-to-date information on hardware support and quantization methods, please refer to <gh-dir:vllm/model_executor/layers/quantization> or consult the vLLM development team.
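As a minimal sketch of using one of the implementations listed above, the quantization method can be selected explicitly when launching a server with vLLM's `--quantization` flag (the model name below is a placeholder; in most cases the flag can be omitted and vLLM detects the quantization config from the checkpoint):

```shell
# Serve an AWQ-quantized checkpoint, selecting the AWQ implementation
# from the compatibility table explicitly.
vllm serve <your-awq-quantized-model> --quantization awq
```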