(quantization-index)=
# Quantization
Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
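
To make the trade-off concrete, here is a minimal plain-Python sketch of symmetric int8 quantization (illustrative only; this is not vLLM's implementation, and real kernels operate on tensors with per-channel or per-group scales):

```python
# Illustrative symmetric int8 quantization (not vLLM's implementation).
def quantize_int8(weights):
    # A per-tensor scale maps the largest magnitude onto the int8 range.
    scale = max(abs(w) for w in weights) / 127
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate fp values; each weight carries a small rounding error.
    return [v * scale for v in q]

weights = [0.02, -1.27, 0.635, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
```

Each quantized value occupies 1 byte instead of the 4 bytes of fp32, at the cost of at most half a scale step of rounding error per weight.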
:::{toctree}
:caption: Contents
:maxdepth: 1
supported_hardware
auto_awq
bnb
bitblas
gguf
gptqmodel
int4
int8
fp8
quark
quantized_kvcache
torchao
:::