mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-05-30 09:47:07 +08:00)
[Doc] Add top anchor and a note to quantization/bitblas.md (#17042)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
parent 6317a5174a
commit f7912cba3d
@@ -1,7 +1,15 @@
+(bitblas)=
+
 # BitBLAS
 
 vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
 
+:::{note}
+Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
+Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
+For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+:::
+
 Below are the steps to utilize BitBLAS with vLLM.
 
 ```console