From f7912cba3d613afa8b96ce2e04dad671205050c6 Mon Sep 17 00:00:00 2001
From: Michael Yao
Date: Wed, 23 Apr 2025 22:32:16 +0800
Subject: [PATCH] [Doc] Add top anchor and a note to quantization/bitblas.md
 (#17042)

Signed-off-by: windsonsea
---
 docs/source/features/quantization/bitblas.md | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/docs/source/features/quantization/bitblas.md b/docs/source/features/quantization/bitblas.md
index aff917f90ec2a..2901f760d3e4c 100644
--- a/docs/source/features/quantization/bitblas.md
+++ b/docs/source/features/quantization/bitblas.md
@@ -1,7 +1,15 @@
+(bitblas)=
+
 # BitBLAS
 
 vLLM now supports [BitBLAS](https://github.com/microsoft/BitBLAS) for more efficient and flexible model inference. Compared to other quantization frameworks, BitBLAS provides more precision combinations.
 
+:::{note}
+Ensure your hardware supports the selected `dtype` (`torch.bfloat16` or `torch.float16`).
+Most recent NVIDIA GPUs support `float16`, while `bfloat16` is more common on newer architectures like Ampere or Hopper.
+For details see [supported hardware](https://docs.vllm.ai/en/latest/features/quantization/supported_hardware.html).
+:::
+
 Below are the steps to utilize BitBLAS with vLLM.
 
 ```console
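
As a quick way to act on the note this patch adds, here is a minimal sketch (not part of the patch itself) of selecting a supported `dtype` at runtime. It uses PyTorch's `torch.cuda.is_bf16_supported()` to fall back from `bfloat16` to `float16` on GPUs that lack native bfloat16 support; the variable name `dtype` is illustrative.

```python
import torch

# bfloat16 needs compute capability >= 8.0 (Ampere or newer);
# float16 is available on most recent NVIDIA GPUs.
# Illustrative sketch: pick whichever dtype the current GPU supports.
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float16

print(f"Selected dtype: {dtype}")
```

The selected `dtype` can then be passed wherever the BitBLAS docs call for one, rather than hard-coding `torch.bfloat16` on hardware that may not support it.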