[doc] update wrong hf model links (#17184)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Reid 2025-04-26 00:40:54 +08:00 committed by GitHub
parent 423e9f1cbe
commit df5c879527
5 changed files with 6 additions and 7 deletions


@@ -6,7 +6,7 @@ To create a new 4-bit quantized model, you can leverage [AutoAWQ](https://github
Quantization reduces the model's precision from BF16/FP16 to INT4 which effectively reduces the total model memory footprint.
The main benefits are lower latency and memory usage.
-You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?sort=trending&search=awq).
+You can quantize your own models by installing AutoAWQ or picking one of the [6500+ models on Huggingface](https://huggingface.co/models?search=awq).
```console
pip install autoawq
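
For context, a minimal sketch of loading one of those pre-quantized AWQ checkpoints with vLLM's offline `LLM` API; the model ID below is only an example repo picked from that Hugging Face search.

```python
# Minimal sketch: loading a pre-quantized AWQ checkpoint with vLLM.
# The model ID is only an example of an AWQ repo from the search above.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7b-Chat-AWQ", quantization="awq")
outputs = llm.generate(["What does AWQ quantization do?"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```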


@@ -20,8 +20,8 @@ vLLM reads the model's config file and supports pre-quantized checkpoints.
You can find pre-quantized models on:
-- [Hugging Face (BitBLAS)](https://huggingface.co/models?other=bitblas)
-- [Hugging Face (GPTQ)](https://huggingface.co/models?other=gptq)
+- [Hugging Face (BitBLAS)](https://huggingface.co/models?search=bitblas)
+- [Hugging Face (GPTQ)](https://huggingface.co/models?search=gptq)
Usually, these repositories have a `quantize_config.json` file that includes a `quantization_config` section.
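
As a quick illustration of that auto-detection, a minimal sketch of loading a pre-quantized GPTQ checkpoint with vLLM; the repo name is an arbitrary example, and vLLM is assumed to pick the method up from the checkpoint's `quantization_config`.

```python
# Sketch: vLLM reads quantization_config from the checkpoint's config file,
# so a pre-quantized GPTQ (or BitBLAS) repo can typically be loaded without extra flags.
from vllm import LLM

llm = LLM(model="TheBloke/Llama-2-7B-GPTQ")  # example repo; method auto-detected from its config
print(llm.generate(["Hello"])[0].outputs[0].text)
```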


@@ -14,7 +14,7 @@ pip install bitsandbytes>=0.45.3
vLLM reads the model's config file and supports both in-flight quantization and pre-quantized checkpoint.
-You can find bitsandbytes quantized models on <https://huggingface.co/models?other=bitsandbytes>.
+You can find bitsandbytes quantized models on <https://huggingface.co/models?search=bitsandbytes>.
And usually, these repositories have a config.json file that includes a quantization_config section.
## Read quantized checkpoint
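
A minimal sketch of reading such a pre-quantized bitsandbytes checkpoint with vLLM, assuming the `quantization="bitsandbytes"` argument; the model ID is only an example from that search.

```python
# Sketch: reading a pre-quantized bitsandbytes checkpoint with vLLM.
# Passing a full-precision model with the same flag instead triggers in-flight quantization.
from vllm import LLM

llm = LLM(model="unsloth/tinyllama-bnb-4bit", quantization="bitsandbytes")
print(llm.generate(["Hello, my name is"])[0].outputs[0].text)
```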


@@ -18,7 +18,7 @@ for more details on this and other advanced features.
## Installation
-You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?sort=trending&search=gptq).
+You can quantize your own models by installing [GPTQModel](https://github.com/ModelCloud/GPTQModel) or picking one of the [5000+ models on Huggingface](https://huggingface.co/models?search=gptq).
```console
pip install -U gptqmodel --no-build-isolation -v
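
A hedged sketch of the quantization step itself, assuming GPTQModel's `QuantizeConfig`/`load`/`quantize`/`save` entry points; the base model and the tiny calibration list are placeholders, and a real run needs a proper calibration dataset.

```python
# Hypothetical example: 4-bit GPTQ quantization with GPTQModel.
from gptqmodel import GPTQModel, QuantizeConfig

base_model = "meta-llama/Llama-3.2-1B-Instruct"  # placeholder model ID
calibration = ["vLLM is a fast and easy-to-use library for LLM inference and serving."]  # placeholder data

quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(base_model, quant_config)
model.quantize(calibration)  # needs a real calibration set in practice
model.save("Llama-3.2-1B-Instruct-gptq-4bit")
```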


@@ -30,5 +30,4 @@ tokenizer.push_to_hub(hub_repo)
quantized_model.push_to_hub(hub_repo, safe_serialization=False)
```
-Alternatively, you can use the TorchAO Quantization space for quantizing models with a simple UI.
-See: https://huggingface.co/spaces/medmekk/TorchAO_Quantization
+Alternatively, you can use the [TorchAO Quantization space](https://huggingface.co/spaces/medmekk/TorchAO_Quantization) for quantizing models with a simple UI.
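
The `push_to_hub` calls above assume a `quantized_model` produced earlier in that file; a hedged sketch of that step via transformers' TorchAO integration follows, where the config name and model ID are assumptions and exact argument names vary by version.

```python
# Hedged sketch: producing the `quantized_model` pushed above with transformers' TorchAO support.
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

model_name = "meta-llama/Llama-3.2-1B-Instruct"  # example base model
quantization_config = TorchAoConfig("int4_weight_only", group_size=128)

quantized_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```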