[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
Commit cd00c443d2 (parent d143271234)
```diff
@@ -14,7 +14,7 @@ Contents:
 - [INT4 W4A16](int4.md)
 - [INT8 W8A8](int8.md)
 - [FP8 W8A8](fp8.md)
-- [NVIDIA TensorRT Model Optimizer](modelopt.md)
+- [NVIDIA Model Optimizer](modelopt.md)
 - [AMD Quark](quark.md)
 - [Quantized KV Cache](quantized_kvcache.md)
 - [TorchAO](torchao.md)
```
```diff
@@ -1,6 +1,6 @@
-# NVIDIA TensorRT Model Optimizer
+# NVIDIA Model Optimizer
 
-The [NVIDIA TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) is a library designed to optimize models for inference with NVIDIA GPUs. It includes tools for Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) of Large Language Models (LLMs), Vision Language Models (VLMs), and diffusion models.
+The [NVIDIA Model Optimizer](https://github.com/NVIDIA/Model-Optimizer) is a library designed to optimize models for inference with NVIDIA GPUs. It includes tools for Post-Training Quantization (PTQ) and Quantization Aware Training (QAT) of Large Language Models (LLMs), Vision Language Models (VLMs), and diffusion models.
 
 We recommend installing the library with:
 
@@ -10,7 +10,7 @@ pip install nvidia-modelopt
 
 ## Quantizing HuggingFace Models with PTQ
 
-You can quantize HuggingFace models using the example scripts provided in the TensorRT Model Optimizer repository. The primary script for LLM PTQ is typically found within the `examples/llm_ptq` directory.
+You can quantize HuggingFace models using the example scripts provided in the Model Optimizer repository. The primary script for LLM PTQ is typically found within the `examples/llm_ptq` directory.
 
 Below is an example showing how to quantize a model using modelopt's PTQ API:
```
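The example that follows this line in the doc is cut off in this view. As a rough sketch of the kind of PTQ call the paragraph describes, the snippet below quantizes a HuggingFace model with `modelopt.torch.quantization`; the model name, calibration prompts, and the FP8 config choice are illustrative assumptions, not part of this commit (the maintained version lives in `examples/llm_ptq`):

```python
# Minimal PTQ sketch with modelopt. Assumptions: model_id, the calibration
# prompts, and the FP8 config are illustrative placeholders only.
import modelopt.torch.quantization as mtq
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # hypothetical model choice
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

calib_prompts = ["Hello, world!", "Quantization reduces memory use."]

def forward_loop(model):
    # Run a few batches through the model so modelopt can collect
    # calibration statistics for the inserted quantizers.
    for prompt in calib_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        model(**inputs)

# Swap supported modules for quantized versions and calibrate them.
model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop)
```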