mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-19 01:57:09 +08:00

[Feature][Quantization] MXFP4 support for MOE models (#17888)

Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>

2025-07-09 13:19:02 -07:00

| File | Last commit | Date |
| --- | --- | --- |
| auto_awq.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| bitblas.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| bnb.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| fp8.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| gguf.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| gptqmodel.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| int4.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| int8.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| modelopt.md | Make distinct code and console admonitions so readers are less likely to miss them (#20585) | 2025-07-07 19:55:28 -07:00 |
| quantized_kvcache.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| quark.md | [Feature][Quantization] MXFP4 support for MOE models (#17888) | 2025-07-09 13:19:02 -07:00 |
| README.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| supported_hardware.md | Stop using title frontmatter and fix doc that can only be reached by search (#20623) | 2025-07-08 03:27:40 -07:00 |
| torchao.md | Make distinct code and console admonitions so readers are less likely to miss them (#20585) | 2025-07-07 19:55:28 -07:00 |

README.md

Quantization

Quantization trades model precision for a smaller memory footprint, allowing large models to run on a wider range of devices.
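To make the trade-off concrete, here is a minimal sketch in plain Python. It is not vLLM code: the helper names (`weight_memory_gib`, `quantize_int8`, `dequantize`) and the 7-billion-parameter figure are hypothetical, chosen only for illustration. The first function estimates the memory needed for weights at a given bit width. The other two show a simple symmetric absmax INT8 round trip, one of many possible schemes, to illustrate where precision is lost.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (ignores activations and KV cache)."""
    return num_params * bits_per_weight / 8 / 2**30


def quantize_int8(values):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    # Fall back to a scale of 1.0 if every value is zero.
    scale = max(abs(v) for v in values) / 127 or 1.0
    return [round(v / scale) for v in values], scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the shared scale."""
    return [q * scale for q in quantized]


NUM_PARAMS = 7_000_000_000  # hypothetical 7B-parameter model
for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, bits):.1f} GiB")

weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up example weights
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.5f}, max round-trip error={max_error:.5f}")
```

Halving the bit width halves the estimated weight memory, while the round-trip error stays within about half a quantization step (`scale / 2`). Production quantization methods typically use per-channel or per-group scales and calibration data to reduce this error further.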

Contents: