vllm/quantization at 480598958e28fa1e2ed2f7be2d457fc6f85a1748 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-12 21:57:23 +08:00

[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)

Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>

2025-11-26 21:35:13 -05:00

__init__.py

[CI/Build] Move test_utils.py to tests/utils.py (#4425)

2024-05-13 23:50:09 +09:00

fp_quant.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

reference_mxfp4.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_auto_round.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_blackwell_moe.py

[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)

2025-11-04 15:56:21 +08:00

test_compressed_tensors.py

[Bugfix] Make compressed-tensors MoEs respect ignored layers (#28878)

2025-11-26 21:35:13 -05:00

test_configs.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_cpu_offload.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_cpu_wna16.py

[CPU] Refactor CPU WNA16 (#28826)

2025-11-19 10:32:00 +08:00

test_experts_int8.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_fp8.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_gptq_dynamic.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_gptq_v2.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_ipex_quant.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_lm_head.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_mixed_precision.py

[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)

2025-11-11 12:05:22 -05:00

test_modelopt.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_ptpc_fp8.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_quark.py

VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)

2025-11-11 18:34:36 -08:00

test_register_quantization_config.py

[CI Sprint] Quantization CI Cleanup (#24130)

2025-11-18 09:21:48 -05:00

test_rtn.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_torchao.py

[torchao] fix safetensors for sharding (#28169)

2025-11-19 16:39:45 -08:00

utils.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00