vllm/quantization at edb59a9470f5c67ef11d52e7bb25fb8ea17f120f - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-31 22:33:29 +08:00

Andreas Karatzas 9f0247cfa4

VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)

Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <Andreas.Karatzas@amd.com>

2025-11-11 18:34:36 -08:00

__init__.py

[CI/Build] Move test_utils.py to tests/utils.py (#4425)

2024-05-13 23:50:09 +09:00

fp_quant.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

reference_mxfp4.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_auto_round.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_blackwell_moe.py

[BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904)

2025-11-04 15:56:21 +08:00

test_compressed_tensors.py

[1/N][Platform] Cleanup useless function (#26982)

2025-10-22 09:04:57 +00:00

test_configs.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_cpu_offload.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_experts_int8.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_fp8.py

[CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170)

2025-11-06 01:22:13 +00:00

test_gptq_dynamic.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_gptq_v2.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_ipex_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_lm_head.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_mixed_precision.py

[ROCm][Quantization] extend AMD Quark to support mixed-precision quantized model (#24239)

2025-11-11 12:05:22 -05:00

test_modelopt.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_ptpc_fp8.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_quark.py

VLLM_USE_TRITON_FLASH_ATTN V0 variable deprecation (#27611)

2025-11-11 18:34:36 -08:00

test_register_quantization_config.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_rtn.py

[CI] Prune Quantization Tests and skip compilation (#27038)

2025-10-16 17:26:35 -04:00

test_torchao.py

Support using Int4PreshuffledTensor after loading (#26066)

2025-11-04 06:00:57 -05:00

utils.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00