vllm/quantization at 6d54078047a7b5402f47007cc21152d7c2c7987c - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-07 07:57:08 +08:00

[Bugfix] awq_gemm: fix argument order swap (#30364)

Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

2025-12-14 18:15:37 +08:00

nvfp4_utils.py

Bump Flashinfer to v0.4.0 (#26326)

2025-10-08 23:58:44 -07:00

test_allspark_gemm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_awq_triton.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_awq.py

[Bugfix] awq_gemm: fix argument order swap (#30364)

2025-12-14 18:15:37 +08:00

test_block_fp8.py

[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020)

2025-12-10 02:28:37 +00:00

test_block_int8.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_cutlass_2of4_sparse.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_cutlass_scaled_mm.py

[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020)

2025-12-10 02:28:37 +00:00

test_cutlass_w4a8_moe.py

[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508)

2025-12-12 01:02:19 +00:00

test_cutlass_w4a8.py

[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020)

2025-12-10 02:28:37 +00:00

test_flashinfer_nvfp4_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_flashinfer_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_fp8_quant_group.py

[CI/Build][Kernel][BugFix][AMD] Fix per_token_group_quant_fp8 to use correct fp8 min/max values and update atol/rtol in test_quantfp8_group_functionality (#30292)

2025-12-12 18:41:56 -05:00

test_fp8_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_ggml.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_gguf.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_gptq.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_hadacore.py

[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)

2025-12-06 12:54:17 +08:00

test_int8_kernel.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_int8_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_machete_mm.py

[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)

2025-12-06 12:54:17 +08:00

test_marlin_gemm.py

[CI/Build][AMD] Skip marlin, machete, and hadacore tests since these require _C functions not defined for ROCm (#30109)

2025-12-06 12:54:17 +08:00

test_mxfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

test_nvfp4_quant.py

[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)

2025-11-07 04:18:39 -08:00

test_nvfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

test_nvfp4_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_per_token_group_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_rocm_skinny_gemms.py

[platform] Move get_cu_count to utils (#27005)

2025-11-13 08:48:47 +08:00

test_scaled_mm_kernel_selection.py

[ROCm] Enable Triton ScaledMM fallback + kernel selection fix (#26668)

2025-12-12 13:28:20 -05:00

test_silu_mul_nvfp4_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_triton_scaled_mm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00