vllm/quantization at fd75d3e8c0f522178e39845276fd57908760b4d0 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-01-10 15:44:33 +08:00

[platform] Move get_cu_count to utils (#27005)

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-11-13 08:48:47 +08:00

nvfp4_utils.py

…

test_allspark_gemm.py

…

test_awq_triton.py

…

test_awq.py

…

test_block_fp8.py

[Chore] Separate out optional dependency checks from vllm.utils (#27207)

2025-10-22 10:44:21 -04:00

test_block_int8.py

…

test_cutlass_2of4_sparse.py

…

test_cutlass_scaled_mm.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_cutlass_w4a8.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_flashinfer_nvfp4_scaled_mm.py

…

test_flashinfer_scaled_mm.py

…

test_fp8_quant_group.py

…

test_fp8_quant.py

…

test_ggml.py

…

test_gguf.py

…

test_gptq.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_hadacore.py

…

test_int8_kernel.py

…

test_int8_quant.py

…

test_machete_mm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_marlin_gemm.py

…

test_mxfp4_qutlass.py

…

test_nvfp4_quant.py

[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)

2025-11-07 04:18:39 -08:00

test_nvfp4_qutlass.py

…

test_nvfp4_scaled_mm.py

…

test_per_token_group_quant.py

…

test_rocm_skinny_gemms.py

[platform] Move get_cu_count to utils (#27005)

2025-11-13 08:48:47 +08:00

test_silu_mul_nvfp4_quant.py

…

test_triton_scaled_mm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00