vllm/quantization at 092bb73b8a36ccdb6d6bbac897ba3aa79f660e36 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-01 08:21:17 +08:00

Jinzhen Lin 1656ad3704

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>

2025-11-29 07:19:33 -08:00

nvfp4_utils.py

Bump Flashinfer to v0.4.0 (#26326)

2025-10-08 23:58:44 -07:00

test_allspark_gemm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_awq_triton.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_awq.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_block_fp8.py

[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687)

2025-11-19 15:47:13 -08:00

test_block_int8.py

[Misc] Make SchedulerConfig.max_model_len init-only (#28733)

2025-11-15 01:59:31 -08:00

test_cutlass_2of4_sparse.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_cutlass_scaled_mm.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

test_cutlass_w4a8.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_flashinfer_nvfp4_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_flashinfer_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_fp8_quant_group.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_fp8_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_ggml.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_gguf.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_gptq.py

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

test_hadacore.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_int8_kernel.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_int8_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_machete_mm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

test_marlin_gemm.py

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00

test_mxfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

test_nvfp4_quant.py

[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439)

2025-11-07 04:18:39 -08:00

test_nvfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

test_nvfp4_scaled_mm.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_per_token_group_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_rocm_skinny_gemms.py

[platform] Move get_cu_count to utils (#27005)

2025-11-13 08:48:47 +08:00

test_silu_mul_nvfp4_quant.py

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

test_triton_scaled_mm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00