vllm/quantization at 3e92b2b7acaa61335ecd7bea5eeed50388739194 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-29 16:07:59 +08:00

[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)

Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

2025-12-19 13:09:54 -08:00

awq

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

cutlass_w4a8

[Kernel]Support W4A8 Grouped GEMM on Hopper (#29691)

2025-12-08 19:29:06 -08:00

fp4

SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711)

2025-12-01 17:24:18 -08:00

fused_kernels

[Performance] Fused blockwise quant RMS norm (#27883)

2025-12-07 16:38:04 +00:00

gguf

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

gptq

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

gptq_allspark

[Kernel][Quantization] add w4a8 support for marlin kernel (#24722)

2025-11-29 07:19:33 -08:00

gptq_marlin

[Kernel][Quantization][MoE] add marlin kernel support for turing (sm75) (#29901)

2025-12-16 14:35:28 -08:00

hadamard/hadacore

Use narrow over indexing in hadacore_transform to prep for ABI stable (#28756)

2025-11-15 01:10:15 -08:00

machete

Fix typos in comments across multiple files (#30345)

2025-12-09 20:05:28 -08:00

marlin/sparse

[Kernel/Quant] Remove the original marlin format and qqq (#23204)

2025-08-20 15:13:36 -04:00

w8a8

[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)

2025-12-19 13:09:54 -08:00

activation_kernels.cu

[Performance][B200] silu_mul_quant: pack scales in int32 (#28358)

2025-11-13 10:16:55 -08:00

utils.cuh

…

vectorization_utils.cuh

Make sure that vectorize_with_alignment produced vectorized global loads (#23182)

2025-08-21 20:06:54 +00:00

vectorization.cuh

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00