vllm/quantization at 0eecb3166365a29db117c2aff6ca441b484b514d - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-01 19:29:09 +08:00

[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193)

Signed-off-by: yewentao256 <zhyanwentao@126.com>

2025-09-19 16:23:19 -06:00

awq

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

Apply fixes for CUDA 13 (#24599)

2025-09-17 09:15:42 -04:00

cutlass_w4a8

[Kernel] Faster pre-processing time for W4A8 (#23972)

2025-09-17 14:35:32 -07:00

cutlass_w8a8

[Bug] Fix Cutlass Scaled MM Compilation Error (#24887)

2025-09-15 17:21:17 -04:00

fp4

[Compile] Fix Compile Warning for Ignoring MIN_BLOCK_PER_SM (#25193)

2025-09-19 16:23:19 -06:00

fp8

Apply fixes for CUDA 13 (#24599)

2025-09-17 09:15:42 -04:00

fused_kernels

Apply fixes for CUDA 13 (#24599)

2025-09-17 09:15:42 -04:00

gguf

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

gptq

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

gptq_allspark

[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)

2025-05-08 03:05:10 -07:00

gptq_marlin

[Kernel] [Quantization] Add MXFP4 and bias support for marlin kernel (#22428)

2025-08-14 11:23:22 -07:00

hadamard/hadacore

[Transform] Deterministic Hadacore Transforms (#24106)

2025-09-15 12:59:31 -06:00

machete

[Doc]: fix typos in Python comments (#24294)

2025-09-05 19:41:12 -07:00

marlin/sparse

[Kernel/Quant] Remove the original marlin format and qqq (#23204)

2025-08-20 15:13:36 -04:00

activation_kernels.cu

silu-v1: Fix EPS not being used during max-reduction (#25069)

2025-09-18 10:25:12 +00:00

per_token_group_quant_8bit.h

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

2025-07-25 17:07:07 -07:00

utils.cuh

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization_utils.cuh

Make sure that vectorize_with_alignment produced vectorized global loads (#23182)

2025-08-21 20:06:54 +00:00

vectorization.cuh

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00