vllm/quantization at 220aee902a291209f2975d4cd02dadcc6749ffe6 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-02 07:02:15 +08:00

[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)

Signed-off-by: kaln27 <liaojuncheng123@foxmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>

2025-07-02 06:47:19 -06:00

aqlm

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

awq

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)

2025-06-12 06:51:41 -07:00

cutlass_w8a8

[NVIDIA] Support Cutlass w8a8 FP8 for Blackwell Geforce GPUs (sm120) (#17280)

2025-07-02 06:47:19 -06:00

fp4

[Kernel][Bugfix] Fixup some warnings in nvfp4_blockwise_moe when CUDA < 12.8 (#20324)

2025-07-01 18:05:47 -07:00

fp8

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

fused_kernels

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00

gguf

[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)

2025-06-16 17:33:26 +08:00

gptq

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

gptq_allspark

[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)

2025-05-08 03:05:10 -07:00

gptq_marlin

remove unused variables in marlin_template.h (#20236)

2025-07-02 00:51:52 +00:00

machete

[CI] change spell checker from codespell to typos (#18711)

2025-06-11 19:57:10 -07:00

marlin

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

activation_kernels.cu

[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)

2025-05-13 22:13:56 -07:00

utils.cuh

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization_utils.cuh

[Perf] Vectorize static / dynamic INT8 quant kernels (#19233)

2025-06-12 06:51:41 -07:00

vectorization.cuh

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00