vllm/quantization at 6342adc4389480233d57a6253f27cf65afd36abc - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-13 05:19:07 +08:00

[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)

Signed-off-by: Isotr0py <2037008807@qq.com>

2025-04-04 09:38:58 -07:00

aqlm

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

awq

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

cutlass_w8a8

[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)

2025-03-27 00:54:44 +00:00

fp4

[Kernel] Add ModelOpt FP4 Checkpoint Support (#12520)

2025-03-12 05:13:11 +00:00

fp8

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

fused_kernels

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

gguf

[Bugfix][kernels] Fix half2float conversion in gguf kernels (#15995)

2025-04-04 09:38:58 -07:00

gptq

Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)

2025-03-21 10:01:11 +08:00

gptq_allspark

Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)

2025-03-21 10:01:11 +08:00

gptq_marlin

Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)

2025-03-25 15:36:45 +08:00

machete

add cutlass support for blackwell fp8 gemm (#13798)

2025-03-04 07:55:07 -08:00

marlin

Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)

2025-03-25 15:36:45 +08:00

utils.cuh

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization.cuh

dynamic distpatch of fp8 kernels (#14245)

2025-03-11 10:54:56 -04:00