vllm/quantization at 010e0e39ea49508a94ad42062505d7629e19b8d2 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-12 15:27:24 +08:00

[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)

Signed-off-by: Junhao Li <junhao@ubicloud.com>

2025-08-07 19:18:28 -07:00

aqlm

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

awq

[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[AMD][CI/Build][Bugfix] Guarding CUDA specific functions by ifndef ROCM (#21766)

2025-07-29 03:35:37 +00:00

cutlass_w8a8

[Kernel] Add support for block FP8 on SM120 (NVIDIA 5090 and RTX PRO 6000) (#22131)

2025-08-07 19:18:28 -07:00

fp4

Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) (#21309)

2025-08-03 00:54:22 -07:00

fp8

[Feature] Non-contiguous Support for FP8 Quantization (#21961)

2025-08-05 02:36:43 -07:00

fused_kernels

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00

gguf

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

gptq

[MISC] Remove unused variableds in C++ (#19609)

2025-06-15 20:05:28 -07:00

gptq_allspark

[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)

2025-05-08 03:05:10 -07:00

gptq_marlin

remove unused variables in marlin_template.h (#20236)

2025-07-02 00:51:52 +00:00

machete

[Kernel] Improve machete memory bound perf (#21556)

2025-07-25 06:53:21 -07:00

marlin

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

activation_kernels.cu

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

per_token_group_quant_8bit.h

[Perf] Cuda Kernel for Int8 Per Token Group Quant (#21476)

2025-07-25 17:07:07 -07:00

utils.cuh

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization_utils.cuh

[Perf] Optimize Vectorization Utils for Int 8 Quantization Kernels (#20331)

2025-07-04 15:06:24 +08:00

vectorization.cuh

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00