vllm/quantization at 3c7fefdeba183e5c5e575f668b797549530f5a3d - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 06:37:10 +08:00


Xiangyu Li 5cc6bddb6e

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00


[Kernel] Fix awq error when n is not divisable by 128 (#13227)

2025-02-13 20:07:05 -08:00

[Kernel] Faster pre-processing time for W4A8 (#23972)

2025-09-17 14:35:32 -07:00

[CI/Build] Conditionally register cutlass_fp4_group_mm to fix building on Hopper (#26138)

2025-10-02 20:32:38 -07:00

[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604)

2025-10-17 08:10:23 -06:00

[Bugfix][ROCm] Fix for warp_size uses on host (#21205)

2025-07-24 00:37:19 -07:00

[Kernel] Add GPTQv2 format support for low-bit or asymmetric quantization, by adapting gptq_gemm (#26092)

2025-10-23 23:26:13 -04:00

[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)

2025-05-08 03:05:10 -07:00

Convert formatting to use ruff instead of yapf + isort (#26247)

2025-10-05 07:06:22 -07:00

hadamard/hadacore

[Transform] Deterministic Hadacore Transforms (#24106)

2025-09-15 12:59:31 -06:00

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

[Kernel/Quant] Remove the original marlin format and qqq (#23204)

2025-08-20 15:13:36 -04:00

[Refactor] Refactor FP8 & INT8 Quant Folder inside w8a8 (#25293)

2025-10-08 10:20:48 -04:00

activation_kernels.cu

Silu v2 (#25074)

2025-10-10 15:19:53 +00:00

utils.cuh

[Feature][ROCm]Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization_utils.cuh

Make sure that vectorize_with_alignment produced vectorized global loads (#23182)

2025-08-21 20:06:54 +00:00

vectorization.cuh

[Perf] Tune scaled_fp8_quant by increasing vectorization (#18844)

2025-06-03 13:48:25 -07:00