vllm/quantization at f07a673eb2fc4eb6f4e18eadb3512702877f5c3a - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-20 02:17:10 +08:00

Jinzhen Lin e73b7dfd69

[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245)

2025-05-16 16:02:44 -07:00

[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)

2024-08-16 14:00:11 -07:00

[Kernel] Fix awq error when n is not divisible by 128 (#13227)

2025-02-13 20:07:05 -08:00

compressed_tensors

[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779)

2025-05-12 20:36:33 -07:00

use ceil_div in cutlass block scaling shape check (#17918)

2025-05-16 03:02:58 -07:00

[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model (#16362)

2025-05-09 16:24:41 -07:00

Removed unused marlin cuda code (#17684)

2025-05-06 17:59:47 -07:00

[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 (#13779)

2025-05-12 20:36:33 -07:00

[Kernel] GGUF MoeVec kernel (#16780)

2025-05-06 23:07:23 -07:00

Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)

2025-03-21 10:01:11 +08:00

[Easy] Eliminate c10::optional usage in vllm/csrc (#17819)

2025-05-08 03:05:10 -07:00

[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order (#18245)

2025-05-16 16:02:44 -07:00

add cutlass support for blackwell fp8 gemm (#13798)

2025-03-04 07:55:07 -08:00

pre-commit autoupdate (#17380)

2025-04-29 06:46:55 -07:00

activation_kernels.cu

[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm (#18082)

2025-05-13 22:13:56 -07:00

utils.cuh

[Feature][ROCm] Enable fusion pass for torch.compile on ROCm (#15050)

2025-03-31 04:42:18 -07:00

vectorization.cuh

dynamic dispatch of fp8 kernels (#14245)

2025-03-11 10:54:56 -04:00