vllm/gptq_marlin at e515668edf510d86a0543ac5d7981dd91b2026d7 - vllm - 丝路新云 code repository

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-30 12:17:22 +08:00

Jinzhen Lin 1d0c9d6b2d

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>

2025-05-05 09:39:30 -07:00

.gitignore

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

awq_marlin_repack.cu

Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)

2025-03-25 15:36:45 +08:00

dequant.h

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

generate_kernels.py

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

gptq_marlin_repack.cu

Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 (#15160)

2025-03-25 15:36:45 +08:00

gptq_marlin.cu

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

kernel.h

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

marlin_dtypes.cuh

[Kernel] moe wna16 marlin kernel (#14447)

2025-04-14 20:05:22 -07:00

marlin_template.h

[Kernel] some optimizations for dense marlin and moe marlin (#16850)

2025-05-05 09:39:30 -07:00

marlin.cuh

[Kernel] moe wna16 marlin kernel (#14447)

2025-04-14 20:05:22 -07:00