12 Commits

Author SHA1 Message Date
Gregory Shtrasberg
90eeea8f85
[Bugfix][ROCm] Fix for warp_size uses on host (#21205)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-07-24 00:37:19 -07:00
Szymon Ożóg
dec66d253b
[Kernel] GGUF MMVQ kernel for multiple input vectors (#18754)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-06-16 17:33:26 +08:00
Szymon Ożóg
1a45a61387
[Kernel] GGUF MoeVec kernel (#16780)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-05-06 23:07:23 -07:00
LukasBluebaum
90969fb39a
[Kernel] Add more dtype support for GGUF dequantization (#15879)
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com>
2025-04-02 01:58:48 -07:00
Szymon Ożóg
a608160027
[Kernel] Fix conflicting macro names for gguf kernels (#15456)
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-25 13:50:49 +00:00
Lu Fang
d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 (#15159)
Signed-off-by: Lu Fang <lufang@fb.com>
Co-authored-by: Richard Barnes <rbarnes@meta.com>
2025-03-21 10:01:11 +08:00
Szymon Ożóg
e22ee1e7a2
[Kernel] GGUF MoE kernel (#14613)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
2025-03-12 03:33:27 +00:00
Szymon Ożóg
89cdaa83e7
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
2025-03-10 07:30:04 -07:00
kliuae
7c25fe45a6
[AMD] Add support for GGUF quantization on ROCm (#10254)
2024-11-22 21:14:49 -08:00
Isotr0py
fc990f9795
[Bugfix][Kernel] Add IQ1_M quantization implementation to GGUF kernel (#8357)
2024-09-15 16:51:44 -06:00
bnellnm
37fd47e780
[Kernel] fix types used in aqlm and ggml kernels to support dynamo (#7596)
2024-08-16 14:00:11 -07:00
Isotr0py
360bd67cf0
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-05 17:54:23 -06:00