vllm/fp4 at 8ee90c83f8e8b53f35ee3df3a86377ea5a587eea - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 18:37:17 +08:00

[FIX] FP4 quantization kernel padding initialization bug (#31097)

Signed-off-by: <>
Co-authored-by: root <root@gpu-193.slurm-workers-slurm.slurm.svc.cluster.local>
Co-authored-by: root <root@gpu-951.slurm-workers-slurm.slurm.svc.cluster.local>

2025-12-23 08:45:18 -08:00

activation_nvfp4_quant_fusion_kernels.cu

[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897)

2025-12-21 09:41:57 -08:00

nvfp4_blockwise_moe_kernel.cu

[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242)

2025-11-25 06:59:07 -08:00

nvfp4_experts_quant.cu

[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897)

2025-12-21 09:41:57 -08:00

nvfp4_quant_entry.cu

[Kernel] Add NVFP4 MoE CUTLASS support for SM120 (#29242)

2025-11-25 06:59:07 -08:00

nvfp4_quant_kernels.cu

[FIX] FP4 quantization kernel padding initialization bug (#31097)

2025-12-23 08:45:18 -08:00

nvfp4_scaled_mm_entry.cu

SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711)

2025-12-01 17:24:18 -08:00

nvfp4_scaled_mm_kernels.cu

…

nvfp4_scaled_mm_sm120_kernels.cu

…

nvfp4_utils.cuh

[NVFP4][Perf] Tune NVFP4 input quant kernel for small batch size (#30897)

2025-12-21 09:41:57 -08:00