vllm/cutlass_w8a8 at 9798b2fb0052092a6420172e41c0c8a307eedfa6 - vllm - 丝路新云-代码仓

xinyun/vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 15:07:40 +08:00


Lucas Wilkinson 9798b2fb00

[Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868)

2025-01-30 18:33:00 -08:00

..

[Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868)

2025-01-30 18:33:00 -08:00

Epilogues.md

[Kernel] Add per-tensor and per-token AZP epilogues (#5941)

2024-08-06 18:17:08 +00:00

scaled_mm_c2x_sm75_dispatch.cuh

[Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996)

2024-07-31 14:40:32 -07:00

scaled_mm_c2x_sm80_dispatch.cuh

[Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677)

2024-07-29 09:42:35 -06:00

scaled_mm_c2x_sm89_fp8_dispatch.cuh

[Kernel] Tuned int8 kernels for Ada Lovelace (#6848)

2024-07-29 20:24:58 -06:00

scaled_mm_c2x_sm89_int8_dispatch.cuh

[Kernel] Tuned int8 kernels for Ada Lovelace (#6848)

2024-07-29 20:24:58 -06:00

scaled_mm_c2x.cu

[MISC] Replace c10::optional with std::optional (#11730)

2025-01-05 10:20:34 +09:00

scaled_mm_c2x.cuh

[Kernel] Refactor Cutlass c3x (#10049)

2024-12-19 07:00:18 +00:00

scaled_mm_c3x.cu

[Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868)

2025-01-30 18:33:00 -08:00

scaled_mm_entry.cu

[Kernel] Update cutlass_scaled_mm to support 2d group (blockwise) scaling (#11868)

2025-01-30 18:33:00 -08:00