vllm/kernels at 34a20c49b3f81f64133428b3a0d62309db1256f9 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-05-18 14:07:00 +08:00


Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>

2025-07-27 07:08:00 -07:00

deepgemm

Revert "[Feature] Integrate new deepgemm (#19820)" (#20049)

2025-06-24 19:45:22 -07:00

bench_fp8_gemm.py

[Misc] Add SPDX-FileCopyrightText (#20428)

2025-07-04 07:40:42 +00:00

bench_int8_gemm.py

[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)

2025-06-15 15:15:37 +08:00

bench_nvfp4_gemm.py

[Bench] Add NVFP4 GEMM benchmark script (#20578)

2025-07-09 13:23:48 -04:00

bench_per_token_quant_fp8.py

[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)

2025-07-11 04:56:28 +00:00

benchmark_aqlm.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_bitblas.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_cutlass_fp4_moe.py

[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110)

2025-06-05 09:48:26 -07:00

benchmark_grouped_gemm_cutlass.py

Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762)" (#21334)

2025-07-21 21:49:01 -07:00

benchmark_layernorm.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_lora.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_machete.py

Enable ZP Support for Machete (#20268)

2025-07-01 07:12:20 +00:00

benchmark_marlin.py

[Bugfix][Benchmark] Fix Marlin benchmark (#19929)

2025-06-24 07:25:12 +09:00

benchmark_moe_align_block_size.py

[Refactor] Remove moe_align_block_size_triton (#21335)

2025-07-26 07:08:29 -07:00

benchmark_moe_permute_unpermute.py

Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

2025-07-27 07:08:00 -07:00

benchmark_moe.py

GLM-4 Update (#20736)

2025-07-19 22:40:31 +00:00

benchmark_paged_attention.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_quant.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_rmsnorm.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_rope.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_shapes.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_trtllm_attention.py

[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)

2025-07-11 09:23:23 +00:00

benchmark_w8a8_block_fp8.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

graph_machete_bench.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

utils.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

weight_shapes.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00