vllm/kernels at c625f9043c5beb5921e94c6a5b3ac18372bab4db - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 12:17:18 +08:00


[Frontend] Pass API server count to each process (#23717)

Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

2025-09-20 01:15:19 +08:00

deepgemm

[Refactor] Remove Duplicate per_block_cast_to_fp8, Remove Dependencies of DeepGEMM (#21787)

2025-08-01 01:13:27 +00:00

bench_block_fp8_gemm.py

[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)

2025-09-11 15:43:14 -07:00

bench_fp8_gemm.py

[Misc] Add SPDX-FileCopyrightText (#20428)

2025-07-04 07:40:42 +00:00

bench_int8_gemm.py

[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)

2025-06-15 15:15:37 +08:00

bench_nvfp4_gemm.py

[Bench] Add NVFP4 GEMM benchmark script (#20578)

2025-07-09 13:23:48 -04:00

bench_per_token_quant_fp8.py

[FP8] Extend per-token-group quantization support to QuantFP8 (#24342)

2025-09-16 18:31:06 -07:00

benchmark_activation.py

[Benchmark] add benchmark for custom activation op (#23908)

2025-09-06 20:12:05 -07:00

benchmark_bitblas.py

[Bugfix] Add proper comparison for package versions (#22314)

2025-08-06 20:31:03 -07:00

benchmark_cutlass_fp4_moe.py

[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)

2025-09-17 17:43:31 -06:00

benchmark_device_communicators.py

[Kernels] Enable Torch Symmetric Memory All-Reduce By Default (#24111)

2025-09-11 09:45:31 -07:00

benchmark_grouped_gemm_cutlass.py

[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)

2025-09-17 17:43:31 -06:00

benchmark_layernorm.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_lora.py

[fix] lora benchmarks pass no_lora_flag_cpu (#23774)

2025-09-17 21:22:25 +08:00

benchmark_machete.py

[kernel] Support W4A8 on Hopper (#23198)

2025-08-24 06:18:04 +00:00

benchmark_marlin.py

[Bugfix][Benchmark] Fix Marlin benchmark (#19929)

2025-06-24 07:25:12 +09:00

benchmark_moe_align_block_size.py

[Refactor] Remove moe_align_block_size_triton (#21335)

2025-07-26 07:08:29 -07:00

benchmark_moe_permute_unpermute.py

Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)

2025-07-27 07:08:00 -07:00

benchmark_moe.py

[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537)

2025-09-17 17:43:31 -06:00

benchmark_mrope.py

[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375)

2025-08-09 11:50:03 -07:00

benchmark_paged_attention.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_per_token_group_quant.py

[Test] Add Benchmark and Unit Test for per_token_group_quant (#21860)

2025-07-30 07:15:02 -07:00

benchmark_polynorm.py

[Model] New model support for Motif-1-Tiny (#23414)

2025-09-10 23:29:40 -07:00

benchmark_quant.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_reshape_and_cache_flash.py

[Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036)

2025-08-01 19:18:51 -04:00

benchmark_rmsnorm.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_rope.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_shapes.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_silu_mul_fp8_quant.py

[Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054)

2025-09-13 00:17:27 -07:00

benchmark_trtllm_decode_attention.py

[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647)

2025-09-08 20:53:07 -07:00

benchmark_trtllm_prefill_attention.py

[Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647)

2025-09-08 20:53:07 -07:00

benchmark_w8a8_block_fp8.py

[Frontend] Pass API server count to each process (#23717)

2025-09-20 01:15:19 +08:00

graph_machete_bench.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

utils.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

weight_shapes.py

[kernel] Support W4A8 on Hopper (#23198)

2025-08-24 06:18:04 +00:00