vllm/kernels at 80679f108ffd94c165ea11adbc3afcc43f24a06e - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-21 18:57:24 +08:00

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967)

Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>

2025-11-04 12:59:43 +00:00

deepgemm

Remove all cases of fmt: on/off (#26253)

2025-10-05 09:18:14 -07:00

bench_block_fp8_gemm.py

[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280)

2025-09-11 15:43:14 -07:00

bench_fp8_gemm.py

[Misc] Add SPDX-FileCopyrightText (#20428)

2025-07-04 07:40:42 +00:00

bench_int8_gemm.py

[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)

2025-06-15 15:15:37 +08:00

bench_mxfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

bench_nvfp4_gemm.py

Enable Fbgemm NVFP4 on Dense models (#25609)

2025-09-24 21:12:53 -07:00

bench_nvfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

bench_per_token_quant_fp8.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_activation.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_bitblas.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_cutlass_fp4_moe.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_cutlass_moe_fp8.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_device_communicators.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_grouped_gemm_cutlass.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_layernorm.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_lora.py

Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)

2025-11-04 18:27:35 +08:00

benchmark_machete.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_marlin.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_moe_align_block_size.py

[Refactor] Remove moe_align_block_size_triton (#21335)

2025-07-26 07:08:29 -07:00

benchmark_moe_permute_unpermute.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_moe.py

[Model] add optimal triton fused moe configs for NemotronH MoE (#27967)

2025-11-04 12:59:43 +00:00

benchmark_mrope.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_paged_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_per_token_group_quant.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

benchmark_quant.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_reshape_and_cache_flash.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_reshape_and_cache.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_rmsnorm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

benchmark_rope.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_shapes.py

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

benchmark_silu_mul_fp8_quant.py

Silu v2 (#25074)

2025-10-10 15:19:53 +00:00

benchmark_trtllm_decode_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_trtllm_prefill_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_w8a8_block_fp8.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

graph_machete_bench.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

weight_shapes.py

[kernel] Support W4A8 on Hopper (#23198)

2025-08-24 06:18:04 +00:00