vllm/kernels at cb0a7b4bea26657da989562a10055b7d0b59fd3a - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-06-22 05:17:13 +08:00

Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2025-11-19 09:06:36 -08:00

deepgemm

Remove all cases of fmt: on/off (#26253)

2025-10-05 09:18:14 -07:00

bench_block_fp8_gemm.py

Remove weight_scale.T special case for SM90 Block FP8 CUTLASS kernel (#28431)

2025-11-11 11:46:04 -05:00

bench_fp8_gemm.py

[Misc] Add SPDX-FileCopyrightText (#20428)

2025-07-04 07:40:42 +00:00

bench_int8_gemm.py

[Benchmark] Refactor benchmark script for fp8 & int8 (#19627)

2025-06-15 15:15:37 +08:00

bench_mxfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

bench_nvfp4_gemm.py

Enable Fbgemm NVFP4 on Dense models (#25609)

2025-09-24 21:12:53 -07:00

bench_nvfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440)

2025-10-10 09:43:40 -07:00

bench_per_token_quant_fp8.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_activation.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_bitblas.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_cutlass_fp4_moe.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_cutlass_moe_fp8.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_device_communicators.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_fused_collective.py

[PERF] Allreduce fusion. Support torch native matching. Tuning of the thresholds (#24248)

2025-11-10 18:33:11 -05:00

benchmark_grouped_gemm_cutlass.py

Disable nm-testing models with issues in CI (#28206)

2025-11-06 06:19:07 -08:00

benchmark_layernorm.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_lora.py

Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)

2025-11-04 18:27:35 +08:00

benchmark_machete.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_marlin.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_moe_align_block_size.py

[Refactor] Remove moe_align_block_size_triton (#21335)

2025-07-26 07:08:29 -07:00

benchmark_moe_permute_unpermute.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_moe.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_mrope.py

Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)

2025-11-19 09:06:36 -08:00

benchmark_paged_attention.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_per_token_group_quant.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_quant.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_reshape_and_cache_flash.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_reshape_and_cache.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_rmsnorm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

benchmark_rope.py

Fix rotary embedding benchmark script (#28323)

2025-11-10 21:57:12 -05:00

benchmark_shapes.py

Disable nm-testing models with issues in CI (#28206)

2025-11-06 06:19:07 -08:00

benchmark_silu_mul_fp8_quant.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_trtllm_decode_attention.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_trtllm_prefill_attention.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

benchmark_w8a8_block_fp8.py

Replace torch.cuda.Event with torch.Event for better hardware compatibility (#26985)

2025-11-18 11:34:36 -08:00

graph_machete_bench.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

weight_shapes.py

[kernel] Support W4A8 on Hopper (#23198)

2025-08-24 06:18:04 +00:00