vllm/kernels at 4ffd6e8942710310f8e4f1a84090784cc2f09e70 - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-19 23:27:32 +08:00

History

[Misc] rename torch_dtype to dtype (#26695 )

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>

2025-10-15 12:11:48 +00:00

deepgemm

Remove all cases of fmt: on/off (#26253 )

2025-10-05 09:18:14 -07:00

bench_block_fp8_gemm.py

[Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280 )

2025-09-11 15:43:14 -07:00

bench_fp8_gemm.py

[Misc] Add SPDX-FileCopyrightText (#20428 )

2025-07-04 07:40:42 +00:00

bench_int8_gemm.py

[Benchmark] Refactor benchmark script for fp8 & int8 (#19627 )

2025-06-15 15:15:37 +08:00

bench_mxfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 )

2025-10-10 09:43:40 -07:00

bench_nvfp4_gemm.py

Enable Fbgemm NVFP4 on Dense models (#25609 )

2025-09-24 21:12:53 -07:00

bench_nvfp4_qutlass.py

[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 )

2025-10-10 09:43:40 -07:00

bench_per_token_quant_fp8.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_activation.py

[Benchmark] add benchmark for custom activation op (#23908 )

2025-09-06 20:12:05 -07:00

benchmark_bitblas.py

[Bugfix] Add proper comparison for package versions (#22314 )

2025-08-06 20:31:03 -07:00

benchmark_cutlass_fp4_moe.py

[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 )

2025-09-17 17:43:31 -06:00

benchmark_cutlass_moe_fp8.py

Add CUTLASS FP8 MOE benchmark scripts and kernel config (#25302 )

2025-09-23 18:07:42 -06:00

benchmark_device_communicators.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_grouped_gemm_cutlass.py

[Kernel] Delegate construction of FusedMoEQuantConfig to FusedMoEMethodBase subclasses (#22537 )

2025-09-17 17:43:31 -06:00

benchmark_layernorm.py

[Misc] Add SPDX-FileCopyrightText (#19100 )

2025-06-03 11:20:17 -07:00

benchmark_lora.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_machete.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_marlin.py

[Bugfix][Benchmark] Fix Marlin benchmark (#19929 )

2025-06-24 07:25:12 +09:00

benchmark_moe_align_block_size.py

[Refactor] Remove moe_align_block_size_triton (#21335 )

2025-07-26 07:08:29 -07:00

benchmark_moe_permute_unpermute.py

[Misc] rename torch_dtype to dtype (#26695 )

2025-10-15 12:11:48 +00:00

benchmark_moe.py

[Misc] rename torch_dtype to dtype (#26695 )

2025-10-15 12:11:48 +00:00

benchmark_mrope.py

[FEAT] [Performance] Add triton mrope to replace the torch code path (#22375 )

2025-08-09 11:50:03 -07:00

benchmark_paged_attention.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_per_token_group_quant.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_polynorm.py

[Model] New model support for Motif-1-Tiny (#23414 )

2025-09-10 23:29:40 -07:00

benchmark_quant.py

[Misc] Add SPDX-FileCopyrightText (#19100 )

2025-06-03 11:20:17 -07:00

benchmark_reshape_and_cache_flash.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_reshape_and_cache.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_rmsnorm.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_rope.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_shapes.py

[Misc] Add SPDX-FileCopyrightText (#19100 )

2025-06-03 11:20:17 -07:00

benchmark_silu_mul_fp8_quant.py

Silu v2 (#25074 )

2025-10-10 15:19:53 +00:00

benchmark_trtllm_decode_attention.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_trtllm_prefill_attention.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

benchmark_w8a8_block_fp8.py

[Benchmarks] Fix imports in FP8 tuning script (#26407 )

2025-10-08 16:31:59 +00:00

graph_machete_bench.py

[Misc] Add SPDX-FileCopyrightText (#19100 )

2025-06-03 11:20:17 -07:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701 )

2024-09-23 13:46:26 -04:00

utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633 )

2025-10-12 09:51:31 -07:00

weight_shapes.py

[kernel] Support W4A8 on Hopper (#23198 )

2025-08-24 06:18:04 +00:00