vllm/kernels at 5e83a7277f7892432375d3d41594ebfde086ca4e - vllm

mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-08-01 05:14:27 +08:00

Use Transformers helper get_text_config() instead of checking for text_config (#17105)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

2025-04-25 08:47:35 -07:00

deepgemm

Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM (#13917)

2025-03-05 17:08:51 -08:00

benchmark_aqlm.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00

benchmark_bitblas.py

[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS (#6036)

2025-04-22 09:01:36 +01:00

benchmark_grouped_gemm_cutlass.py

[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)

2025-03-27 00:54:44 +00:00

benchmark_layernorm.py

[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)

2025-03-07 00:42:49 +00:00

benchmark_lora.py

[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)

2025-04-24 22:51:02 -07:00

benchmark_machete.py

[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)

2025-03-07 00:42:49 +00:00

benchmark_marlin.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

benchmark_moe.py

Use Transformers helper get_text_config() instead of checking for text_config (#17105)

2025-04-25 08:47:35 -07:00

benchmark_paged_attention.py

[Misc] Warn about v0 in benchmark_paged_attn.py (#15495)

2025-03-25 20:31:04 -07:00

benchmark_quant.py

[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)

2025-03-07 00:42:49 +00:00

benchmark_rmsnorm.py

Correct capitalisation: VLLM -> vLLM (#14562)

2025-03-10 16:36:21 +00:00

benchmark_rope.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

benchmark_shapes.py

[Kernel] CUTLASS grouped gemm fp8 MoE kernel (#13972)

2025-03-27 00:54:44 +00:00

benchmark_w8a8_block_fp8.py

[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 (#15322)

2025-03-23 01:10:10 -07:00

graph_machete_bench.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

requirements.txt

[Kernel] (2/N) Machete - Integrate into CompressedTensorsWNA16 and GPTQMarlin (#7701)

2024-09-23 13:46:26 -04:00

utils.py

Update deprecated Python 3.8 typing (#13971)

2025-03-02 17:34:51 -08:00

weight_shapes.py

[Misc] Add SPDX-License-Identifier headers to python source files (#12628)

2025-02-02 11:58:18 -08:00