xinyun/vllm
Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2026-04-01 08:27:03 +08:00)
vllm/benchmarks/kernels
Caleb_Du
57c22e57f9
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn>
2025-07-27 07:08:00 -07:00
deepgemm
…
bench_fp8_gemm.py
…
bench_int8_gemm.py
…
bench_nvfp4_gemm.py
[Bench] Add NVFP4 GEMM benchmark script (#20578)
2025-07-09 13:23:48 -04:00
bench_per_token_quant_fp8.py
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
2025-07-11 04:56:28 +00:00
benchmark_aqlm.py
…
benchmark_bitblas.py
…
benchmark_cutlass_fp4_moe.py
…
benchmark_grouped_gemm_cutlass.py
Revert "[Performance] Performance improvements in non-blockwise fp8 CUTLASS MoE (#20762) (#21334)
2025-07-21 21:49:01 -07:00
benchmark_layernorm.py
…
benchmark_lora.py
…
benchmark_machete.py
…
benchmark_marlin.py
…
benchmark_moe_align_block_size.py
[Refactor] Remove moe_align_block_size_triton (#21335)
2025-07-26 07:08:29 -07:00
benchmark_moe_permute_unpermute.py
Fix CUDA permute/unpermute for use with DeepGemm Moe (#17934)
2025-07-27 07:08:00 -07:00
benchmark_moe.py
GLM-4 Update (#20736)
2025-07-19 22:40:31 +00:00
benchmark_paged_attention.py
…
benchmark_quant.py
…
benchmark_rmsnorm.py
…
benchmark_rope.py
…
benchmark_shapes.py
…
benchmark_trtllm_attention.py
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
2025-07-11 09:23:23 +00:00
benchmark_w8a8_block_fp8.py
…
graph_machete_bench.py
…
requirements.txt
…
utils.py
…
weight_shapes.py
…