mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-07-19 11:57:13 +08:00

Kate Cheng 3d429d63a6 Enable linear deepgemm_swapAB

Signed-off-by: Kate Cheng <yunhsuanc@nvidia.com>

2025-12-24 11:19:39 -08:00

auto_tune

[Benchmarks] auto_tune.sh: Use hostname variable for server requests (#30529)

2025-12-15 22:00:29 +00:00

cutlass_benchmarks

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

disagg_benchmarks

[BugFix][PD]: make example proxy usable with P2pNcclConnector (#26628)

2025-11-20 17:38:31 +00:00

fused_kernels

[Performance] Fused blockwise quant RMS norm (#27883)

2025-12-07 16:38:04 +00:00

kernels

Enable linear deepgemm_swapAB

2025-12-24 11:19:39 -08:00

multi_turn

[Benchmark] multi_turn: Report warmup-inclusive runtime (#28937)

2025-11-18 16:38:22 +00:00

overheads

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

backend_request_func.py

[Chore] Adjust tokenizer import to avoid circular imports (#30601)

2025-12-13 04:42:39 -08:00

benchmark_batch_invariance.py

[Chore] Update more locations to use attention_config.backend (#31153)

2025-12-22 19:19:50 -08:00

benchmark_block_pool.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_ngram_proposer.py

[Bugfix] Fix task still being passed in tests/benchmarks (#30476)

2025-12-11 10:33:55 +00:00

benchmark_prefix_block_hash.py

[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163)

2025-12-03 16:06:57 +00:00

benchmark_prefix_caching.py

[Chore] Move tokenizer initialization methods (#29793)

2025-12-02 13:33:37 +08:00

benchmark_prioritization.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_serving_structured_output.py

[Misc] Consistent case for vllm bench serve results (#30403)

2025-12-10 09:44:02 -08:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

README.md

[Docs] move benchmarks README to contributing guides (#24820)

2025-09-16 05:52:57 -07:00

run_structured_output_benchmark.sh

[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)

2025-05-13 01:47:29 -07:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

- Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
- Throughput benchmarks: Scripts for testing offline batch inference performance
- Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
- Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
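
As a rough orientation, the sketch below maps the three deprecated scripts listed in this directory to the `vllm bench` subcommands that are assumed to have replaced them. The mapping is inferred from the commit message about deprecating the old serving / throughput / latency scripts and from the mention of `vllm bench serve`. The helper name `bench_command` and the table `LEGACY_TO_CLI` are hypothetical, and the code only builds command lines without running them.

```python
import shlex

# Assumed mapping from the deprecated scripts in this directory to the
# `vllm bench` subcommands (inferred, not taken from this README).
LEGACY_TO_CLI = {
    "benchmark_serving.py": "serve",
    "benchmark_throughput.py": "throughput",
    "benchmark_latency.py": "latency",
}


def bench_command(legacy_script: str, *extra_args: str) -> list[str]:
    """Build the argv for the assumed `vllm bench` equivalent of a legacy script."""
    if legacy_script not in LEGACY_TO_CLI:
        raise ValueError(f"no known vllm bench equivalent for {legacy_script!r}")
    return ["vllm", "bench", LEGACY_TO_CLI[legacy_script], *extra_args]


if __name__ == "__main__":
    # Print one shell-quoted command per legacy script; nothing is executed.
    for script in LEGACY_TO_CLI:
        print(shlex.join(bench_command(script, "--help")))
```

Passing `--help` is the safest way to discover the flags each subcommand accepts, since the available options depend on the installed vLLM version.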

For full CLI reference see:
