mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-16 09:57:09 +08:00

Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)

Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

2025-11-04 18:27:35 +08:00

auto_tune

[V0 Deprecation] Remove VLLM_USE_V1 from docs and scripts (#26336)

2025-10-07 16:46:44 +08:00

cutlass_benchmarks

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

disagg_benchmarks

[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

2025-10-02 10:04:57 -07:00

fused_kernels

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

kernels

Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435)

2025-11-04 18:27:35 +08:00

multi_turn

feat(benchmarks): support HF model names in multi-turn benchmark (#27850)

2025-11-01 08:04:52 +00:00

overheads

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

backend_request_func.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

benchmark_block_pool.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_ngram_proposer.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_prefix_caching.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_prioritization.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_serving_structured_output.py

[Chore]:Extract math and argparse utilities to separate modules (#27188)

2025-10-26 04:03:32 -07:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_utils.py

Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)

2025-10-12 09:51:31 -07:00

README.md

[Docs] move benchmarks README to contributing guides (#24820)

2025-09-16 05:52:57 -07:00

run_structured_output_benchmark.sh

[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)

2025-05-13 01:47:29 -07:00

sonnet.txt

…

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

- Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
- Throughput benchmarks: Scripts for testing offline batch inference performance
- Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
- Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)
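
To make the serving metrics named above (latency, throughput) concrete, here is a minimal sketch of how they could be derived from per-request timings. It is not taken from vLLM's benchmark code: `RequestRecord`, `summarize`, and the field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class RequestRecord:
    """Timing for one completed request (hypothetical structure, not vLLM's)."""

    start_s: float  # time the request was sent, in seconds
    end_s: float  # time the final token arrived, in seconds
    output_tokens: int  # number of tokens generated for this request


def summarize(records: list[RequestRecord]) -> dict[str, float]:
    """Compute basic latency and throughput figures from request timings."""
    if not records:
        raise ValueError("no completed requests to summarize")

    latencies = [r.end_s - r.start_s for r in records]

    # Wall-clock span from the first request sent to the last response received.
    duration = max(r.end_s for r in records) - min(r.start_s for r in records)
    if duration <= 0:
        raise ValueError("benchmark duration must be positive")

    return {
        "mean_latency_s": mean(latencies),
        "median_latency_s": median(latencies),
        "request_throughput_rps": len(records) / duration,
        "output_token_throughput_tps": sum(r.output_tokens for r in records) / duration,
    }


# Three overlapping requests, as an online serving benchmark might record them.
records = [
    RequestRecord(start_s=0.0, end_s=1.0, output_tokens=100),
    RequestRecord(start_s=0.5, end_s=2.0, output_tokens=150),
    RequestRecord(start_s=1.0, end_s=4.0, output_tokens=250),
]
summary = summarize(records)
```

Throughput here is measured over the whole wall-clock span rather than by summing per-request latencies, since concurrent requests overlap in time.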

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.

For full CLI reference see:
