mirror of https://git.datalinker.icu/vllm-project/vllm.git synced 2026-03-19 04:17:07 +08:00

[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)

Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: ElizaWszola <elizaw.9289@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>

2025-10-02 19:35:13 +00:00

auto_tune

[Misc] Reduce initialization time of auto_tune (#23682)

2025-09-23 17:34:58 +00:00

cutlass_benchmarks

[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)

2025-10-02 19:35:13 +00:00

disagg_benchmarks

[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)

2025-10-02 10:04:57 -07:00

fused_kernels

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

kernels

[Perf] Fix and reapply move apply w8a8 block fp8 linear to class (#25696)

2025-10-02 19:35:13 +00:00

multi_turn

Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255)

2025-09-17 05:53:17 +00:00

overheads

[Misc] Add SPDX-FileCopyrightText (#19100)

2025-06-03 11:20:17 -07:00

structured_schemas

benchmarks: simplify test jsonschema (#14567)

2025-03-11 13:39:30 +00:00

backend_request_func.py

[Misc] Add request_id into benchmark_serve.py (#23065)

2025-08-19 08:32:18 +00:00

benchmark_block_pool.py

fix some typos (#24071)

2025-09-02 20:44:50 -07:00

benchmark_latency.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_long_document_qa_throughput.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_ngram_proposer.py

[Spec Decode] Add Batch Parallel Ngram. Upto 8x lower overhead. (#24986)

2025-09-25 15:22:03 -07:00

benchmark_prefix_caching.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_prioritization.py

[Misc] Modularize CLI Argument Parsing in Benchmark Scripts (#19593)

2025-06-14 16:54:52 +08:00

benchmark_serving_structured_output.py

[Benchmark] Fix regression in structured output benchmark (#25500)

2025-09-24 10:40:42 +00:00

benchmark_serving.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_throughput.py

[CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411)

2025-09-09 10:02:35 +00:00

benchmark_utils.py

[Core] [N-gram SD Optimization][1/n] Propose tokens with a single KMP (#22437)

2025-08-13 14:44:06 -07:00

pyproject.toml

[Doc] Move examples and further reorganize user guide (#18666)

2025-05-26 07:38:04 -07:00

README.md

[Docs] move benchmarks README to contributing guides (#24820)

2025-09-16 05:52:57 -07:00

run_structured_output_benchmark.sh

[Benchmarks] Refactor run_structured_output_benchmarks.sh (#17722)

2025-05-13 01:47:29 -07:00

sonnet.txt

feat(benchmarks): Add Prefix Caching Benchmark to Serving Benchmark (#3277)

2024-03-27 13:39:26 -07:00

README.md

Benchmarks

This directory used to contain vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
Throughput benchmarks: Scripts for testing offline batch inference performance
Specialized benchmarks: Tools for testing specific features like structured output, prefix caching, long document QA, request prioritization, and multi-modal inference
Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
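
Since the old serving, throughput, and latency scripts in this directory are deprecated, benchmarks are normally launched through the `vllm bench` CLI. The sketch below only assembles such a command line and prints it; nothing is executed. The helper `build_bench_command` and the model name `my-org/my-model` are hypothetical. The flags shown (`--model`, `--dataset-name`, `--num-prompts`) are assumptions about the CLI, so check them against `vllm bench <subcommand> --help` before relying on them.

```python
import shlex

# Subcommands assumed to replace the deprecated standalone scripts
# (benchmark_serving.py, benchmark_throughput.py, benchmark_latency.py).
BENCH_SUBCOMMANDS = ("serve", "throughput", "latency")


def build_bench_command(subcommand: str, model: str, **flags) -> list[str]:
    """Return an argv list for `vllm bench <subcommand>`; nothing is run."""
    if subcommand not in BENCH_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    argv = ["vllm", "bench", subcommand, "--model", model]
    for name, value in flags.items():
        # A keyword such as num_prompts becomes the CLI flag --num-prompts.
        argv += [f"--{name.replace('_', '-')}", str(value)]
    return argv


# Hypothetical serving benchmark against an already running server.
cmd = build_bench_command(
    "serve", "my-org/my-model", dataset_name="random", num_prompts=100
)
print(shlex.join(cmd))
```

Building the argument list separately from running it makes the command easy to log or review first; it can then be passed to `subprocess.run(cmd, check=True)` once the flags have been verified.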

For full CLI reference see:
