
Benchmarks

This directory contains vLLM's benchmark scripts and utilities for performance testing and evaluation.

Contents

  • Serving benchmarks: Scripts for testing online inference performance (latency, throughput)
  • Throughput benchmarks: Scripts for testing offline batch inference performance
  • Specialized benchmarks: Tools for testing specific features such as structured output, prefix caching, long-document QA, request prioritization, and multi-modal inference
  • Dataset utilities: Framework for loading and sampling from various benchmark datasets (ShareGPT, HuggingFace datasets, synthetic data, etc.)

Usage

For detailed usage instructions, examples, and dataset information, see the Benchmark CLI documentation.
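As a quick orientation, a typical invocation looks like the sketch below. This is a hedged example, not an exhaustive reference: the model name and dataset path are placeholders, and the exact flag set may differ across vLLM versions, so consult the Benchmark CLI documentation for authoritative options.

```shell
# Offline throughput benchmark with synthetic prompts
# (model name and lengths are illustrative placeholders):
vllm bench throughput \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --input-len 512 \
    --output-len 128

# Online serving benchmark against an already-running vLLM server,
# sampling requests from a ShareGPT-format dataset (path is a placeholder):
vllm bench serve \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --dataset-name sharegpt \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
```

The serving benchmark measures online metrics such as time-to-first-token and request throughput, while the throughput benchmark measures offline batch performance; run `vllm bench --help` to list the available subcommands in your installed version.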

For the full CLI reference, see: