[Misc] Use collapsible blocks for benchmark examples. (#20017)

Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
Reid 2025-06-26 18:35:16 +08:00 committed by GitHub
parent 0567c8249f
commit 167aca45cb

@ -4,7 +4,7 @@ This README guides you through running benchmark tests with the extensive
datasets supported on vLLM. Its a living document, updated as new features and datasets datasets supported on vLLM. Its a living document, updated as new features and datasets
become available. become available.
## Dataset Overview **Dataset Overview**
<table style="width:100%; border-collapse: collapse;"> <table style="width:100%; border-collapse: collapse;">
<thead> <thead>
@ -82,7 +82,10 @@ become available.
**Note**: HuggingFace dataset's `dataset-name` should be set to `hf` **Note**: HuggingFace dataset's `dataset-name` should be set to `hf`
--- ---
## Example - Online Benchmark <details>
<summary><b>🚀 Example - Online Benchmark</b></summary>
<br/>
First start serving your model First start serving your model
@ -130,7 +133,8 @@ P99 ITL (ms): 8.39
================================================== ==================================================
``` ```
### Custom Dataset **Custom Dataset**
If the dataset you want to benchmark is not supported yet in vLLM, even then you can benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" field per entry, e.g., data.jsonl If the dataset you want to benchmark is not supported yet in vLLM, even then you can benchmark on it using `CustomDataset`. Your data needs to be in `.jsonl` format and needs to have "prompt" field per entry, e.g., data.jsonl
``` ```
@ -162,7 +166,7 @@ python3 benchmarks/benchmark_serving.py --port 9001 --save-result --save-detaile
You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`. You can skip applying chat template if your data already has it by using `--custom-skip-chat-template`.
### VisionArena Benchmark for Vision Language Models **VisionArena Benchmark for Vision Language Models**
```bash ```bash
# need a model with vision capability here # need a model with vision capability here
@ -180,7 +184,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 1000 --num-prompts 1000
``` ```
### InstructCoder Benchmark with Speculative Decoding **InstructCoder Benchmark with Speculative Decoding**
``` bash ``` bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \ VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
@ -197,7 +201,7 @@ python3 benchmarks/benchmark_serving.py \
--num-prompts 2048 --num-prompts 2048
``` ```
### Other HuggingFaceDataset Examples **Other HuggingFaceDataset Examples**
```bash ```bash
vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests vllm serve Qwen/Qwen2-VL-7B-Instruct --disable-log-requests
@ -251,7 +255,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 80 --num-prompts 80
``` ```
### Running With Sampling Parameters **Running With Sampling Parameters**
When using OpenAI-compatible backends such as `vllm`, optional sampling When using OpenAI-compatible backends such as `vllm`, optional sampling
parameters can be specified. Example client command: parameters can be specified. Example client command:
@ -269,7 +273,7 @@ python3 vllm/benchmarks/benchmark_serving.py \
--num-prompts 10 --num-prompts 10
``` ```
### Running With Ramp-Up Request Rate **Running With Ramp-Up Request Rate**
The benchmark tool also supports ramping up the request rate over the The benchmark tool also supports ramping up the request rate over the
duration of the benchmark run. This can be useful for stress testing the duration of the benchmark run. This can be useful for stress testing the
@ -284,8 +288,12 @@ The following arguments can be used to control the ramp-up:
- `--ramp-up-start-rps`: The request rate at the beginning of the benchmark. - `--ramp-up-start-rps`: The request rate at the beginning of the benchmark.
- `--ramp-up-end-rps`: The request rate at the end of the benchmark. - `--ramp-up-end-rps`: The request rate at the end of the benchmark.
--- </details>
## Example - Offline Throughput Benchmark
<details>
<summary><b>📈 Example - Offline Throughput Benchmark</b></summary>
<br/>
```bash ```bash
python3 vllm/benchmarks/benchmark_throughput.py \ python3 vllm/benchmarks/benchmark_throughput.py \
@ -303,7 +311,7 @@ Total num prompt tokens: 5014
Total num output tokens: 1500 Total num output tokens: 1500
``` ```
### VisionArena Benchmark for Vision Language Models **VisionArena Benchmark for Vision Language Models**
``` bash ``` bash
python3 vllm/benchmarks/benchmark_throughput.py \ python3 vllm/benchmarks/benchmark_throughput.py \
@ -323,7 +331,7 @@ Total num prompt tokens: 14527
Total num output tokens: 1280 Total num output tokens: 1280
``` ```
### InstructCoder Benchmark with Speculative Decoding **InstructCoder Benchmark with Speculative Decoding**
``` bash ``` bash
VLLM_WORKER_MULTIPROC_METHOD=spawn \ VLLM_WORKER_MULTIPROC_METHOD=spawn \
@ -347,7 +355,7 @@ Total num prompt tokens: 261136
Total num output tokens: 204800 Total num output tokens: 204800
``` ```
### Other HuggingFaceDataset Examples **Other HuggingFaceDataset Examples**
**`lmms-lab/LLaVA-OneVision-Data`** **`lmms-lab/LLaVA-OneVision-Data`**
@ -386,7 +394,7 @@ python3 benchmarks/benchmark_throughput.py \
--num-prompts 10 --num-prompts 10
``` ```
### Benchmark with LoRA Adapters **Benchmark with LoRA Adapters**
``` bash ``` bash
# download dataset # download dataset
@ -403,18 +411,22 @@ python3 vllm/benchmarks/benchmark_throughput.py \
--lora-path yard1/llama-2-7b-sql-lora-test --lora-path yard1/llama-2-7b-sql-lora-test
``` ```
--- </details>
## Example - Structured Output Benchmark
<details>
<summary><b>🛠️ Example - Structured Output Benchmark</b></summary>
<br/>
Benchmark the performance of structured output generation (JSON, grammar, regex). Benchmark the performance of structured output generation (JSON, grammar, regex).
### Server Setup **Server Setup**
```bash ```bash
vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests vllm serve NousResearch/Hermes-3-Llama-3.1-8B --disable-log-requests
``` ```
### JSON Schema Benchmark **JSON Schema Benchmark**
```bash ```bash
python3 benchmarks/benchmark_serving_structured_output.py \ python3 benchmarks/benchmark_serving_structured_output.py \
@ -426,7 +438,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000 --num-prompts 1000
``` ```
### Grammar-based Generation Benchmark **Grammar-based Generation Benchmark**
```bash ```bash
python3 benchmarks/benchmark_serving_structured_output.py \ python3 benchmarks/benchmark_serving_structured_output.py \
@ -438,7 +450,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000 --num-prompts 1000
``` ```
### Regex-based Generation Benchmark **Regex-based Generation Benchmark**
```bash ```bash
python3 benchmarks/benchmark_serving_structured_output.py \ python3 benchmarks/benchmark_serving_structured_output.py \
@ -449,7 +461,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000 --num-prompts 1000
``` ```
### Choice-based Generation Benchmark **Choice-based Generation Benchmark**
```bash ```bash
python3 benchmarks/benchmark_serving_structured_output.py \ python3 benchmarks/benchmark_serving_structured_output.py \
@ -460,7 +472,7 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000 --num-prompts 1000
``` ```
### XGrammar Benchmark Dataset **XGrammar Benchmark Dataset**
```bash ```bash
python3 benchmarks/benchmark_serving_structured_output.py \ python3 benchmarks/benchmark_serving_structured_output.py \
@ -471,12 +483,16 @@ python3 benchmarks/benchmark_serving_structured_output.py \
--num-prompts 1000 --num-prompts 1000
``` ```
--- </details>
## Example - Long Document QA Throughput Benchmark
<details>
<summary><b>📚 Example - Long Document QA Benchmark</b></summary>
<br/>
Benchmark the performance of long document question-answering with prefix caching. Benchmark the performance of long document question-answering with prefix caching.
### Basic Long Document QA Test **Basic Long Document QA Test**
```bash ```bash
python3 benchmarks/benchmark_long_document_qa_throughput.py \ python3 benchmarks/benchmark_long_document_qa_throughput.py \
@ -488,7 +504,7 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-count 5 --repeat-count 5
``` ```
### Different Repeat Modes **Different Repeat Modes**
```bash ```bash
# Random mode (default) - shuffle prompts randomly # Random mode (default) - shuffle prompts randomly
@ -519,12 +535,16 @@ python3 benchmarks/benchmark_long_document_qa_throughput.py \
--repeat-mode interleave --repeat-mode interleave
``` ```
--- </details>
## Example - Prefix Caching Benchmark
<details>
<summary><b>🗂️ Example - Prefix Caching Benchmark</b></summary>
<br/>
Benchmark the efficiency of automatic prefix caching. Benchmark the efficiency of automatic prefix caching.
### Fixed Prompt with Prefix Caching **Fixed Prompt with Prefix Caching**
```bash ```bash
python3 benchmarks/benchmark_prefix_caching.py \ python3 benchmarks/benchmark_prefix_caching.py \
@ -535,7 +555,7 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256 --input-length-range 128:256
``` ```
### ShareGPT Dataset with Prefix Caching **ShareGPT Dataset with Prefix Caching**
```bash ```bash
# download dataset # download dataset
@ -550,12 +570,16 @@ python3 benchmarks/benchmark_prefix_caching.py \
--input-length-range 128:256 --input-length-range 128:256
``` ```
--- </details>
## Example - Request Prioritization Benchmark
<details>
<summary><b>⚡ Example - Request Prioritization Benchmark</b></summary>
<br/>
Benchmark the performance of request prioritization in vLLM. Benchmark the performance of request prioritization in vLLM.
### Basic Prioritization Test **Basic Prioritization Test**
```bash ```bash
python3 benchmarks/benchmark_prioritization.py \ python3 benchmarks/benchmark_prioritization.py \
@ -566,7 +590,7 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority --scheduling-policy priority
``` ```
### Multiple Sequences per Prompt **Multiple Sequences per Prompt**
```bash ```bash
python3 benchmarks/benchmark_prioritization.py \ python3 benchmarks/benchmark_prioritization.py \
@ -577,3 +601,5 @@ python3 benchmarks/benchmark_prioritization.py \
--scheduling-policy priority \ --scheduling-policy priority \
--n 2 --n 2
``` ```
</details>
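
The Custom Dataset text in the diff above says that data for `CustomDataset` must be in `.jsonl` format with a "prompt" field per entry. As a minimal sketch of producing and sanity-checking such a file before running the benchmark, the following may help. The helper names `write_custom_dataset` and `validate_custom_dataset` are hypothetical and not part of vLLM, and the check that each prompt is a string is an assumption rather than something the README states.

```python
import json
from pathlib import Path


def write_custom_dataset(prompts, path):
    """Write one JSON object per line, each with a "prompt" field."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt in prompts:
            f.write(json.dumps({"prompt": prompt}, ensure_ascii=False) + "\n")
    return path


def validate_custom_dataset(path):
    """Count entries; raise ValueError if a line lacks a string "prompt".

    The string check is an assumption made for this sketch.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            if not isinstance(entry, dict) or not isinstance(entry.get("prompt"), str):
                raise ValueError(f"line {line_no}: missing string 'prompt' field")
            count += 1
    return count


if __name__ == "__main__":
    out = write_custom_dataset(
        ["What is the capital of India?", "Summarize prefix caching in one line."],
        "data.jsonl",
    )
    print(f"{validate_custom_dataset(out)} entries written to {out}")
```

Validating the file up front is cheap, and it surfaces a malformed line before a long benchmark run starts.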