[Benchmark] Update bench doc with mtbench, blazedit, spec bench (#24450)

Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
This commit is contained in:
Ekagra Ranjan 2025-09-09 17:15:41 -04:00 committed by GitHub
parent b5fb3005a8
commit 0dc9cbb527
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194

View File

@ -95,6 +95,24 @@ become available.
<td style="text-align: center;"></td>
<td><code>lmms-lab/LLaVA-OneVision-Data</code>, <code>Aeala/ShareGPT_Vicuna_unfiltered</code></td>
</tr>
<tr>
<td><strong>HuggingFace-MTBench</strong></td>
<td style="text-align: center;"></td>
<td style="text-align: center;"></td>
<td><code>philschmid/mt-bench</code></td>
</tr>
<tr>
<td><strong>HuggingFace-Blazedit</strong></td>
<td style="text-align: center;"></td>
<td style="text-align: center;"></td>
<td><code>vdaita/edit_5k_char</code>, <code>vdaita/edit_10k_char</code></td>
</tr>
<tr>
<td><strong>Spec Bench</strong></td>
<td style="text-align: center;"></td>
<td style="text-align: center;"></td>
<td><code>wget https://raw.githubusercontent.com/hemingkx/Spec-Bench/refs/heads/main/data/spec_bench/question.jsonl</code></td>
</tr>
<tr>
<td><strong>Custom</strong></td>
<td style="text-align: center;"></td>
@ -239,6 +257,43 @@ vllm bench serve \
--num-prompts 2048
```
### Spec Bench Benchmark with Speculative Decoding
``` bash
VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
--speculative-config $'{"method": "ngram",
"num_speculative_tokens": 5, "prompt_lookup_max": 5,
"prompt_lookup_min": 2}'
```
[SpecBench dataset](https://github.com/hemingkx/Spec-Bench)
Run all categories:
``` bash
# Download the dataset using:
# wget https://raw.githubusercontent.com/hemingkx/Spec-Bench/refs/heads/main/data/spec_bench/question.jsonl
vllm bench serve \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--dataset-name spec_bench \
--dataset-path "<YOUR_DOWNLOADED_PATH>/data/spec_bench/question.jsonl" \
--num-prompts -1
```
Available categories include `[writing, roleplay, reasoning, math, coding, extraction, stem, humanities, translation, summarization, qa, math_reasoning, rag]`.
Run only a specific category like "summarization":
``` bash
vllm bench serve \
--model meta-llama/Meta-Llama-3-8B-Instruct \
--dataset-name spec_bench \
--dataset-path "<YOUR_DOWNLOADED_PATH>/data/spec_bench/question.jsonl" \
--num-prompts -1
--spec-bench-category "summarization"
```
### Other HuggingFaceDataset Examples
```bash
@ -295,6 +350,18 @@ vllm bench serve \
--num-prompts 80
```
`vdaita/edit_5k_char` or `vdaita/edit_10k_char`:
``` bash
vllm bench serve \
--model Qwen/QwQ-32B \
--dataset-name hf \
--dataset-path vdaita/edit_5k_char \
--num-prompts 90 \
--blazedit-min-distance 0.01 \
--blazedit-max-distance 0.99
```
### Running With Sampling Parameters
When using OpenAI-compatible backends such as `vllm`, optional sampling