[Misc]Fix benchmarks/README.md for speculative decoding (#18897)

Signed-off-by: rabi <ramishra@redhat.com>
Rabi Mishra 2025-05-30 13:28:04 +05:30 committed by GitHub
parent 4f4a6b844a
commit 6acb7a6285


@@ -146,9 +146,9 @@ python3 vllm/benchmarks/benchmark_serving.py \
 ``` bash
 VLLM_USE_V1=1 vllm serve meta-llama/Meta-Llama-3-8B-Instruct \
-    --ngram_prompt_lookup_min 2 \
-    --ngram-prompt-lookup-max 5 \
-    --speculative_config '{"model": "[ngram]", "num_speculative_tokens": 5}
+    --speculative-config $'{"method": "ngram",
+    "num_speculative_tokens": 5, "prompt_lookup_max": 5,
+    "prompt_lookup_min": 2}'
 ```
 ``` bash
@@ -273,9 +273,9 @@ python3 vllm/benchmarks/benchmark_throughput.py \
     --output-len=100 \
     --num-prompts=2048 \
     --async-engine \
-    --ngram_prompt_lookup_min=2 \
-    --ngram-prompt-lookup-max=5 \
-    --speculative_config '{"model": "[ngram]", "num_speculative_tokens": 5}
+    --speculative-config $'{"method": "ngram",
+    "num_speculative_tokens": 5, "prompt_lookup_max": 5,
+    "prompt_lookup_min": 2}'
 ```
 ```
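
Note on the quoting in the new commands: the removed `--speculative_config '{...}` line had no closing single quote, while the added lines wrap the JSON in a single `$'...'` string that spans three lines. The sketch below is a minimal illustration, assuming bash, that such a multi-line quoted string reaches the program as one argument. `count_args` is a hypothetical helper written for this example and is not part of vLLM.

```bash
# Hypothetical helper (not part of vLLM): prints how many arguments it received.
count_args() { echo "$#"; }

# Same quoting as the updated README commands: one quoted string spanning three lines.
n=$(count_args --speculative-config $'{"method": "ngram",
"num_speculative_tokens": 5, "prompt_lookup_max": 5,
"prompt_lookup_min": 2}')

echo "$n"  # prints 2: the flag, plus the whole JSON object as a single argument
```

Because the newlines sit inside the quotes, no trailing backslashes are needed on the JSON lines.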