Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Production Metrics
vLLM exposes a number of metrics that can be used to monitor the health of the
system. These metrics are served by the /metrics endpoint of the vLLM
OpenAI-compatible API server.
You can start the server using Python, or using Docker:
vllm serve unsloth/Llama-3.2-1B-Instruct
Then query the endpoint to get the latest metrics from the server:
??? console "Output"
```console
$ curl http://0.0.0.0:8000/metrics
# HELP vllm:iteration_tokens_total Histogram of number of tokens per engine_step.
# TYPE vllm:iteration_tokens_total histogram
vllm:iteration_tokens_total_sum{model_name="unsloth/Llama-3.2-1B-Instruct"} 0.0
vllm:iteration_tokens_total_bucket{le="1.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="8.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="16.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="32.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="64.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="128.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="256.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
vllm:iteration_tokens_total_bucket{le="512.0",model_name="unsloth/Llama-3.2-1B-Instruct"} 3.0
...
```
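The output above is in the Prometheus text exposition format, so it can also be read programmatically. The following is a minimal sketch, not part of vLLM, that uses only the Python standard library. The helper names `parse_metrics` and `fetch_metrics` are hypothetical, and the default URL simply mirrors the `curl` command above. It assumes the server is reachable at that address.

```python
import re
from urllib.request import urlopen

# A sample line looks like: metric_name{label="value",...} 3.0
# Metric names may contain colons, as in "vllm:iteration_tokens_total_sum".
_SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)'
)
_LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Parse exposition text into a list of (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# HELP" / "# TYPE" comments
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        try:
            value = float(match.group("value"))
        except ValueError:
            continue  # skip lines whose value is not a number
        labels = dict(_LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, value))
    return samples


def fetch_metrics(url="http://0.0.0.0:8000/metrics", timeout=5.0):
    """Fetch the endpoint (hypothetical helper) and parse the response."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling `fetch_metrics()` against a running server returns one tuple per sample line. For histograms, the `le` label holds each bucket's upper bound, and bucket values are cumulative counts. For production monitoring, a Prometheus server scraping the same endpoint is the more usual choice. This sketch is only intended for quick inspection or ad hoc scripts.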
The following metrics are exposed:
General Metrics
--8<-- "docs/generated/metrics/general.md"
Speculative Decoding Metrics
--8<-- "docs/generated/metrics/spec_decode.md"
NIXL KV Connector Metrics
--8<-- "docs/generated/metrics/nixl_connector.md"
Deprecation Policy
Note: when a metric is deprecated in version X.Y, it is hidden in version X.Y+1
but can be re-enabled using the --show-hidden-metrics-for-version=X.Y escape hatch.
It is then removed in version X.Y+2.
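The version arithmetic in this policy can be sketched in a few lines. The helper `deprecation_timeline` below is hypothetical, not a vLLM API, and the version `0.7` in the usage note is only an illustrative input. It assumes versions are written as `X.Y` and that consecutive releases increment the minor number without a major-version rollover.

```python
def deprecation_timeline(deprecated_in):
    """Map a deprecation version "X.Y" to the stages described by the policy."""
    major, minor = (int(part) for part in deprecated_in.split("."))
    return {
        # Version in which the metric is marked as deprecated.
        "deprecated": f"{major}.{minor}",
        # Next minor version: the metric is hidden by default.
        "hidden": f"{major}.{minor + 1}",
        # Flag that re-enables the hidden metric, as given in the policy.
        "escape_hatch": f"--show-hidden-metrics-for-version={major}.{minor}",
        # Two minor versions later: the metric is removed.
        "removed": f"{major}.{minor + 2}",
    }
```

For example, `deprecation_timeline("0.7")` reports the metric as hidden in `0.8`, re-enabled with `--show-hidden-metrics-for-version=0.7`, and removed in `0.9`.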