Mirror of https://git.datalinker.icu/vllm-project/vllm.git (synced 2025-12-24 16:46:13 +08:00)
[Metrics] Fix KV cache usage percent metric multiproc (#28792)
The `vllm:kv_cache_usage_perc` Gauge metric is missing `multiprocess_mode="mostrecent"`, so in multiprocess mode it exports one series per worker process, each tagged with a `pid` label:
```
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="277"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="275"} 0.0
vllm:kv_cache_usage_perc{engine="0",model_name="Qwen/Qwen3-VL-8B-Instruct",pid="273"} 0.6530455880475035
...
```
By contrast, the deprecated `vllm:gpu_cache_usage_perc` Gauge metric already sets `multiprocess_mode="mostrecent"`.
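To make the difference concrete, here is a minimal pure-Python sketch of how per-process gauge samples could be merged under the two modes. It does not use `prometheus_client` itself: the `Sample` type, the `merge_gauge` function, and the timestamps are hypothetical. It only models the documented behaviour that the default `"all"` mode exports one series per `pid`, while `"mostrecent"` exports a single series holding the most recently written value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One gauge value as written by a single worker process (hypothetical model)."""
    pid: str
    value: float
    timestamp: float  # when this process last set the gauge (invented for illustration)


def merge_gauge(samples: list[Sample], mode: str) -> dict[str, float]:
    """Collapse per-process samples roughly the way a multiprocess collector might.

    Returns a mapping from series label ("pid=<n>", or "" for a single series)
    to the exported value.
    """
    if mode == "all":
        # Default-style behaviour: every process keeps its own series, keyed by pid.
        return {f"pid={s.pid}": s.value for s in samples}
    if mode == "mostrecent":
        # Keep only the value with the newest write timestamp; the pid label is dropped.
        latest = max(samples, key=lambda s: s.timestamp)
        return {"": latest.value}
    raise ValueError(f"unsupported mode: {mode!r}")


# Values taken from the scrape above; the timestamps are assumptions.
samples = [
    Sample(pid="277", value=0.0, timestamp=100.0),
    Sample(pid="275", value=0.0, timestamp=101.0),
    Sample(pid="273", value=0.6530455880475035, timestamp=250.0),
]

print(merge_gauge(samples, "all"))
print(merge_gauge(samples, "mostrecent"))
```

Under `"all"`, a dashboard that averages the series or picks an arbitrary one could report 0.0 even though the cache is about 65% full; under `"mostrecent"` there is exactly one series to read.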
Signed-off-by: Jae-Won Chung <jwnchung@umich.edu>
parent ab01cd14e5
commit d4acf518d0
```diff
@@ -494,6 +494,7 @@ class PrometheusStatLogger(AggregateStatLoggerBase):
         gauge_kv_cache_usage = self._gauge_cls(
             name="vllm:kv_cache_usage_perc",
             documentation="KV-cache usage. 1 means 100 percent usage.",
+            multiprocess_mode="mostrecent",
             labelnames=labelnames,
         )
         self.gauge_kv_cache_usage = make_per_engine(
```