From 964472b9667508b1d4a7ed92068ff81740ae0036 Mon Sep 17 00:00:00 2001
From: Chen Zhang
Date: Wed, 14 May 2025 23:23:30 +0800
Subject: [PATCH] [Doc] Update prefix cache metrics to counting tokens (#18138)

Signed-off-by: Chen Zhang
---
 docs/source/design/v1/metrics.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/design/v1/metrics.md b/docs/source/design/v1/metrics.md
index 7e7c8b925e21d..de80226553728 100644
--- a/docs/source/design/v1/metrics.md
+++ b/docs/source/design/v1/metrics.md
@@ -415,8 +415,8 @@ The discussion in about adding prefix cache metrics yielded
 some interesting points which may be relevant to how we approach
 future metrics.
 
-Every time the prefix cache is queried, we record the number of blocks
-queried and the number of queried blocks present in the cache
+Every time the prefix cache is queried, we record the number of tokens
+queried and the number of queried tokens present in the cache
 (i.e. hits). However, the metric of interest is the hit rate - i.e. the
 number of