feat: Add Grafana and Perses monitoring dashboards for vLLM (#23498)

2026-07-24 14:47:26 +08:00 · 2025-09-16 08:53:40 -04:00 · 2025-09-16 08:53:40 -04:00 · de3e53a75b
commit de3e53a75b
parent 85e0df1392
7 changed files with 3515 additions and 0 deletions
--- a/examples/online_serving/dashboards/README.md
+++ b/examples/online_serving/dashboards/README.md
@@ -0,0 +1,87 @@
+# Monitoring Dashboards
+
+This directory contains monitoring dashboard configurations for vLLM, providing
+comprehensive observability for your vLLM deployments.
+
+## Dashboard Platforms
+
+We provide dashboards for two popular observability platforms:
+
+- **[Grafana](https://grafana.com)**
+- **[Perses](https://perses.dev)**
+
+## Dashboard Format Approach
+
+All dashboards are provided in **native formats** that work across different
+deployment methods:
+
+### Grafana (JSON)
+
+- ✅ Works with any Grafana instance (cloud, self-hosted, Docker)
+- ✅ Direct import via Grafana UI or API
+- ✅ Can be wrapped in Kubernetes operators when needed
+- ✅ No vendor lock-in or deployment dependencies
+
+### Perses (YAML)
+
+- ✅ Works with standalone Perses instances
+- ✅ Compatible with Perses API and CLI
+- ✅ Supports Dashboard-as-Code workflows
+- ✅ Can be wrapped in Kubernetes operators when needed
+
+## Dashboard Contents
+
+Both platforms provide equivalent monitoring capabilities:
+
+| Dashboard | Description |
+|-----------|-------------|
+| **Performance Statistics** | Tracks latency, throughput, and performance metrics |
+| **Query Statistics** | Monitors request volume, query performance, and KPIs |
+
+## Quick Start
+
+First, navigate to this example's directory:
+
+```bash
+cd examples/online_serving/dashboards
+```
+
+### Grafana
+
+Import the JSON directly into the Grafana UI, or use the API:
+
+```bash
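+# The API expects the model wrapped in a "dashboard" object; requires jq, add auth as needed.
+jq '{dashboard: (. + {id: null}), overwrite: true}' grafana/performance_statistics.json |
+  curl -X POST http://grafana/api/dashboards/db -H "Content-Type: application/json" -d @-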
+```
+
+### Perses
+
+Import via the Perses CLI:
+
+```bash
+percli apply -f perses/performance_statistics.yaml
+```
+
+## Requirements
+
+- **Prometheus** metrics from your vLLM deployment
+- **Data source** configured in your monitoring platform
+- **vLLM metrics** enabled and accessible
+
+## Platform-Specific Documentation
+
+For detailed deployment instructions and platform-specific options, see:
+
+- **[Grafana Documentation](./grafana)** - JSON dashboards, operator usage, manual import
+- **[Perses Documentation](./perses)** - YAML specs, CLI usage, operator wrapping
+
+## Contributing
+
+When adding new dashboards, please:
+
+1. Provide native formats (JSON for Grafana, YAML specs for Perses)
+2. Update platform-specific README files
+3. Ensure dashboards work across deployment methods
+4. Test with the latest platform versions
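Note on the Requirements section above: it asks for vLLM metrics to be enabled and accessible. One way to check this before importing a dashboard is to scan the text returned by the server's metrics endpoint for the metric names that `grafana/query_statistics.json` in this change queries. The sketch below is illustrative only. The function names and the sample text are hypothetical, and fetching the endpoint (assumed here to be the usual Prometheus text exposition) is left to the caller.

```python
from __future__ import annotations

import re

# Metric names queried by grafana/query_statistics.json in this change.
REQUIRED_METRICS = (
    "vllm:request_success_total",
    "vllm:e2e_request_latency_seconds_bucket",
    "vllm:request_prompt_tokens_bucket",
    "vllm:prompt_tokens_total",
    "vllm:generation_tokens_total",
)

# A sample line looks like: name{label="value"} 12.0
_SAMPLE_NAME = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)")


def exposed_metric_names(exposition_text: str) -> set[str]:
    """Collect sample names from Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_metrics(exposition_text: str) -> list[str]:
    """Return the required metric names that do not appear in the text."""
    present = exposed_metric_names(exposition_text)
    return [name for name in REQUIRED_METRICS if name not in present]


# Hypothetical sample for illustration, not real server output.
SAMPLE = """\
# TYPE vllm:request_success_total counter
vllm:request_success_total{model_name="demo"} 3.0
vllm:prompt_tokens_total{model_name="demo"} 120.0
"""

if __name__ == "__main__":
    print(missing_metrics(SAMPLE))
```

An empty result suggests the panels should have data to query. Any name that is reported missing points at a panel that would likely stay empty.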
--- a/examples/online_serving/dashboards/grafana/README.md
+++ b/examples/online_serving/dashboards/grafana/README.md
@@ -0,0 +1,59 @@
+# Grafana Dashboards for vLLM Monitoring
+
+This directory contains Grafana dashboard configurations (as JSON) designed to monitor
+vLLM performance and metrics.
+
+## Requirements
+
+- Grafana 8.0+
+- Prometheus data source configured in Grafana
+- vLLM deployment with Prometheus metrics enabled
+
+## Dashboard Descriptions
+
+- **[performance_statistics.json](./performance_statistics.json)**: Tracks performance metrics including latency and
+  throughput for your vLLM service.
+- **[query_statistics.json](./query_statistics.json)**: Tracks query performance, request volume, and key
+  performance indicators for your vLLM service.
+
+## Deployment Options
+
+### Manual Import (Recommended)
+
+The easiest way to use these dashboards is to manually import the JSON configurations
+directly into your Grafana instance:
+
+1. Navigate to your Grafana instance
+2. Click the '+' icon in the sidebar
+3. Select 'Import'
+4. Copy and paste the JSON content from the dashboard files, or upload the JSON files
+   directly
+
+### Grafana Operator
+
+If you're using the [Grafana Operator](https://github.com/grafana-operator/grafana-operator)
+in Kubernetes, you can wrap these JSON configurations in a `GrafanaDashboard` custom
+resource:
+
+```yaml
+# Note: Adjust the instanceSelector to match your Grafana instance's labels
+# You can check with: kubectl get grafana -o yaml
+apiVersion: grafana.integreatly.org/v1beta1
+kind: GrafanaDashboard
+metadata:
+  name: vllm-performance-dashboard
+spec:
+  instanceSelector:
+    matchLabels:
+      dashboards: grafana  # Adjust to match your Grafana instance labels
+  folder: "vLLM Monitoring"
+  json: |
+    # Replace this comment with the complete JSON content from
+    # performance_statistics.json - The JSON should start with { and end with }
+```
+
+Then apply to your cluster:
+
+```bash
+kubectl apply -f your-dashboard.yaml -n <namespace>
+```
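Note on the Grafana Operator section above: pasting a large dashboard into the `json: |` block by hand is error-prone. As a hedged alternative, the sketch below builds the same `GrafanaDashboard` resource programmatically and prints it as JSON, which `kubectl apply -f` also accepts. The field names mirror the YAML example above. The helper name `build_manifest`, the stand-in dashboard string, and the label values are assumptions to adjust.

```python
import json


def build_manifest(dashboard_json: str, name: str, instance_labels: dict) -> dict:
    """Wrap raw dashboard JSON in a GrafanaDashboard custom resource."""
    json.loads(dashboard_json)  # fail early if the dashboard is not valid JSON
    return {
        "apiVersion": "grafana.integreatly.org/v1beta1",
        "kind": "GrafanaDashboard",
        "metadata": {"name": name},
        "spec": {
            "instanceSelector": {"matchLabels": instance_labels},
            "folder": "vLLM Monitoring",
            "json": dashboard_json,  # the dashboard is embedded as a string
        },
    }


if __name__ == "__main__":
    # Stand-in for the text of performance_statistics.json (hypothetical content).
    dashboard = '{"title": "example", "panels": []}'
    manifest = build_manifest(
        dashboard,
        name="vllm-performance-dashboard",
        instance_labels={"dashboards": "grafana"},  # adjust to your instance labels
    )
    print(json.dumps(manifest, indent=2))
```

Redirecting the output to a file and passing that file to `kubectl apply -f` should be equivalent to applying the hand-written YAML, provided the labels match your Grafana instance.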
--- a/examples/online_serving/dashboards/grafana/performance_statistics.json
+++ b/examples/online_serving/dashboards/grafana/performance_statistics.json
--- a/examples/online_serving/dashboards/grafana/query_statistics.json
+++ b/examples/online_serving/dashboards/grafana/query_statistics.json
@@ -0,0 +1,760 @@
+{
+  "annotations": {
+    "list": [
+      {
+        "builtIn": 1,
+        "datasource": {
+          "type": "grafana",
+          "uid": "-- Grafana --"
+        },
+        "enable": true,
+        "hide": true,
+        "iconColor": "rgba(0, 211, 255, 1)",
+        "name": "Annotations & Alerts",
+        "type": "dashboard"
+      }
+    ]
+  },
+  "description": "High-level overview of vLLM model deployment behavior and key performance indicators. Designed for Data Scientists and Product Managers to monitor request volume, token throughput, and latency.",
+  "editable": true,
+  "fiscalYearStartMonth": 0,
+  "graphTooltip": 0,
+  "id": 47,
+  "links": [],
+  "panels": [
+    {
+      "collapsed": true,
+      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 },
+      "id": 20,
+      "panels": [],
+      "title": "Requests Over Time",
+      "type": "row"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "barWidthFactor": 0.6,
+            "drawStyle": "line",
+            "fillOpacity": 0,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "req/s"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 6, "w": 10, "x": 0, "y": 1 },
+      "id": 1,
+      "options": {
+        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+          "editorMode": "code",
+          "expr": "sum by (model_name) (\n  rate(vllm:request_success_total{model_name=~\"$Deployment_id\"}[$__rate_interval])\n)",
+          "interval": "1",
+          "legendFormat": "{{model_name}}",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Successful Requests Over Time",
+      "type": "timeseries"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "req/s"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 1 },
+      "id": 2,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["mean"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum(rate(vllm:request_success_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Requests Avg Rate",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 17, "y": 1 },
+      "id": 3,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.50, sum by(le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "p50 Latency",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 4 },
+      "id": 4,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.90, sum by(le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "p90 Latency",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "s"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 17, "y": 4 },
+      "id": 5,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.99, sum by(le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "p99 Latency",
+      "type": "stat"
+    },
+    {
+      "collapsed": false,
+      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 7 },
+      "id": 19,
+      "panels": [],
+      "title": "Size Distribution",
+      "type": "row"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "fillOpacity": 80,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "lineWidth": 1,
+            "stacking": { "group": "A", "mode": "none" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "cps"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 6, "w": 10, "x": 0, "y": 8 },
+      "id": 6,
+      "options": {
+        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum by (le, model_name) (rate(vllm:request_prompt_tokens_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "{{model_name}} le={{le}}",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Token Size Distribution",
+      "type": "histogram"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 8 },
+      "id": 9,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.90, sum by(le, model_name) (rate(vllm:request_prompt_tokens_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Token Size p90",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 17, "y": 8 },
+      "id": 8,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.50, sum by(le, model_name) (rate(vllm:request_prompt_tokens_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Token Size p50",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "mean" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 11 },
+      "id": 7,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum(rate(vllm:prompt_tokens_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))\n/\nsum(rate(vllm:request_success_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Token Size Avg",
+      "type": "stat"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "Last (not null)" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "short"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 17, "y": 11 },
+      "id": 10,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "histogram_quantile(0.99, sum by(le, model_name) (rate(vllm:request_prompt_tokens_bucket{model_name=~\"$Deployment_id\"}[$__rate_interval])))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Token Size p99",
+      "type": "stat"
+    },
+    {
+      "collapsed": true,
+      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 14 },
+      "id": 18,
+      "panels": [],
+      "title": "Input Tokens Over Time",
+      "type": "row"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "barWidthFactor": 0.6,
+            "drawStyle": "line",
+            "fillOpacity": 0,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "cps"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 6, "w": 10, "x": 0, "y": 15 },
+      "id": 11,
+      "options": {
+        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum by (model_name) (rate(vllm:prompt_tokens_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "{{model_name}}",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Tokens Over Time",
+      "type": "timeseries"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "mean" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "cps"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 15 },
+      "id": 12,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum(rate(vllm:prompt_tokens_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Input Tokens/Sec Avg",
+      "type": "stat"
+    },
+    {
+      "collapsed": false,
+      "gridPos": { "h": 1, "w": 24, "x": 0, "y": 21 },
+      "id": 17,
+      "panels": [],
+      "title": "Output Tokens Over Time",
+      "type": "row"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "palette-classic" },
+          "custom": {
+            "axisBorderShow": false,
+            "axisCenteredZero": false,
+            "axisColorMode": "text",
+            "axisLabel": "",
+            "axisPlacement": "auto",
+            "barAlignment": 0,
+            "barWidthFactor": 0.6,
+            "drawStyle": "line",
+            "fillOpacity": 0,
+            "gradientMode": "none",
+            "hideFrom": { "legend": false, "tooltip": false, "viz": false },
+            "insertNulls": false,
+            "lineInterpolation": "linear",
+            "lineWidth": 1,
+            "pointSize": 5,
+            "scaleDistribution": { "type": "linear" },
+            "showPoints": "auto",
+            "spanNulls": false,
+            "stacking": { "group": "A", "mode": "none" },
+            "thresholdsStyle": { "mode": "off" }
+          },
+          "mappings": [],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "cps"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 6, "w": 10, "x": 0, "y": 22 },
+      "id": 13,
+      "options": {
+        "legend": { "calcs": [], "displayMode": "list", "placement": "bottom", "showLegend": true },
+        "tooltip": { "mode": "single", "sort": "none" }
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum by (model_name) (rate(vllm:generation_tokens_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "{{model_name}}",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Output Tokens Over Time",
+      "type": "timeseries"
+    },
+    {
+      "datasource": { "type": "prometheus", "uid": "${DS_PROMETHEUS}" },
+      "fieldConfig": {
+        "defaults": {
+          "color": { "mode": "thresholds" },
+          "mappings": [
+            { "options": { "Calculation": { "index": 0, "text": "mean" } }, "type": "value" }
+          ],
+          "thresholds": {
+            "mode": "absolute",
+            "steps": [{ "color": "green", "value": null }, { "color": "red", "value": 80 }]
+          },
+          "unit": "cps"
+        },
+        "overrides": []
+      },
+      "gridPos": { "h": 3, "w": 7, "x": 10, "y": 22 },
+      "id": 14,
+      "options": {
+        "colorMode": "value",
+        "graphMode": "area",
+        "justifyMode": "auto",
+        "orientation": "auto",
+        "percentChangeColorMode": "standard",
+        "reduceOptions": { "calcs": ["lastNotNull"], "fields": "", "values": false },
+        "showPercentChange": false,
+        "textMode": "auto",
+        "wideLayout": true
+      },
+      "pluginVersion": "11.3.0",
+      "targets": [
+        {
+          "editorMode": "code",
+          "expr": "sum(rate(vllm:generation_tokens_total{model_name=~\"$Deployment_id\"}[$__rate_interval]))",
+          "legendFormat": "__auto",
+          "range": true,
+          "refId": "A"
+        }
+      ],
+      "title": "Output Tokens/Sec Avg",
+      "type": "stat"
+    }
+  ],
+  "preload": false,
+  "schemaVersion": 40,
+  "tags": [],
+  "templating": {
+    "list": [
+      {
+        "current": { "text": "Prometheus", "value": "4184fc20-68a7-483a-8d9b-7caa59c680dd" },
+        "label": "datasource",
+        "name": "DS_PROMETHEUS",
+        "options": [],
+        "query": "prometheus",
+        "refresh": 1,
+        "type": "datasource"
+      },
+      {
+        "current": { "text": ["All"], "value": ["$__all"] },
+        "definition": "label_values(vllm:request_success_total,model_name)",
+        "includeAll": true,
+        "label": "Deployment_ID",
+        "multi": true,
+        "name": "Deployment_id",
+        "options": [],
+        "query": {
+          "qryType": 1,
+          "query": "label_values(vllm:request_success_total,model_name)",
+          "refId": "PrometheusVariableQueryEditor-VariableQuery"
+        },
+        "refresh": 1,
+        "regex": "",
+        "sort": 1,
+        "type": "query"
+      },
+      {
+        "current": { "text": "All hours", "value": "All hours" },
+        "hide": 2,
+        "label": "Rush Hours Only",
+        "name": "rush_hours",
+        "options": [
+          { "selected": true, "text": "false", "value": "All hours" },
+          { "selected": false, "text": "true", "value": "Rush hours" }
+        ],
+        "query": "false : All hours, true : Rush hours",
+        "type": "custom"
+      },
+      {
+        "current": { "text": "All", "value": "All" },
+        "hide": 2,
+        "label": "Rush Hours Type",
+        "name": "rush_hours_type",
+        "options": [
+          { "selected": true, "text": "^All__.*$", "value": "All" },
+          { "selected": false, "text": "^Static__.*$", "value": "Static" },
+          { "selected": false, "text": "^Dynamic__.*$", "value": "Dynamic" }
+        ],
+        "query": "^All__.*$ : All, ^Static__.*$ : Static, ^Dynamic__.*$ : Dynamic",
+        "type": "custom"
+      },
+      {
+        "current": { "text": "", "value": "" },
+        "hide": 2,
+        "name": "query0",
+        "options": [],
+        "query": "",
+        "refresh": 1,
+        "regex": "",
+        "type": "query"
+      }
+    ]
+  },
+  "time": { "from": "now-12h", "to": "now" },
+  "timepicker": {},
+  "timezone": "browser",
+  "title": "Query Statistics",
+  "uid": "query-statistics4",
+  "version": 2,
+  "weekStart": ""
+}
+
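Note on the p50, p90 and p99 panels in the dashboard above: they all apply `histogram_quantile` to `_bucket` series. As a rough illustration of what that function computes, the sketch below estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. It is a simplified model under stated assumptions, not Prometheus's implementation: buckets are sorted by upper bound, the last bucket is `+Inf`, the lowest bucket starts at 0, and the function name and sample counts are made up.

```python
from __future__ import annotations

import math


def estimate_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with a +Inf bucket.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, cumulative in buckets:
        if cumulative >= rank:
            if math.isinf(upper_bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return lower_bound
            in_bucket = cumulative - lower_count
            if in_bucket == 0:
                return lower_bound
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, cumulative
    return lower_bound


# Hypothetical latency histogram: bounds in seconds, cumulative request counts.
SAMPLE_BUCKETS = [(0.5, 10.0), (1.0, 40.0), (2.5, 90.0), (math.inf, 100.0)]

if __name__ == "__main__":
    for q in (0.5, 0.9, 0.99):
        print(q, estimate_quantile(q, SAMPLE_BUCKETS))
```

The estimate carries the unit of the bucket bounds, which is seconds for `vllm:e2e_request_latency_seconds`. With the sample counts above, the median lands at about 1.3, one fifth of the way into the 1.0 to 2.5 bucket.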
--- a/examples/online_serving/dashboards/perses/README.md
+++ b/examples/online_serving/dashboards/perses/README.md
@@ -0,0 +1,48 @@
+# Perses Dashboards for vLLM Monitoring
+
+This directory contains Perses dashboard configurations designed to monitor vLLM
+performance and metrics.
+
+## Requirements
+
+- Perses instance (standalone or via operator)
+- Prometheus data source configured in Perses
+- vLLM deployment with Prometheus metrics enabled
+
+## Dashboard Format
+
+We provide dashboards in the **native Perses YAML format** that works across all
+deployment methods:
+
+- **Files**: `*.yaml` (native Perses dashboard specifications)
+- **Format**: Pure dashboard specifications that work everywhere
+- **Usage**: Works with standalone Perses, API imports, CLI, and file provisioning
+- **Kubernetes**: Directly compatible with Perses Operator
+
+## Dashboard Descriptions
+
+- **[performance_statistics.yaml](./performance_statistics.yaml)**: Performance metrics with aggregated latency
+  statistics
+- **[query_statistics.yaml](./query_statistics.yaml)**: Query performance and deployment metrics
+
+## Deployment Options
+
+### Direct Import to Perses
+
+Import the dashboard specifications via the Perses API or CLI:
+
+```bash
+percli apply -f performance_statistics.yaml
+```
+
+### Perses Operator (Kubernetes)
+
+The native YAML format works directly with the Perses Operator:
+
+```bash
+kubectl apply -f performance_statistics.yaml -n <namespace>
+```
+
+### File Provisioning
+
+Place the YAML files in a folder that Perses is configured to use for
+provisioning; Perses then loads them automatically.
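The Requirements section above lists a vLLM deployment with Prometheus metrics enabled. Before importing the dashboards, it can help to confirm that the metric families the panels query are actually exposed. The following is a minimal sketch of such a check, not part of this commit: the function names, the `SAMPLE` text, and its HELP string are hypothetical, and the required list is taken from the queries in the dashboard files.

```python
import re

# Metric families referenced by the dashboard queries in this directory.
REQUIRED_METRICS = [
    "vllm:num_requests_running",
    "vllm:num_requests_waiting",
    "vllm:gpu_cache_usage_perc",
    "vllm:generation_tokens_total",
    "vllm:prompt_tokens_total",
    "vllm:e2e_request_latency_seconds",
    "vllm:time_to_first_token_seconds",
    "vllm:time_per_output_token_seconds",
]

# Suffixes that Prometheus histograms append to the family name.
_HISTOGRAM_SUFFIXES = ("_bucket", "_sum", "_count")


def exposed_families(exposition_text):
    """Return the metric names found in Prometheus text-format output."""
    families = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The sample name is everything before the first '{' or whitespace.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        families.add(name)
        # Also record the histogram family, e.g. foo_bucket -> foo.
        for suffix in _HISTOGRAM_SUFFIXES:
            if name.endswith(suffix):
                families.add(name[: -len(suffix)])
    return families


def missing_metrics(exposition_text, required=REQUIRED_METRICS):
    """List the required families that the text does not expose."""
    found = exposed_families(exposition_text)
    return [name for name in required if name not in found]


# Hypothetical excerpt of a /metrics response, for illustration only.
SAMPLE = """\
# HELP vllm:num_requests_running Hypothetical help text.
# TYPE vllm:num_requests_running gauge
vllm:num_requests_running{model_name="demo"} 2.0
vllm:num_requests_waiting{model_name="demo"} 0.0
vllm:e2e_request_latency_seconds_bucket{le="1.0",model_name="demo"} 5.0
vllm:e2e_request_latency_seconds_sum{model_name="demo"} 3.2
vllm:e2e_request_latency_seconds_count{model_name="demo"} 5.0
"""

if __name__ == "__main__":
    for name in missing_metrics(SAMPLE):
        print("missing:", name)
```

In practice the text would come from the deployment's metrics endpoint rather than a hard-coded string; any family reported as missing corresponds to panels that would render empty.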
--- a/examples/online_serving/dashboards/perses/performance_statistics.yaml
+++ b/examples/online_serving/dashboards/perses/performance_statistics.yaml
@ -0,0 +1,764 @@
+kind: PersesDashboard
+metadata:
+  name: performance-statistics
+  createdAt: 0001-01-01T00:00:00Z
+  updatedAt: 0001-01-01T00:00:00Z
+  version: 0
+  project: ""
+spec:
+  display:
+    name: Performance Statistics
+
+  variables:
+    - kind: ListVariable
+      spec:
+        display:
+          name: Deployment_ID
+          hidden: false
+        name: Deployment_id
+        allowAllValue: true
+        allowMultiple: true
+        defaultValue:
+          - $__all
+        sort: alphabetical-asc
+        plugin:
+          kind: PrometheusLabelValuesVariable
+          spec:
+            datasource:
+              kind: PrometheusDatasource
+              name: accelerators-thanos-querier-datasource
+            labelName: model_name
+            matchers:
+              # Any one vLLM metric that always carries model_name
+              - vllm:generation_tokens_total{}
+
+  panels:
+    "1":
+      kind: Panel
+      spec:
+        display:
+          name: E2E Latency over Time
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  # avg latency by model = sum(rate(sum)) / sum(rate(count))
+                  query: >
+                    sum by (model_name) (rate(vllm:e2e_request_latency_seconds_sum{model_name=~"$Deployment_id"}[$__interval]))
+                    /
+                    sum by (model_name) (rate(vllm:e2e_request_latency_seconds_count{model_name=~"$Deployment_id"}[$__interval]))
+                  seriesNameFormat: '{{model_name}}'
+
+    "2":
+      kind: Panel
+      spec:
+        display:
+          name: E2E Latency (Avg)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    (sum by (model_name) (increase(vllm:e2e_request_latency_seconds_sum{model_name=~"$Deployment_id"}[$__range])))
+                    /
+                    (sum by (model_name) (increase(vllm:e2e_request_latency_seconds_count{model_name=~"$Deployment_id"}[$__range])))
+
+    "3":
+      kind: Panel
+      spec:
+        display:
+          name: E2E Latency (P50)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.50,
+                      sum by (le, model_name) (
+                        rate(vllm:e2e_request_latency_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "4":
+      kind: Panel
+      spec:
+        display:
+          name: E2E Latency (P90)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.90,
+                      sum by (le, model_name) (
+                        rate(vllm:e2e_request_latency_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "5":
+      kind: Panel
+      spec:
+        display:
+          name: E2E Latency (P99)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.99,
+                      sum by (le, model_name) (
+                        rate(vllm:e2e_request_latency_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "6":
+      kind: Panel
+      spec:
+        display:
+          name: TTFT over Time
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (model_name) (rate(vllm:time_to_first_token_seconds_sum{model_name=~"$Deployment_id"}[$__interval]))
+                    /
+                    sum by (model_name) (rate(vllm:time_to_first_token_seconds_count{model_name=~"$Deployment_id"}[$__interval]))
+                  seriesNameFormat: '{{model_name}}'
+
+    "7":
+      kind: Panel
+      spec:
+        display:
+          name: TTFT (Avg)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    (sum by (model_name) (increase(vllm:time_to_first_token_seconds_sum{model_name=~"$Deployment_id"}[$__range])))
+                    /
+                    (sum by (model_name) (increase(vllm:time_to_first_token_seconds_count{model_name=~"$Deployment_id"}[$__range])))
+
+    "8":
+      kind: Panel
+      spec:
+        display:
+          name: TTFT (P50)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.50,
+                      sum by (le, model_name) (
+                        rate(vllm:time_to_first_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "9":
+      kind: Panel
+      spec:
+        display:
+          name: TTFT (P90)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.90,
+                      sum by (le, model_name) (
+                        rate(vllm:time_to_first_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "10":
+      kind: Panel
+      spec:
+        display:
+          name: TTFT (P99)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.99,
+                      sum by (le, model_name) (
+                        rate(vllm:time_to_first_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "11":
+      kind: Panel
+      spec:
+        display:
+          name: ITL (Time per Output Token) over Time
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (model_name) (rate(vllm:time_per_output_token_seconds_sum{model_name=~"$Deployment_id"}[$__interval]))
+                    /
+                    sum by (model_name) (rate(vllm:time_per_output_token_seconds_count{model_name=~"$Deployment_id"}[$__interval]))
+                  seriesNameFormat: '{{model_name}}'
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.50,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+                  seriesNameFormat: '{{model_name}} p50'
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.90,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+                  seriesNameFormat: '{{model_name}} p90'
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.99,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+                  seriesNameFormat: '{{model_name}} p99'
+
+    "12":
+      kind: Panel
+      spec:
+        display:
+          name: ITL (Avg)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    (sum by (model_name) (increase(vllm:time_per_output_token_seconds_sum{model_name=~"$Deployment_id"}[$__range])))
+                    /
+                    (sum by (model_name) (increase(vllm:time_per_output_token_seconds_count{model_name=~"$Deployment_id"}[$__range])))
+
+    "13":
+      kind: Panel
+      spec:
+        display:
+          name: ITL (P50)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.50,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "14":
+      kind: Panel
+      spec:
+        display:
+          name: ITL (P90)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.90,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "15":
+      kind: Panel
+      spec:
+        display:
+          name: ITL (P99)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    histogram_quantile(
+                      0.99,
+                      sum by (le, model_name) (
+                        rate(vllm:time_per_output_token_seconds_bucket{model_name=~"$Deployment_id"}[$__interval])
+                      )
+                    )
+
+    "16":
+      kind: Panel
+      spec:
+        display:
+          name: TPS (Tokens/sec) over Time
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (model_name) (rate(vllm:generation_tokens_total{model_name=~"$Deployment_id"}[$__interval]))
+                  seriesNameFormat: '{{model_name}} generation'
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (model_name) (rate(vllm:prompt_tokens_total{model_name=~"$Deployment_id"}[$__interval]))
+                  seriesNameFormat: '{{model_name}} prompt'
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  # overall iteration tokens/sec if exposed; the histogram's _sum counts tokens,
+                  # while _count would only count engine iterations
+                  query: >
+                    rate(vllm:iteration_tokens_total_sum[$__interval])
+                  seriesNameFormat: 'iteration overall'
+
+    "17":
+      kind: Panel
+      spec:
+        display:
+          name: KV Cache Usage (avg %)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  # Multiply by 100 so we can read it as a percentage without setting a unit (avoids CUE unit conflicts)
+                  query: >
+                    100 * avg(vllm:gpu_cache_usage_perc)
+
+    "18":
+      kind: Panel
+      spec:
+        display:
+          name: Running Requests by Pod
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (pod) (vllm:num_requests_running)
+                  seriesNameFormat: '{{pod}}'
+
+    "19":
+      kind: Panel
+      spec:
+        display:
+          name: Waiting Requests by Pod
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend:
+              mode: table
+              position: bottom
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: >
+                    sum by (pod) (vllm:num_requests_waiting)
+                  seriesNameFormat: '{{pod}}'
+
+    "20":
+      kind: Panel
+      spec:
+        display:
+          name: Running Requests (sum)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: sum(vllm:num_requests_running)
+
+    "21":
+      kind: Panel
+      spec:
+        display:
+          name: Waiting Requests (sum)
+        plugin:
+          kind: StatChart
+          spec:
+            calculation: last-number
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource:
+                    kind: PrometheusDatasource
+                    name: accelerators-thanos-querier-datasource
+                  query: sum(vllm:num_requests_waiting)
+
+  layouts:
+    - kind: Grid
+      spec:
+        display:
+          title: Overview
+        items:
+          - x: 0
+            y: 0
+            width: 6
+            height: 3
+            content: { $ref: '#/spec/panels/17' }   # KV cache %
+          - x: 6
+            y: 0
+            width: 6
+            height: 3
+            content: { $ref: '#/spec/panels/20' }   # running sum
+          - x: 12
+            y: 0
+            width: 6
+            height: 3
+            content: { $ref: '#/spec/panels/21' }   # waiting sum
+
+    - kind: Grid
+      spec:
+        display:
+          title: E2E Latency
+        items:
+          - x: 0
+            y: 1
+            width: 10
+            height: 6
+            content: { $ref: '#/spec/panels/1' }
+          - x: 10
+            y: 1
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/2' }
+          - x: 17
+            y: 1
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/3' }
+          - x: 10
+            y: 4
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/4' }
+          - x: 17
+            y: 4
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/5' }
+
+    - kind: Grid
+      spec:
+        display:
+          title: TTFT
+        items:
+          - x: 0
+            y: 8
+            width: 10
+            height: 6
+            content: { $ref: '#/spec/panels/6' }
+          - x: 10
+            y: 8
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/7' }
+          - x: 17
+            y: 8
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/8' }
+          - x: 10
+            y: 11
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/9' }
+          - x: 17
+            y: 11
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/10' }
+
+    - kind: Grid
+      spec:
+        display:
+          title: ITL (Time per Output Token)
+        items:
+          - x: 0
+            y: 15
+            width: 10
+            height: 6
+            content: { $ref: '#/spec/panels/11' }
+          - x: 10
+            y: 15
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/12' }
+          - x: 17
+            y: 15
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/13' }
+          - x: 10
+            y: 18
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/14' }
+          - x: 17
+            y: 18
+            width: 7
+            height: 3
+            content: { $ref: '#/spec/panels/15' }
+
+    - kind: Grid
+      spec:
+        display:
+          title: TPS (Prompt / Generation / Iteration)
+        items:
+          - x: 0
+            y: 22
+            width: 14
+            height: 6
+            content: { $ref: '#/spec/panels/16' }
+
+    - kind: Grid
+      spec:
+        display:
+          title: Per-Pod Request State
+        items:
+          - x: 0
+            y: 28
+            width: 12
+            height: 6
+            content: { $ref: '#/spec/panels/18' }
+          - x: 12
+            y: 28
+            width: 12
+            height: 6
+            content: { $ref: '#/spec/panels/19' }
+
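The panels in this file compute averages as a ratio of a histogram's `_sum` and `_count` series, and percentiles with `histogram_quantile` over the `_bucket` series. As a rough illustration of what those queries calculate, here is a small Python sketch. The function names and bucket values are invented for the example, and the quantile function is a simplified version of the linear interpolation Prometheus applies to classic histograms, assuming positive bucket bounds with a lower edge of 0.

```python
import math


def average_latency(sum_start, sum_end, count_start, count_end):
    """Mean observation over a window: increase(_sum) / increase(_count)."""
    observations = count_end - count_start
    if observations == 0:
        return math.nan  # PromQL also yields NaN for 0 / 0
    return (sum_end - sum_start) / observations


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Simplified: expects 0 < q <= 1 and a final +Inf bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket, so report the
                # highest finite bound instead of extrapolating.
                return prev_bound
            # Interpolate linearly inside the bucket that holds the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Invented cumulative bucket counts: (upper bound in seconds, count).
BUCKETS = [(0.5, 2), (1.0, 6), (2.5, 9), (math.inf, 10)]

if __name__ == "__main__":
    print(average_latency(12.0, 18.0, 40, 52))  # 0.5
    print(histogram_quantile(0.50, BUCKETS))    # 0.875
    print(histogram_quantile(0.99, BUCKETS))    # 2.5
```

The P99 result shows a limitation worth keeping in mind when reading the stat panels: once a quantile lands in the `+Inf` bucket, the reported value is capped at the largest finite bucket bound.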
--- a/examples/online_serving/dashboards/perses/query_statistics.yaml
+++ b/examples/online_serving/dashboards/perses/query_statistics.yaml
@ -0,0 +1,392 @@
+kind: PersesDashboard
+metadata:
+  name: query-statistics
+  createdAt: 0001-01-01T00:00:00Z
+  updatedAt: 0001-01-01T00:00:00Z
+  version: 0
+  project: ""
+spec:
+  display:
+    name: Query Statistics
+
+  variables:
+    - kind: ListVariable
+      spec:
+        name: NS
+        display: { name: Namespace }
+        allowMultiple: false
+        defaultValue: llm-d
+        plugin:
+          kind: PrometheusLabelValuesVariable
+          spec:
+            datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+            labelName: namespace
+            matchers:
+              - up{service=~".*vllm.*"}
+
+    - kind: ListVariable
+      spec:
+        name: SVC
+        display: { name: Service }
+        allowMultiple: false
+        defaultValue: vllm-qwen2-0-5b-sim
+        plugin:
+          kind: PrometheusLabelValuesVariable
+          spec:
+            datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+            labelName: service
+            matchers:
+              - up{namespace="$NS",service=~".*vllm.*"}
+
+    - kind: ListVariable
+      spec:
+        name: MODEL
+        display: { name: Model (real vLLM) }
+        allowAllValue: true
+        allowMultiple: true
+        defaultValue: ["$__all"]
+        plugin:
+          kind: PrometheusLabelValuesVariable
+          spec:
+            datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+            labelName: model_name
+            matchers:
+              - vllm:request_success_total{namespace="$NS",service="$SVC"}
+
+  panels:
+
+    # --- Core (works on Simulator & Real) ---
+    core_running_now:
+      kind: Panel
+      spec:
+        display: { name: Running Requests (now) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum(vllm:num_requests_running{namespace="$NS",service="$SVC"}) or vector(0)
+                  minStep: "15s"
+
+    core_waiting_now:
+      kind: Panel
+      spec:
+        display: { name: Waiting Requests (now) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum(vllm:num_requests_waiting{namespace="$NS",service="$SVC"}) or vector(0)
+                  minStep: "15s"
+
+    core_kv_usage_now:
+      kind: Panel
+      spec:
+        display: { name: KV Cache Usage (0–1) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: avg(vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) or vector(0)
+                  minStep: "15s"
+
+    core_running_ts:
+      kind: Panel
+      spec:
+        display: { name: Running Over Time }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (service) (vllm:num_requests_running{namespace="$NS",service="$SVC"}) or vector(0)
+                  minStep: "15s"
+
+    core_waiting_ts:
+      kind: Panel
+      spec:
+        display: { name: Waiting Over Time }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (service) (vllm:num_requests_waiting{namespace="$NS",service="$SVC"}) or vector(0)
+                  minStep: "15s"
+
+    core_targets_up:
+      kind: Panel
+      spec:
+        display: { name: Scrape Targets Up }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: count(up{namespace="$NS",service="$SVC"} == 1) or vector(0)
+                  minStep: "15s"
+
+    # --- KV Cache as Percent (works on Simulator & Real) ---
+    core_kv_usage_pct_now:
+      kind: Panel
+      spec:
+        display: { name: KV Cache Usage (%) – now }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  # multiply by 100 to present percentage; omit format.unit to avoid schema conflicts
+                  query: (avg(vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
+                  minStep: "15s"
+
+    core_kv_usage_pct_ts:
+      kind: Panel
+      spec:
+        display: { name: KV Cache Usage (%) – over time }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: (avg by (service) (vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or vector(0)
+                  minStep: "15s"
+
+    # --- Per-Pod breakdowns (works on Simulator & Real) ---
+    per_pod_running_ts:
+      kind: Panel
+      spec:
+        display: { name: Running by Pod }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (pod) (vllm:num_requests_running{namespace="$NS",service="$SVC"}) or on() vector(0)
+                  minStep: "15s"
+
+    per_pod_waiting_ts:
+      kind: Panel
+      spec:
+        display: { name: Waiting by Pod }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (pod) (vllm:num_requests_waiting{namespace="$NS",service="$SVC"}) or on() vector(0)
+                  minStep: "15s"
+
+    per_pod_kv_pct_ts:
+      kind: Panel
+      spec:
+        display: { name: KV Cache (%) by Pod }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  # needs a pod label on the KV cache metric (the simulator sets one); without it, avg by (pod) collapses to a single series with no pod label
+                  query: (avg by (pod) (vllm:gpu_cache_usage_perc{namespace="$NS",service="$SVC"}) * 100) or on() vector(0)
+                  minStep: "15s"
+
+    # --- Real vLLM only (zeros on simulator) ---
+    real_req_rate_ts:
+      kind: Panel
+      spec:
+        display: { name: Request Rate (real vLLM) }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (model_name) (rate(vllm:request_success_total{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval])) or on() vector(0)
+                  minStep: "15s"
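+                  # Sketch only, not part of the original dashboard: rate() needs at least two
+                  # samples inside its window, so if $__interval resolves to less than about
+                  # twice the scrape interval, the rate evaluates to nothing and the panel
+                  # falls back to 0 even under load. A fixed window is one option to try;
+                  # the 5m value is an assumption to tune per deployment.
+                  # query: sum by (model_name) (rate(vllm:request_success_total{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[5m])) or on() vector(0)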
+
+    real_p50:
+      kind: Panel
+      spec:
+        display: { name: p50 Latency (real vLLM) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: histogram_quantile(0.50, sum by (le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval]))) or on() vector(0)
+                  minStep: "15s"
+
+    real_p90:
+      kind: Panel
+      spec:
+        display: { name: p90 Latency (real vLLM) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: histogram_quantile(0.90, sum by (le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval]))) or on() vector(0)
+                  minStep: "15s"
+
+    real_p99:
+      kind: Panel
+      spec:
+        display: { name: p99 Latency (real vLLM) }
+        plugin: { kind: StatChart, spec: { calculation: last-number } }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: histogram_quantile(0.99, sum by (le, model_name) (rate(vllm:e2e_request_latency_seconds_bucket{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval]))) or on() vector(0)
+                  minStep: "15s"
+
+    real_input_tokens_ts:
+      kind: Panel
+      spec:
+        display: { name: Input Tokens / sec (real vLLM) }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (model_name) (rate(vllm:prompt_tokens_total{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval])) or on() vector(0)
+                  minStep: "15s"
+
+    real_output_tokens_ts:
+      kind: Panel
+      spec:
+        display: { name: Output Tokens / sec (real vLLM) }
+        plugin:
+          kind: TimeSeriesChart
+          spec:
+            legend: { mode: table, position: bottom }
+            visual: { display: line, lineWidth: 1, areaOpacity: 0.3 }
+        queries:
+          - kind: TimeSeriesQuery
+            spec:
+              plugin:
+                kind: PrometheusTimeSeriesQuery
+                spec:
+                  datasource: { kind: PrometheusDatasource, name: accelerators-thanos-querier-datasource }
+                  query: sum by (model_name) (rate(vllm:generation_tokens_total{namespace="$NS",service="$SVC",model_name=~"$MODEL"}[$__interval])) or on() vector(0)
+                  minStep: "15s"
+
+  layouts:
+    - kind: Grid
+      spec:
+        display: { title: Core (Sim & Real) }
+        items:
+          - { x: 0,  y: 0,  width: 6,  height: 3, content: { $ref: '#/spec/panels/core_running_now' } }
+          - { x: 6,  y: 0,  width: 6,  height: 3, content: { $ref: '#/spec/panels/core_waiting_now' } }
+          - { x: 12, y: 0,  width: 6,  height: 3, content: { $ref: '#/spec/panels/core_kv_usage_now' } }
+          - { x: 18, y: 0,  width: 6,  height: 3, content: { $ref: '#/spec/panels/core_targets_up' } }
+          - { x: 0,  y: 3,  width: 12, height: 6, content: { $ref: '#/spec/panels/core_running_ts' } }
+          - { x: 12, y: 3,  width: 12, height: 6, content: { $ref: '#/spec/panels/core_waiting_ts' } }
+
+    - kind: Grid
+      spec:
+        display: { title: KV Cache (%) }
+        items:
+          - { x: 0,  y: 9,  width: 6,  height: 3, content: { $ref: '#/spec/panels/core_kv_usage_pct_now' } }
+          - { x: 6,  y: 9,  width: 18, height: 6, content: { $ref: '#/spec/panels/core_kv_usage_pct_ts' } }
+
+    - kind: Grid
+      spec:
+        display: { title: Per-Pod breakdowns }
+        items:
+          - { x: 0,  y: 15, width: 12, height: 6, content: { $ref: '#/spec/panels/per_pod_running_ts' } }
+          - { x: 12, y: 15, width: 12, height: 6, content: { $ref: '#/spec/panels/per_pod_waiting_ts' } }
+          - { x: 0,  y: 21, width: 24, height: 6, content: { $ref: '#/spec/panels/per_pod_kv_pct_ts' } }
+
+    - kind: Grid
+      spec:
+        display: { title: Real vLLM only (shows 0 on simulator) }
+        items:
+          - { x: 0,  y: 27, width: 12, height: 6, content: { $ref: '#/spec/panels/real_req_rate_ts' } }
+          - { x: 12, y: 27, width: 4,  height: 3, content: { $ref: '#/spec/panels/real_p50' } }
+          - { x: 16, y: 27, width: 4,  height: 3, content: { $ref: '#/spec/panels/real_p90' } }
+          - { x: 20, y: 27, width: 4,  height: 3, content: { $ref: '#/spec/panels/real_p99' } }
+          - { x: 0,  y: 33, width: 12, height: 6, content: { $ref: '#/spec/panels/real_input_tokens_ts' } }
+          - { x: 12, y: 33, width: 12, height: 6, content: { $ref: '#/spec/panels/real_output_tokens_ts' } }
+