Grafana

Grafana Dashboard Latency Spike Despite Healthy Infrastructure

Severity: warning · Tag: latency · Updated Feb 6, 2026

Grafana dashboards exhibit slow performance (high read latency, sluggish UI) even when infrastructure metrics (CPU, memory) appear healthy. This occurs when background compaction tasks, query overload, or excessive time-series rendering saturate resources in ways that standard host metrics do not surface.

How to detect:

Monitor P99 read latency (e.g., grafana_api_dashboard_get_milliseconds, grafana_page_response_status) alongside pending compaction tasks (if Grafana's data source is backed by a database such as Cassandra) and in-flight HTTP requests (grafana_http_request_in_flight). A spike in P99 latency while P50 remains stable, combined with rising in-flight requests or a growing compaction backlog, indicates hidden resource contention.
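The P99-vs-P50 divergence check above can be sketched as a Prometheus alerting rule. This is a minimal, hypothetical example: it assumes the dashboard-fetch metric is exposed as a histogram (some Grafana versions expose it as a summary instead), and the 10x ratio, in-flight threshold, and durations are illustrative values to tune for your environment.

```yaml
# Hypothetical alerting rule; metric shapes and thresholds are illustrative.
groups:
  - name: grafana-latency
    rules:
      - alert: GrafanaHiddenLatencySpike
        # Fires when tail latency (P99) diverges sharply from the median (P50)
        # while in-flight HTTP requests pile up -- the "healthy CPU, slow
        # dashboards" signature described above.
        expr: |
          (
            histogram_quantile(0.99, sum(rate(grafana_api_dashboard_get_milliseconds_bucket[5m])) by (le))
              > 10 * histogram_quantile(0.50, sum(rate(grafana_api_dashboard_get_milliseconds_bucket[5m])) by (le))
          )
          and sum(grafana_http_request_in_flight) > 50
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "P99 dashboard latency diverging from P50 with rising in-flight requests"
```

If the backing store is Cassandra, a parallel alert on its pending-compactions metric (e.g., via a JMX exporter) closes the loop on the compaction-backlog signal.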

Recommended action:

Reduce dashboard complexity by capping the maximum data points returned per query, raising minimum query intervals, and enabling query result caching. Audit dashboards for excessive time-series rendering (dozens or hundreds of series per panel) and apply aggregation functions (e.g., Graphite's highestMax) to reduce cardinality. If Grafana uses a backend such as Cassandra, investigate pending compaction tasks before scaling hardware.
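The remediation steps above map onto a few fields in the dashboard model. A sketch of a single panel, shown as YAML for readability (Grafana's dashboard model is JSON, which YAML can express): `maxDataPoints` and `interval` are real panel options, while the panel title, metric path, and values here are illustrative.

```yaml
# Hypothetical panel fragment illustrating the complexity-reduction knobs.
panels:
  - title: "Service error rate"
    maxDataPoints: 500        # cap on points returned per query
    interval: "30s"           # minimum query resolution (min interval)
    targets:
      # Graphite aggregation: render only the 5 series with the highest
      # maximum, instead of every matching raw series.
      - target: "highestMax(app.*.requests.error_rate, 5)"
```

Pairing these per-panel limits with a caching data source (or Grafana's query caching, where available) keeps repeated dashboard loads from re-saturating the backend.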