elasticsearch.cluster.shards
Shard distribution status

Summary
Reports the total number of shards across the entire Elasticsearch cluster, including both primary and replica shards. This fundamental cluster health metric directly impacts resource consumption and cluster stability. Excessive shard counts (particularly beyond 20-30 shards per GB of heap, as documented in the "Excessive Shard Count Degrades Performance" insight) cause master-node overhead and slower cluster state updates, degrading performance and requiring shard consolidation or reindexing strategies.
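The heap-based rule of thumb above can be sketched as a small check. This is an illustrative sketch, not part of the product: the function names and the example heap figures are assumptions, and in practice the total shard count and combined heap would come from the Elasticsearch `_cluster/stats` API.

```python
# Sketch: compare a cluster's total shard count against a heap-based budget,
# following the ~20-shards-per-GB-of-heap rule of thumb described above.
# Function names and inputs are illustrative; fetch real values from
# the _cluster/stats API in practice.

def shard_budget(total_heap_bytes: int, shards_per_gb: int = 20) -> int:
    """Maximum recommended shard count for the cluster's combined heap."""
    heap_gb = total_heap_bytes / (1024 ** 3)
    return int(heap_gb * shards_per_gb)

def over_budget(total_shards: int, total_heap_bytes: int) -> bool:
    """True when the cluster exceeds the rule-of-thumb shard budget."""
    return total_shards > shard_budget(total_heap_bytes)

# Example: 3 data nodes with 8 GB heap each -> budget of 480 shards.
heap = 3 * 8 * 1024 ** 3
print(shard_budget(heap))      # 480
print(over_budget(900, heap))  # True: 900 shards exceeds the budget
```

A cluster over this budget is a candidate for the consolidation or reindexing strategies mentioned above.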
Interface Metrics (13)
Technical Annotations (2)
Technical References (2)
Components: shard allocation, Decision.Type

Related Insights (9)
Missing or unassigned shards indicate data unavailability and can degrade cluster performance. A red cluster status means one or more primary shards are unallocated, creating a risk of data loss.
Too many shards consume cluster resources even when idle, causing slow queries, increased overhead, and reduced stability. Rule of thumb: keep the shard count below 20 per GB of configured heap.
When a node crosses the low disk watermark (85% full by default), Elasticsearch stops allocating new shards to it; past the high watermark (90%), it begins relocating shards away. Multiple nodes hitting watermarks simultaneously can trigger cascading relocations that overload cluster I/O and delay recovery.
When primary shards cannot be assigned (elasticsearch.cluster.health == 2), data becomes unavailable and the cluster enters a red state. This can result from insufficient nodes, misconfigured shard allocation rules, or node failures when replica coverage is insufficient.
Failed snapshot operations risk data loss and violate backup SLAs. SLM policy failures can result from repository unavailability, insufficient permissions, or storage quota issues.
Improper distribution of shards or unbalanced node roles can cause resource hotspots where some nodes are overloaded while others are underutilized.
Slow shard recovery after node failures or restarts delays cluster stabilization and can indicate network issues, disk I/O bottlenecks, or configuration problems.
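The insights above can be tied together in a small triage sketch. The numeric health encoding (green=0, yellow=1, red=2) is an assumption inferred from the `elasticsearch.cluster.health == 2` condition above, and the watermark thresholds are the documented Elasticsearch defaults; the helper names are illustrative.

```python
# Sketch: triage helpers for the insights above. The health encoding
# (green=0, yellow=1, red=2) is assumed from "elasticsearch.cluster.health == 2"
# meaning red; watermark thresholds are the default 85/90/95 percent.

HEALTH_CODES = {"green": 0, "yellow": 1, "red": 2}

def health_code(status: str) -> int:
    """Map the _cluster/health status string to the assumed numeric encoding."""
    return HEALTH_CODES[status.lower()]

def watermark_level(disk_used_pct: float,
                    low: float = 85.0, high: float = 90.0,
                    flood_stage: float = 95.0) -> str:
    """Classify a node's disk usage against the default allocation watermarks."""
    if disk_used_pct >= flood_stage:
        return "flood_stage"   # indices on this node forced read-only
    if disk_used_pct >= high:
        return "high"          # shards actively relocated off this node
    if disk_used_pct >= low:
        return "low"           # no new shards allocated to this node
    return "ok"

print(health_code("red"))     # 2, the red-state condition above
print(watermark_level(87.5))  # "low": node stops receiving new shards
```

For a node flagged at or above a watermark, the `_cluster/allocation/explain` API is the usual next step to see why specific shards are unassigned or relocating.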