elasticsearch.cluster.shards
Shard distribution status

Summary
Reports the total number of shards across the entire Elasticsearch cluster, including both primary and replica shards. This fundamental cluster health metric directly impacts resource consumption and cluster stability. Excessive shard counts (particularly beyond 20-30 shards per GB of heap, as documented in the "Excessive Shard Count Degrades Performance" insight) cause master-node overhead and slower cluster state updates, degrading performance and requiring shard consolidation or reindexing strategies.
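The heap-based rule of thumb above can be sketched as a small check. This is an illustrative sketch, not part of the product: the function names and the example heap figures are assumptions, and in practice the total shard count and combined heap would come from the Elasticsearch `_cluster/stats` API.

```python
# Sketch: compare a cluster's total shard count against a heap-based budget,
# following the ~20-shards-per-GB-of-heap rule of thumb described above.
# Function names and inputs are illustrative; fetch real values from
# the _cluster/stats API in practice.

def shard_budget(total_heap_bytes: int, shards_per_gb: int = 20) -> int:
    """Maximum recommended shard count for the cluster's combined heap."""
    heap_gb = total_heap_bytes / (1024 ** 3)
    return int(heap_gb * shards_per_gb)

def over_budget(total_shards: int, total_heap_bytes: int) -> bool:
    """True when the cluster exceeds the rule-of-thumb shard budget."""
    return total_shards > shard_budget(total_heap_bytes)

# Example: 3 data nodes with 8 GB heap each -> budget of 480 shards.
heap = 3 * 8 * 1024 ** 3
print(shard_budget(heap))      # 480
print(over_budget(900, heap))  # True: 900 shards exceeds the budget
```

A cluster over this budget is a candidate for the consolidation or reindexing strategies mentioned above.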
Interface Metrics (13)
Technical Annotations (2)
Technical References (2)
Components: shard allocation, Decision.Type

Related Insights (9)
Missing or unassigned shards indicate data unavailability and can degrade cluster performance. A red cluster status means one or more primary shards are unallocated, creating a risk of data loss.
Too many shards consume cluster resources even when idle, causing slow queries, increased overhead, and reduced stability. Rule of thumb: keep the shard count below 20 per GB of configured heap.
When a node crosses the low disk watermark (85% full by default), Elasticsearch stops allocating new shards to it; past the high watermark (90%), it begins relocating shards away. Multiple nodes hitting watermarks simultaneously can trigger cascading relocations that overload cluster I/O and delay recovery.
When primary shards cannot be assigned (elasticsearch.cluster.health == 2), data becomes unavailable and the cluster enters a red state. This can result from insufficient nodes, misconfigured shard allocation rules, or node failures when replica coverage is insufficient.
Failed snapshot operations risk data loss and violate backup SLAs. SLM policy failures can result from repository unavailability, insufficient permissions, or storage quota issues.
Improper distribution of shards or unbalanced node roles can cause resource hotspots where some nodes are overloaded while others are underutilized.
Slow shard recovery after node failures or restarts delays cluster stabilization and can indicate network issues, disk I/O bottlenecks, or configuration problems.
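The insights above can be tied together in a small triage sketch. The numeric health encoding (green=0, yellow=1, red=2) is an assumption inferred from the `elasticsearch.cluster.health == 2` condition above, and the watermark thresholds are the documented Elasticsearch defaults; the helper names are illustrative.

```python
# Sketch: triage helpers for the insights above. The health encoding
# (green=0, yellow=1, red=2) is assumed from "elasticsearch.cluster.health == 2"
# meaning red; watermark thresholds are the default 85/90/95 percent.

HEALTH_CODES = {"green": 0, "yellow": 1, "red": 2}

def health_code(status: str) -> int:
    """Map the _cluster/health status string to the assumed numeric encoding."""
    return HEALTH_CODES[status.lower()]

def watermark_level(disk_used_pct: float,
                    low: float = 85.0, high: float = 90.0,
                    flood_stage: float = 95.0) -> str:
    """Classify a node's disk usage against the default allocation watermarks."""
    if disk_used_pct >= flood_stage:
        return "flood_stage"   # indices on this node forced read-only
    if disk_used_pct >= high:
        return "high"          # shards actively relocated off this node
    if disk_used_pct >= low:
        return "low"           # no new shards allocated to this node
    return "ok"

print(health_code("red"))     # 2, the red-state condition above
print(watermark_level(87.5))  # "low": node stops receiving new shards
```

For a node flagged at or above a watermark, the `_cluster/allocation/explain` API is the usual next step to see why specific shards are unassigned or relocating.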