Snapshot Lifecycle Policy Failures
critical
Failed snapshot operations risk data loss and violate backup SLAs. SLM policy failures can result from repository unavailability, insufficient permissions, or storage quota issues.
elasticsearch.slm or elasticsearch.snapshot metrics showing failed operations, or snapshot operations not completing within the expected timeframe
Investigate via the _slm/policy API and the _snapshot repository status APIs. Check: (1) Repository accessibility: verify network connectivity and credentials for S3/Azure/GCS; (2) Storage quota: ensure the snapshot repository has sufficient free space; (3) Cluster state: a red status (unassigned primary shards) causes partial or failed snapshots, while a yellow status alone does not block them, so check elasticsearch.cluster.health and resolve any red status first; (4) Snapshot conflicts: overlapping snapshot operations on the same repository can queue or fail, and older Elasticsearch versions allow only one snapshot operation at a time. Review the SLM policy configuration for retention and scheduling conflicts. Monitor snapshot duration trends: steadily increasing durations indicate growing data volume or storage performance issues. Periodically verify restore capability with test restores to a different cluster.
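The first investigation step can be sketched as a small script. This is a minimal illustration, not a definitive implementation: it assumes an unauthenticated node at http://localhost:9200, and the helper names (get_json, failing_policies, report) are hypothetical. It relies on the last_success and last_failure entries that the _slm/policy response reports for each policy, and treats a policy as failing when its most recent failure is newer than its most recent success.

```python
import json
import urllib.request

# Assumption: a local node without authentication or TLS.
ES_URL = "http://localhost:9200"


def get_json(path, base_url=ES_URL):
    """GET a JSON document from the cluster (hypothetical helper)."""
    url = f"{base_url}/{path.lstrip('/')}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def failing_policies(slm_policies):
    """Return {policy_id: failure details} for policies whose latest run failed.

    `slm_policies` is the parsed body of GET _slm/policy. A policy counts as
    failing when it has a last_failure entry newer than its last_success
    entry, or when it has failed and never succeeded.
    """
    failing = {}
    for policy_id, info in slm_policies.items():
        failure = info.get("last_failure")
        if not failure:
            continue
        success = info.get("last_success") or {}
        if failure.get("time", 0) > success.get("time", -1):
            failing[policy_id] = failure.get("details", "no details reported")
    return failing


def report(base_url=ES_URL):
    """Print the cluster status and every currently failing SLM policy."""
    health = get_json("_cluster/health", base_url)
    print("cluster status:", health["status"])
    policies = get_json("_slm/policy", base_url)
    for policy_id, details in failing_policies(policies).items():
        print(f"{policy_id}: {details}")
```

Calling report() against a reachable cluster prints the cluster status followed by one line per failing policy. A secured cluster would additionally need credentials and TLS settings, which this sketch leaves out.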