Translog Accumulation Risk

warning

reliability

Updated Mar 2, 2026

The transaction log (translog) accumulates operations that have not yet been committed to Lucene between flushes. An excessively large translog increases recovery time after node failures, because uncommitted operations must be replayed, and it can indicate flush problems or configuration issues.

Technologies:

Elasticsearch

elasticsearch.node.translog.uncommitted.size

elasticsearch.node.translog.operations

elasticsearch.index.translog.operations

elasticsearch.node.operations.completed

How to detect:

elasticsearch.node.translog.uncommitted.size grows significantly (more than about 1 GB per shard), or the elasticsearch.node.translog.operations count stays very high without corresponding flush operations.
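The per-shard threshold above can be checked with a small script. The following is a minimal sketch, assuming the input is the parsed JSON response of GET /_stats/translog?level=shards; the function name and the 1 GiB default are illustrative choices, not part of any Elasticsearch client.

```python
from __future__ import annotations

from typing import Any

GIB = 1024 ** 3  # assumed threshold unit; the text above says roughly 1 GB per shard


def oversized_translog_shards(
    index_stats: dict[str, Any], max_bytes: int = GIB
) -> list[tuple[str, str, int]]:
    """Return (index, shard, uncommitted_bytes) for shard copies above max_bytes.

    index_stats is assumed to be the parsed body of
    GET /_stats/translog?level=shards (hypothetical helper, not a client API).
    """
    hits = []
    for index_name, index in index_stats.get("indices", {}).items():
        for shard_id, copies in index.get("shards", {}).items():
            # Each shard number maps to a list of copies (primary and replicas).
            for copy in copies:
                size = copy.get("translog", {}).get("uncommitted_size_in_bytes", 0)
                if size > max_bytes:
                    hits.append((index_name, shard_id, size))
    # Largest translogs first, since they dominate recovery time.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Fetching the stats is left to the caller, so the check can be reused with any HTTP client or with saved responses.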

Recommended action:

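Check how often flushes run via the _nodes/stats API (the indices.flush and indices.translog sections). A flush is triggered once a shard's translog reaches index.translog.flush_threshold_size (512 MB by default in many releases; check the documentation for your version, since newer releases use a larger default). Note that index.translog.sync_interval is not a flush interval: it controls how often the translog is fsynced to disk under async durability (5 seconds by default). A 30-minute time-based flush existed only in older releases. If the translog grows beyond the threshold: (1) Verify in the logs that flush operations complete successfully, (2) Check disk I/O capacity - slow disks prevent timely flushes, (3) Review the index.translog.durability setting (request vs async) - async improves indexing performance but risks losing recently acknowledged writes on a crash. For large bulk loads, consider temporarily increasing flush_threshold_size, then reset it after the load completes. Monitor recovery time after node restarts - long recoveries correlate with large translogs.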
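As a rough illustration of the first check and of the bulk-load adjustment, the sketch below compares two _nodes/stats samples for one node and builds the settings body for PUT /<index>/_settings. The function names are hypothetical, and the field names assume the indices.flush and indices.translog sections of the node stats response.

```python
from __future__ import annotations

from typing import Any


def flush_stalled(before: dict[str, Any], after: dict[str, Any], node_id: str) -> bool:
    """True if the translog grew between two samples while no flush completed.

    before and after are assumed to be parsed GET /_nodes/stats/indices bodies
    taken some minutes apart.
    """
    def section(sample: dict[str, Any], name: str) -> dict[str, Any]:
        return sample["nodes"][node_id]["indices"][name]

    flushes = section(after, "flush")["total"] - section(before, "flush")["total"]
    growth = (
        section(after, "translog")["uncommitted_size_in_bytes"]
        - section(before, "translog")["uncommitted_size_in_bytes"]
    )
    return flushes == 0 and growth > 0


def flush_threshold_body(size: str | None) -> dict[str, Any]:
    """Build a settings body for PUT /<index>/_settings.

    Pass a size such as "2gb" before a bulk load; pass None afterwards, which
    serializes to JSON null and resets the setting to its default.
    """
    return {"index": {"translog": {"flush_threshold_size": size}}}
```

A stalled result is only a hint: a quiet index with a small translog can also go without flushes, so it is best combined with the size check.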