kafka.broker.config.log_retention_ms
Log retention milliseconds config
About this metric
The kafka.broker.config.log_retention_ms metric represents the broker-level, time-based retention period for Kafka log segments, measured in milliseconds. It reflects the broker's log.retention.ms setting, which determines how long Kafka keeps messages in a topic before they become eligible for deletion, unless a size-based limit such as log.retention.bytes triggers deletion first. Time-based retention applies to topics whose cleanup policy includes delete. Kafka deletes whole log segments rather than individual messages, so data can remain on disk somewhat longer than the configured period. As documented in the Apache Kafka documentation on log retention, the broker value is a default that individual topics can override with the topic-level retention.ms setting, and a value of -1 removes the time limit entirely. The metric exposes the broker's configured value as a gauge, which lets operators verify that retention policies are set as intended across their Kafka infrastructure. This configuration is operationally significant because it directly affects disk space utilization, how long data remains available, and compliance with data retention requirements.
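Because the same retention period can be expressed several ways, it can help to resolve the value a topic actually gets. The following is a minimal sketch that assumes the precedence given in the Kafka configuration reference (a topic-level retention.ms override, then log.retention.ms, then log.retention.minutes, then log.retention.hours with its default of 168). The helper names `effective_retention_ms` and `ms_to_days` are hypothetical, and the input dictionaries stand in for however you read broker and topic configs.

```python
from typing import Mapping, Optional

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE
DEFAULT_RETENTION_HOURS = 168  # Kafka's documented default (7 days)


def effective_retention_ms(broker: Mapping[str, str],
                           topic: Optional[Mapping[str, str]] = None) -> int:
    """Resolve time-based retention in ms; -1 means no time limit.

    Assumed precedence (per the Kafka configuration reference):
    topic retention.ms > log.retention.ms > log.retention.minutes
    > log.retention.hours.
    """
    if topic is not None and "retention.ms" in topic:
        value = int(topic["retention.ms"])
    elif "log.retention.ms" in broker:
        value = int(broker["log.retention.ms"])
    elif "log.retention.minutes" in broker:
        value = int(broker["log.retention.minutes"]) * MS_PER_MINUTE
    else:
        hours = int(broker.get("log.retention.hours", DEFAULT_RETENTION_HOURS))
        value = hours * MS_PER_HOUR
    # Assumption: treat any negative value as "no time limit" and
    # normalize it to -1, mirroring how the broker handles it.
    return -1 if value < 0 else value


def ms_to_days(ms: int) -> float:
    """Convert a retention value in ms to days for human-readable reports."""
    return ms / (24 * MS_PER_HOUR)


broker = {"log.retention.hours": "168", "log.retention.ms": "86400000"}
print(effective_retention_ms(broker))                          # 86400000
print(effective_retention_ms(broker, {"retention.ms": "-1"}))  # -1
print(ms_to_days(effective_retention_ms({})))                  # 7.0
```

In the first call, log.retention.ms wins over log.retention.hours, so the broker's 168-hour setting is ignored; this is a common source of surprise when both are set.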
From an operational perspective, monitoring this metric helps organizations balance storage costs against data availability requirements. Longer retention periods let consumer applications replay data further back and make it easier to debug historical issues, but they also increase the volume of data held on disk and therefore storage costs. This metric is particularly valuable for cost management in multi-tenant Kafka clusters, where the broker-level value acts as the default for every topic that does not set its own retention.ms, while other topics may require different retention policies based on business requirements. The Kafka documentation notes that the platform's performance is effectively constant with respect to the amount of data stored, so long retention is not in itself a performance problem; operators must still tune retention settings to match disk capacity and business needs.
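To relate a retention value to storage cost, a rough steady-state estimate is ingest rate × retention period × replication factor. The sketch below assumes a constant ingest rate measured in on-disk (post-compression) bytes and ignores index files and uneven partition placement. The per-segment allowance is a rough upper bound reflecting segment-granular deletion. The function `estimate_retained_bytes` and its parameters are hypothetical.

```python
def estimate_retained_bytes(ingest_bytes_per_sec: float,
                            retention_ms: int,
                            replication_factor: int,
                            partitions: int = 0,
                            segment_bytes: int = 0) -> float:
    """Rough steady-state disk usage in bytes, summed across all replicas.

    Assumes a constant ingest rate of on-disk bytes; ignores index files,
    compression changes, and uneven partition placement.
    """
    if retention_ms < 0:
        raise ValueError("retention_ms of -1 means unlimited retention; "
                         "usage grows without a steady state")
    # Data inside the retention window, stored once per replica.
    window_bytes = ingest_bytes_per_sec * (retention_ms / 1000) * replication_factor
    # Deletion removes whole segments, so each partition replica may hold
    # roughly one extra segment beyond the window (rough allowance).
    segment_allowance = partitions * replication_factor * segment_bytes
    return window_bytes + segment_allowance


# Example: 10 MB/s (decimal) retained for 7 days with replication factor 3.
print(estimate_retained_bytes(10_000_000, 604_800_000, 3) / 1e12)  # 18.144
```

At this ingest rate, doubling retention from 7 to 14 days doubles the window term, which is why retention changes show up directly in disk utilization.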
Healthy patterns for this metric involve consistent values across the brokers in a cluster (unless variation is intentional) and alignment with documented data retention policies. Kafka's default is 168 hours, or 604,800,000 milliseconds (7 days). Typical retention values range from 7 days for high-throughput operational data to 2,592,000,000 milliseconds (30 days) or longer for audit logs and compliance-critical data.
Common alerting use cases include detecting unexpected changes to the configured value, which can indicate configuration drift or unauthorized modifications, and flagging retention settings that, given the observed ingest rate and log size metrics, would exceed available disk capacity. Troubleshooting scenarios often involve investigating why data is being deleted sooner than expected (for example, a topic-level retention.ms override shorter than the broker default, a size-based limit such as log.retention.bytes being reached first, or log.retention.ms taking precedence over log.retention.hours) or why disk utilization is growing unexpectedly (possibly due to overly long retention periods).
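The drift and change checks described above can be expressed as simple comparisons over per-broker snapshots of this metric. This is a minimal sketch: the snapshot dictionaries are hypothetical input (how you collect per-broker values depends on your monitoring system), and using the most common value as the expected baseline is one simple choice among several.

```python
from collections import Counter
from typing import Dict, List, Tuple

DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 ms


def find_outliers(values: Dict[str, int]) -> Tuple[int, List[str]]:
    """Return the most common retention value and the brokers that differ.

    On a tie, Counter.most_common returns the value encountered first.
    """
    if not values:
        raise ValueError("no broker values supplied")
    expected, _count = Counter(values.values()).most_common(1)[0]
    outliers = sorted(b for b, v in values.items() if v != expected)
    return expected, outliers


def find_changes(previous: Dict[str, int],
                 current: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each broker whose value changed between snapshots to (old, new)."""
    return {b: (previous[b], v) for b, v in current.items()
            if b in previous and previous[b] != v}


def violates_policy(values: Dict[str, int], policy_days: int) -> List[str]:
    """Brokers whose retention differs from a documented policy in days."""
    return sorted(b for b, v in values.items() if v != policy_days * DAY_MS)


before = {"broker-1": 7 * DAY_MS, "broker-2": 7 * DAY_MS, "broker-3": 7 * DAY_MS}
after = {"broker-1": 7 * DAY_MS, "broker-2": 7 * DAY_MS, "broker-3": DAY_MS}
print(find_outliers(after))         # (604800000, ['broker-3'])
print(find_changes(before, after))  # {'broker-3': (604800000, 86400000)}
print(violates_policy(after, 7))    # ['broker-3']
```

Checking against a documented policy catches the case where every broker has drifted together, which a majority-vote comparison alone would miss.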