Apache Kafka Metric

kafka.consumer.records_lag_max

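Maximum consumer lag: the largest number of records by which the consumer trails the log end on any of its assigned partitions.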
Dimensions: None
Available on: Datadog (1), Dynatrace (1)
Interface Metrics (2)
Datadog
Maximum consumer lag.
Dimensions: None
Dynatrace
The max lag of the partition
Dimensions: None
Related Insights (13)
Kafka Consumer Lag Accumulation Under Load (critical)

When consumer processing rate falls below production rate, lag grows continuously until retention limits are reached, risking data loss. Lag trends reveal impending failures before customers notice.
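One way to watch the lag trend is to fit a straight line through recent lag samples and alert when the slope stays positive. The sketch below is illustrative only: the function names, the (timestamp in seconds, lag in records) sample format, and the `min_rate` threshold are assumptions, not part of any monitoring product's API.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, lag in records); assumed format


def lag_growth_rate(samples: Sequence[Sample]) -> float:
    """Least-squares slope of lag over time, in records per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n
    num = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


def is_accumulating(samples: Sequence[Sample], min_rate: float = 0.0) -> bool:
    """True when lag grows faster than min_rate (a hypothetical threshold)."""
    return lag_growth_rate(samples) > min_rate
```

A positive slope sustained over several evaluation windows suggests that consumers are processing more slowly than producers are writing.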

Hot Partition Creates Uneven Consumer Lag (warning)

When one partition consistently shows higher lag than others, it indicates uneven key distribution or specific message types requiring more processing time, creating a processing bottleneck.
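A hot partition can be flagged by comparing each partition's lag with the median across the topic. This is a minimal sketch for a single snapshot: the function name, the `factor` multiplier, and the `min_lag` floor are hypothetical tuning knobs, and a real check would require the condition to hold across several consecutive snapshots before treating it as consistent.

```python
from statistics import median
from typing import Dict, List


def hot_partitions(
    lag_by_partition: Dict[int, float],
    factor: float = 3.0,      # assumed multiplier over the median lag
    min_lag: float = 1000.0,  # assumed floor so small lags are not flagged
) -> List[int]:
    """Return partition ids whose lag is well above the median lag."""
    if not lag_by_partition:
        return []
    baseline = median(lag_by_partition.values())
    threshold = max(baseline * factor, min_lag)
    return sorted(p for p, lag in lag_by_partition.items() if lag > threshold)
```

For example, `hot_partitions({0: 100, 1: 120, 2: 90, 3: 50000})` returns `[3]`, while a topic whose partitions all lag by roughly the same amount returns an empty list.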

Lambda Timeout Increases Kafka Offset Lag (warning)

When Lambda function timeout is increased without considering batch processing dynamics, functions may process fewer batches per unit time, paradoxically increasing overall lag despite having more time per invocation.

Lambda ESM Provisioned Mode Cost Waste from Over-Provisioning (info)

When minimum event pollers are set too high relative to actual throughput requirements, Lambda ESM provisioned mode incurs unnecessary costs. Each event poller handles up to 5 MB/sec or 5 concurrent invocations for Kafka.
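The sizing arithmetic can be sketched as follows, using only the 5 MB/sec per-poller figure quoted above. The function names are hypothetical, and the sketch ignores the concurrent-invocation limit, so treat the result as a rough lower bound rather than a sizing recommendation.

```python
import math

MB_PER_SEC_PER_POLLER = 5.0  # per-poller throughput figure quoted in the text


def pollers_needed(peak_mb_per_sec: float) -> int:
    """Pollers required to carry the peak throughput (at least one)."""
    if peak_mb_per_sec < 0:
        raise ValueError("throughput must be non-negative")
    return max(1, math.ceil(peak_mb_per_sec / MB_PER_SEC_PER_POLLER))


def excess_minimum_pollers(configured_minimum: int, peak_mb_per_sec: float) -> int:
    """How many configured minimum pollers exceed what peak throughput needs."""
    return max(0, configured_minimum - pollers_needed(peak_mb_per_sec))
```

With a peak of 12 MB/sec, `pollers_needed(12)` gives 3, so a configured minimum of 10 pollers would leave 7 more than throughput alone requires.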

Kafka Consumer Lag Cascading to Lambda Throttling (critical)

When Lambda functions consuming from Kafka (MSK or self-managed) experience throttling due to concurrency limits, Kafka offset lag increases, creating a feedback loop where backed-up messages cause further Lambda invocations that hit throttle limits.

Kafka Event Poller Autoscaling Lag Indicator (warning)

Lambda's on-demand Kafka event pollers scale based on offset lag evaluation every minute, but the autoscaling process takes up to three minutes to complete. High offset lag combined with low event poller counts indicates insufficient polling capacity before autoscaling can respond.

Lambda Timeout Extension Masking Kafka Processing Issues (warning)

When Lambda timeout is increased from default to maximum (15 minutes) for Kafka event source mappings, execution duration increases but offset lag continues to grow, indicating the timeout increase is masking an underlying processing bottleneck rather than solving throughput issues.

Kafka Partition Imbalance in Lambda Event Processing (critical)

Lambda limits MaximumPollers to the number of Kafka topic partitions to maintain ordered processing within partitions. When a topic has few partitions relative to message volume, Lambda cannot scale event pollers sufficiently, creating a throughput ceiling regardless of provisioned capacity.
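The ceiling described here can be estimated by capping the poller count at the partition count. This is a rough sketch under stated assumptions: the function names are hypothetical, and the 5 MB/sec per-poller default is borrowed from the provisioned-mode insight on this page, not from a measured value.

```python
def effective_pollers(maximum_pollers: int, partition_count: int) -> int:
    """Pollers that can actually run: never more than the partition count."""
    if maximum_pollers < 1 or partition_count < 1:
        raise ValueError("both counts must be at least 1")
    return min(maximum_pollers, partition_count)


def throughput_ceiling_mb_per_sec(
    maximum_pollers: int,
    partition_count: int,
    mb_per_sec_per_poller: float = 5.0,  # assumed per-poller throughput
) -> float:
    """Upper bound on throughput imposed by the partition count."""
    return effective_pollers(maximum_pollers, partition_count) * mb_per_sec_per_poller
```

A topic with 4 partitions and 20 configured pollers is still limited to 4 effective pollers, roughly 20 MB/sec under this assumption, so adding partitions is the only way to raise the ceiling.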

Lambda AsyncEventAge as Kafka Dead Letter Queue Predictor (warning)

For Lambda functions with asynchronous invocation patterns consuming Kafka events, AsyncEventAge increasing alongside Kafka offset lag indicates events are queuing in Lambda's internal queue before invocation. This double-buffering delays processing and increases the risk of event loss if retries are exhausted.

Kafka Broker Under-Replication Impacting Lambda Consumer Lag (critical)

When Kafka under-replicated partitions increase (kafka.replication.under_replicated_partitions) or the ISR shrinks (kafka.server.ReplicaManager.IsrShrinksPerSec rises), Lambda event source mappings may experience increased fetch latency and offset lag as the cluster struggles to maintain replication consistency.

Kafka Consumer Group Rebalance Storm Triggering Lambda Restarts (warning)

Frequent Kafka consumer group rebalances (detected via kafka_consumergroup_members changes) can trigger Lambda function restarts (fullRestarts metric), causing processing interruptions, increased cold starts (InitDuration), and temporary offset lag spikes as Lambda event source mappings rejoin the consumer group.

Topic Retention Approaching With Insufficient Consumer Throughput (critical)

When oldest message age approaches retention limit and consumer lag is high, messages will be deleted before consumption, causing data loss.
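The two conditions can be combined into a single check. The sketch below is illustrative: the function name and both thresholds (`age_fraction`, `lag_threshold`) are hypothetical defaults, and the message age is assumed to come from whatever source the monitoring setup already provides.

```python
def retention_risk(
    oldest_message_age_s: float,
    retention_s: float,
    consumer_lag: float,
    age_fraction: float = 0.8,      # assumed: alert at 80% of the retention window
    lag_threshold: float = 10_000,  # assumed: what counts as "high" lag
) -> bool:
    """True when messages are close to expiry while lag is still high."""
    if retention_s <= 0:
        raise ValueError("retention_s must be positive")
    near_retention = oldest_message_age_s >= age_fraction * retention_s
    return near_retention and consumer_lag >= lag_threshold
```

With a 7-day retention window, a 6-day-old oldest message combined with a lag of 50,000 records trips the check, while the same lag with a 1-day-old oldest message does not.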

Consumer Group Member Instability from Frequent Rebalances (warning)

Frequent changes in consumer group membership trigger rebalances, causing processing pauses, increased latency, and temporary unavailability.