Apache Flink metric

kafka.consumer_group.lag

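Consumer group lag for a topic partition: the difference between the partition's log-end offset and the offset the consumer group has committed for it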
Dimensions: None
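Per the definition above, lag for one partition is its log-end offset minus the group's committed offset. The following minimal Python sketch computes it per partition from two dictionaries; the `(topic, partition)` tuple keys, the `None` convention for partitions without a committed offset, and the function names are illustrative assumptions, not part of any Kafka client API.

```python
from typing import Dict, Optional, Tuple

# (topic, partition) identifies one topic partition in this sketch.
TopicPartition = Tuple[str, int]


def partition_lag(log_end_offset: int, committed_offset: Optional[int]) -> Optional[int]:
    """Lag for one partition: log-end offset minus the group's committed offset.

    Returns None when the group has never committed an offset for the partition,
    since lag is undefined there (a reporting convention assumed by this sketch).
    """
    if committed_offset is None:
        return None
    # Clamp at zero in case the two offsets were sampled at slightly different times.
    return max(log_end_offset - committed_offset, 0)


def group_lag(
    log_end: Dict[TopicPartition, int],
    committed: Dict[TopicPartition, Optional[int]],
) -> Dict[TopicPartition, Optional[int]]:
    """Per-partition lag for every partition with a known log-end offset."""
    return {tp: partition_lag(end, committed.get(tp)) for tp, end in log_end.items()}
```

Keeping the result per partition, rather than summing it, matters because an aggregate can hide a single stuck partition.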

Technical Annotations (63)

Configuration Parameters (13)
fetch.min.bytes
Consumer setting: minimum amount of data the broker returns for a fetch request; the broker waits up to fetch.max.wait.ms for that much data to accumulate
max.poll.records
Consumer setting: maximum number of records returned by a single poll() call, which bounds the processing work per poll
enable.auto.commit (recommended: false)
Prevents offsets from being committed before message processing completes, which can otherwise lose messages during rebalancing
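max.poll.interval.ms (recommended: 900000, i.e. 15 minutes)
Upper bound on the worst-case time to process one batch of polled records, including GC pauses and retries; exceeding it removes the consumer from the group and triggers a rebalance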
session.timeout.ms (recommended: 45000)
Detects death of the heartbeat thread (for example, a JVM crash); keep it relatively low so failed consumers are detected quickly
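heartbeat.interval.ms (recommended: 15000)
Must be lower than session.timeout.ms and should typically be no higher than 1/3 of it, so a few missed heartbeats do not expire the session; 15000 is exactly 1/3 of the recommended 45000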
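group.instance.id (recommended: change to a unique value or remove)
Enables static membership: the group coordinator reserves the member's partitions under this persistent ID, so a consumer that keeps failing can leave those partitions unprocessed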
partition.assignment.strategy (recommended: CooperativeSticky)
Enables incremental cooperative rebalancing, which reduces disruption by revoking only the partitions that move to another consumer
follower.replication.throttled.rate (recommended: > 1 MB/s)
Minimum for accurate throttling behavior during recovery
leader.replication.throttled.rate (recommended: > 1 MB/s)
Minimum for accurate throttling behavior during recovery
delete.retention.ms (recommended: 86400000)
How long tombstone markers are retained on compacted topics; 86400000 ms (1 day) is the default
cleanup.policy (recommended: compact)
Applies only to topics intended for log compaction
retention.ms (recommended: 259200000)
259200000 ms is 3 days; an example value for windowed processing that must tolerate late arrivals and recovery time
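The consumer settings above can be collected into a configuration dictionary and sanity-checked before a consumer is created. This sketch assumes the confluent-kafka (librdkafka) property names, where the cooperative sticky strategy is spelled `cooperative-sticky`; the Java client instead takes the `CooperativeStickyAssignor` class name, and `max.poll.records` is a Java client setting that librdkafka does not accept, so it is omitted. The `bootstrap.servers` and `group.id` values and the `check_consumer_timeouts` helper are hypothetical.

```python
# Hypothetical connection values; replace with your own cluster and group.
CONSUMER_CONFIG = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "my-consumer-group",
    "enable.auto.commit": False,     # commit only after processing completes
    "max.poll.interval.ms": 900000,  # 15 minutes of worst-case processing
    "session.timeout.ms": 45000,
    "heartbeat.interval.ms": 15000,  # 1/3 of session.timeout.ms
    "partition.assignment.strategy": "cooperative-sticky",
}


def check_consumer_timeouts(conf: dict) -> list:
    """Return a list of problems found in commit and timeout settings (empty if none)."""
    problems = []
    session = conf["session.timeout.ms"]
    heartbeat = conf["heartbeat.interval.ms"]
    poll_interval = conf["max.poll.interval.ms"]
    if heartbeat >= session:
        problems.append("heartbeat.interval.ms must be lower than session.timeout.ms")
    elif heartbeat * 3 > session:
        problems.append("heartbeat.interval.ms should be no higher than 1/3 of session.timeout.ms")
    # librdkafka refuses a max.poll.interval.ms lower than session.timeout.ms.
    if poll_interval < session:
        problems.append("max.poll.interval.ms should not be lower than session.timeout.ms")
    # Auto-commit defaults to true, so a missing key counts as enabled.
    if conf.get("enable.auto.commit", True):
        problems.append("enable.auto.commit is enabled; offsets may be committed before processing")
    return problems
```

If the list is empty, the dictionary can be passed to `confluent_kafka.Consumer(CONSUMER_CONFIG)`, and offsets are then committed explicitly, for example with `consumer.commit(message=msg, asynchronous=False)` after each message is processed.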
Error Signatures (3)
Schema Mismatch (exception)
poll timeout expired (log pattern)
left group (log pattern)
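The two log-pattern signatures above can be checked mechanically. This sketch scans log lines with case-insensitive regular expressions; the exact message text differs between client libraries and versions (the Java consumer, for instance, reports that the poll timeout "has expired"), so the patterns, the `SIGNATURES` table, and `match_signatures` are assumptions to adapt. The Schema Mismatch signature is an exception type rather than a log line, so it is not covered.

```python
import re
from typing import Iterable, List, Tuple

# Assumed regexes for the log-pattern signatures; tune them to your client's wording.
SIGNATURES = {
    "poll timeout expired": re.compile(r"poll timeout (has )?expired", re.IGNORECASE),
    "left group": re.compile(r"\bleft (the )?group\b", re.IGNORECASE),
}


def match_signatures(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, signature_name) for every log line matching a signature."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Repeated hits for both signatures from the same consumer over a short period are consistent with the rebalance loops described in the related insights.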
CLI Commands (5)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-consumer-group (diagnostic)
consumer.subscribe(['topic'], on_assign=on_assign, on_revoke=on_revoke) (remediation)
producer.produce(topic, key=str(user_id), value=value, partition=hash(user_id) % 4) (remediation)
admin_client.list_consumer_group_offsets('my_group') (diagnostic)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --all-groups (monitoring)
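The `--describe` commands above print a table per group with columns such as `TOPIC`, `PARTITION`, `CURRENT-OFFSET`, `LOG-END-OFFSET`, and `LAG`. This sketch parses that text into per-partition rows and filters them by a lag threshold; it assumes whitespace-separated columns under a header row that starts with `GROUP`, which may not hold for every Kafka version, and the function names and threshold are hypothetical.

```python
from typing import Dict, List, Optional


def parse_describe_output(text: str) -> List[Dict[str, str]]:
    """Parse the table(s) printed by kafka-consumer-groups.sh --describe into dicts."""
    rows: List[Dict[str, str]] = []
    header: Optional[List[str]] = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line between groups
        if fields[0] == "GROUP" and "LAG" in fields:
            header = fields  # a header row starts a new table
            continue
        # Skip notices (e.g. "has no active members") whose token count differs.
        if header is not None and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def lagging_partitions(rows: List[Dict[str, str]], threshold: int) -> List[Dict[str, str]]:
    """Rows whose LAG exceeds threshold; "-" (no committed offset) is skipped."""
    return [r for r in rows if r.get("LAG", "-").isdigit() and int(r["LAG"]) > threshold]
```

The text can be captured with `subprocess.run([...], capture_output=True, text=True, check=True).stdout`, passing the same arguments as the commands above.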
Technical References (42)
EventLoop (component)
RSS (concept)
collectDefaultMetrics (component)
consumer lag (concept)
offset commits (concept)
indexing latency (concept)
bulk requests (concept)
Elasticsearch (component)
CooperativeStickyAssignor (component)
KIP-62 (concept)
Application Thread (component)
Heartbeat Thread (component)
Group Coordinator (component)
Static Membership (concept)
group.instance.id (component)
Zombie Partition (concept)
Latent Partition Starvation (concept)
Cooperative Rebalancing (protocol)
consumer group (component)
ISR churn (concept)
page cache hit ratio (concept)
rebalance (concept)
partition (concept)
partition assignment (concept)
offset (concept)
ISR (concept)
Confluent Control Center (component)
partition key hashing (concept)
custom partitioner (component)
KafkaAdminClient (component)
JMX metrics (concept)
follower.replication.throttled.replicas (component)
leader.replication.throttled.replicas (component)
log compaction (concept)
tombstone markers (concept)
assign() (component)
subscribe() (component)
consumer group coordinator (concept)
Single Message Transforms (component)
SMT (component)
Kafka Streams (component)
ksqlDB (component)
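The `producer.produce(..., partition=hash(user_id) % 4)` command listed under CLI Commands pins each key to a partition. Python's built-in `hash()` is stable for integers but salted per process for strings (unless `PYTHONHASHSEED` is fixed), so string keys could map to different partitions in different producer processes. This sketch swaps in `zlib.crc32` as a deterministic hash and adds a simple skew measure for spotting hot partitions; it does not reproduce the Java client's default murmur2 partitioner, and `stable_partition` and `partition_skew` are hypothetical helpers.

```python
import zlib
from collections import Counter
from typing import Iterable


def stable_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a hash that is identical in every process."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # zlib.crc32 returns an unsigned 32-bit value in Python 3, so the result is non-negative.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_skew(keys: Iterable[str], num_partitions: int) -> float:
    """Ratio of the busiest partition's key count to the mean count per partition."""
    counts = Counter(stable_partition(k, num_partitions) for k in keys)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return max(counts.values()) / (total / num_partitions)
```

For example, `partition=stable_partition(str(user_id), 4)` could replace the `hash()` expression in the `produce()` call. A `partition_skew` result near 1.0 means keys are spread evenly, while a value near the partition count means one partition receives nearly everything.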
Related Insights (32)
EventLoop lag blocks Kafka consumer causing consumer group lag (critical)
Service restart temporarily masks memory leak by resetting Kafka consumer lag (warning)
Suboptimal consumer fetch settings increase latency (info)
Poison pill message causes consumer infinite retry loop on single partition (critical)
Aggregated metrics hide single-partition consumer failures (warning)
Consumer group Stable state misleading when partition stuck (warning)
Partition shows continuous writes but zero reads indicates stuck consumer (critical)
Consumer lag prevents real-time message processing (warning)
Consumer offset stops advancing indicating stuck consumer (critical)
Consumer lag increases steadily due to slow processing (warning)
Downstream Elasticsearch saturation manifests as Kafka consumer lag spike (critical)
Consumer lag alerts without downstream latency monitoring cause misdiagnosis (warning)
Auto-commit enabled causes silent data loss during consumer rebalancing (critical)
Rebalance spiral livelock prevents partition processing when max.poll.interval.ms exceeded (critical)
Static membership holds partitions hostage when consumer fails repeatedly (critical)
Cooperative rebalancing masks partition starvation in aggregate metrics (warning)
Consumer lag escalates rapidly while broker health metrics remain normal (critical)
Consumer poll timeout causes rebalance loops (warning)
Consumer lag spike without root cause visibility (warning)
Gradual lag accumulation goes undetected (warning)
Frequent consumer rebalances cause consumer lag (warning)
Missing monitoring of ISR shrinks and consumer lag delays incident response (warning)
Consumer group misconfiguration causes lag and duplication (warning)
Poor partition key selection creates hot partitions (warning)
Silent failures occur without monitoring and alerting (warning)
High or growing consumer lag indicates processing bottleneck (warning)
Insufficient replication throttle causes cluster overload during recovery (warning)
Consumer reads from compacted topics fail when scan time exceeds tombstone retention (warning)
Consumer lag exceeds retention window causing data loss (warning)
Consumer lag metrics missing for consumer groups (warning)
SMTs introduce latency under heavy traffic with complex enrichment (warning)
Sink connector tasks accumulating lag (warning)
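Several insights above (continuous writes with zero reads, an offset that stops advancing, aggregate metrics hiding single-partition failures) come down to comparing per-partition offsets over time. This sketch flags partitions whose log-end offset grew between two snapshots while the group's committed offset did not move. The `Snapshot` type and `stuck_partitions` helper are hypothetical; the snapshots could be gathered, for example, with kafka-python's `KafkaAdminClient.list_consumer_group_offsets()` and `KafkaConsumer.end_offsets()`.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


@dataclass(frozen=True)
class Snapshot:
    log_end: Dict[TopicPartition, int]    # log-end offset per partition
    committed: Dict[TopicPartition, int]  # group's committed offset per partition


def stuck_partitions(before: Snapshot, after: Snapshot) -> List[TopicPartition]:
    """Partitions that received new writes but whose committed offset did not move."""
    stuck = []
    for tp, end_after in after.log_end.items():
        end_before = before.log_end.get(tp)
        c_before = before.committed.get(tp)
        c_after = after.committed.get(tp)
        if None in (end_before, c_before, c_after):
            continue  # partition missing from a snapshot; cannot judge it
        writes_continue = end_after > end_before
        reads_stalled = c_after == c_before and c_after < end_after
        if writes_continue and reads_stalled:
            stuck.append(tp)
    return sorted(stuck)
```

The interval between snapshots should be long enough for a healthy consumer to commit at least once, which depends on how often the application commits offsets.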