Kafka Consumer Lag Ingestion Backlog
critical
DataHub ingestion falls behind because of Kafka consumer lag, which delays metadata changes and quality checks and leads to stale lineage and undetected data quality issues.
Monitor kafka_consumer_lag for DataHub's MCE (MetadataChangeEvent), MAE (MetadataAuditEvent), MCP (MetadataChangeProposal_v1), and MCL (MetadataChangeLog_v1) consumers. Alert when lag exceeds a threshold (e.g., >1000 messages for 5+ minutes), which indicates that metadata updates are not being processed in real time.
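The "above threshold for 5+ minutes" condition can be sketched in a few lines of Python. This is a minimal illustration, not DataHub's or any alerting system's implementation: the function name sustained_lag_breach, the (timestamp, lag) sample format, and the default values are assumptions chosen to match the example threshold above.

```python
from typing import Sequence, Tuple


def sustained_lag_breach(
    samples: Sequence[Tuple[float, int]],
    threshold: int = 1000,          # assumed: example threshold from the text
    min_duration_s: float = 300.0,  # assumed: 5 minutes
) -> bool:
    """Return True if lag has stayed above `threshold` for at least
    `min_duration_s` seconds, ending at the most recent sample.

    `samples` is a hypothetical list of (unix_timestamp_seconds, lag)
    pairs for one consumer group, sorted by ascending timestamp.
    """
    if not samples:
        return False

    latest_ts = samples[-1][0]
    breach_start = None

    # Walk backwards from the newest sample while lag stays above threshold.
    for ts, lag in reversed(samples):
        if lag <= threshold:
            break
        breach_start = ts

    if breach_start is None:
        # The most recent sample is already at or below the threshold.
        return False
    return latest_ts - breach_start >= min_duration_s
```

A single dip below the threshold resets the window, so a short burst of lag that drains within a few minutes does not fire the alert.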
Scale the DataHub MCE/MAE consumer pods horizontally, or increase per-consumer throughput. Horizontal scaling only helps up to the topic's partition count, because each partition is read by at most one consumer in a consumer group. Check messaging_process_time to see whether processing is slow because of downstream bottlenecks (Elasticsearch indexing, database writes). Verify Kafka broker health and topic partition distribution. Review recent metadata volume spikes using kafka_message_queue_time to determine whether the backlog is temporary or sustained.
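One way to judge whether a backlog is temporary or sustained is to look at the trend of the lag itself over a recent window. The sketch below fits a least-squares slope to lag samples and labels the trend. The function names, the labels, and the flat_tolerance value are illustrative assumptions, and the samples are assumed to be (timestamp, lag) pairs pulled from whatever metrics backend records kafka_consumer_lag.

```python
from typing import Sequence, Tuple


def lag_slope(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of lag over time, in messages per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_t = sum(t for t, _ in samples) / n
    mean_lag = sum(lag for _, lag in samples) / n

    num = sum((t - mean_t) * (lag - mean_lag) for t, lag in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def classify_backlog(
    samples: Sequence[Tuple[float, float]],
    flat_tolerance: float = 0.5,  # assumed: msgs/sec treated as "no trend"
) -> str:
    """Label the lag trend as 'growing', 'draining', or 'flat'."""
    slope = lag_slope(samples)
    if slope > flat_tolerance:
        return "growing"   # consumers are falling further behind
    if slope < -flat_tolerance:
        return "draining"  # consumers are catching up
    return "flat"          # consumers keep pace but do not catch up
```

A "draining" trend after a volume spike suggests a temporary backlog. A "growing" trend, or a "flat" trend at a high lag value, suggests the consumers cannot keep up or catch up at current capacity, which points back to the scaling and downstream-bottleneck checks above.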