DataHub insights
Open Source · Versions: 0.14 · 30 metrics

DataHub backend API experiencing elevated error rates impacting metadata ingestion, UI operations, and external integrations, potentially indicating service degradation or infrastructure issues.
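A minimal sketch of how this condition could be detected, assuming two hypothetical request counters sampled over a window (the counter names and 5% threshold are illustrative assumptions, not real DataHub metrics):

```python
def error_rate(error_count: int, total_count: int) -> float:
    """Fraction of failed backend API requests in the sampling window."""
    return error_count / total_count if total_count else 0.0

def is_degraded(error_count: int, total_count: int, threshold: float = 0.05) -> bool:
    # Illustrative rule: flag the service when more than 5% of API calls
    # fail within the window.
    return error_rate(error_count, total_count) > threshold
```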
DataHub GMS or Frontend experiencing memory pressure causing frequent garbage collection pauses, degrading API response times and potentially leading to OutOfMemoryErrors and service unavailability.
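One way to quantify this is the fraction of wall-clock time the JVM spends paused in GC. A sketch under the assumption that GC pause samples are available as `(timestamp_seconds, pause_seconds)` pairs (the input shape is hypothetical):

```python
def gc_time_fraction(pauses, window_start: float, window_end: float) -> float:
    """Fraction of the window spent in GC pauses.

    `pauses` is a list of (timestamp_s, pause_duration_s) tuples.
    """
    in_window = [p for t, p in pauses if window_start <= t <= window_end]
    span = window_end - window_start
    return sum(in_window) / span if span > 0 else 0.0
```

A sustained fraction above a few percent is a common sign of memory pressure worth alerting on.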
DataHub ingestion jobs failing silently or with warnings, causing metadata gaps that prevent data quality monitoring, lineage tracking, and incident detection from functioning properly.
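Silent failures can be surfaced by treating any warnings or failures in the run report as unhealthy, even when the job exits successfully. A hedged sketch; the dict-shaped report below is an assumption for illustration, not DataHub's actual report object:

```python
def ingestion_unhealthy(report: dict) -> bool:
    """Treat a run as unhealthy if its report carries any failures
    or warnings, even when the job's exit status was success."""
    return bool(report.get("failures")) or bool(report.get("warnings"))
```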
DataHub metadata search results and lineage views showing stale information because Elasticsearch indices are not being updated in a timely manner, impacting data discovery and incident response.
DataHub ingestion falling behind due to Kafka consumer lag, causing metadata changes and quality checks to be delayed, leading to stale lineage and undetected data quality issues.
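Consumer lag is the gap between each partition's log-end offset and the consumer group's committed offset. A minimal sketch of that arithmetic (the partition keys are hypothetical; in practice the offsets would come from the Kafka consumer API):

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag: log-end offset minus committed offset.
    Partitions with no committed offset count from zero."""
    return {tp: end_offsets[tp] - committed.get(tp, 0) for tp in end_offsets}

def total_lag(end_offsets: dict, committed: dict) -> int:
    # Sum across partitions; a sustained rise signals ingestion falling behind.
    return sum(consumer_lag(end_offsets, committed).values())
```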
Organizations experiencing data quality incidents (bad data reaching dashboards or ML models) that DataHub's observability should catch; detection fails because assertions are not configured or monitored on critical datasets.
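A coverage gap like this can be audited by diffing the set of critical datasets against those with at least one configured assertion. A hedged sketch; the URN strings and data shapes are assumptions for illustration:

```python
def uncovered_datasets(critical: set, assertions_by_dataset: dict) -> set:
    """Critical dataset URNs that have no configured assertions."""
    return {urn for urn in critical if not assertions_by_dataset.get(urn)}
```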
Prolonged JVM garbage collection pauses in DataHub consumers cause Kafka consumer lag to spike. This creates a cascading failure where metadata ingestion stalls, leading to stale catalog data and failed data quality checks.
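The cascade can be confirmed by checking whether each lag spike is preceded by a long GC pause within a short window. An illustrative correlation sketch (timestamps and the 60-second window are assumptions):

```python
def lag_spikes_explained_by_gc(gc_pause_ts, lag_spike_ts, window_s: float = 60):
    """True if every lag spike follows some long GC pause within
    `window_s` seconds (vacuously True when there are no spikes)."""
    return all(
        any(0 <= spike - pause <= window_s for pause in gc_pause_ts)
        for spike in lag_spike_ts
    )
```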
DataHub ingestion pipelines stall when sink write rate (sink_workunits_write) lags significantly behind source production rate (source_workunits_produced). This indicates downstream storage systems (Elasticsearch, database) cannot keep pace with ingestion volume.
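The gap between the two counters named above can be expressed as a backlog fraction. A sketch treating `source_workunits_produced` and `sink_workunits_write` as plain counts over the same window (the 20% stall threshold is an illustrative assumption):

```python
def sink_backpressure(source_workunits_produced: int, sink_workunits_write: int) -> float:
    """Fraction of produced workunits not yet written by the sink."""
    if source_workunits_produced == 0:
        return 0.0
    return 1.0 - sink_workunits_write / source_workunits_produced

def is_stalling(source: int, sink: int, max_backlog_fraction: float = 0.2) -> bool:
    # Illustrative rule: a sink persistently trailing the source by more
    # than 20% suggests downstream storage cannot keep pace.
    return sink_backpressure(source, sink) > max_backlog_fraction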
DataHub's asynchronous write architecture can hide processing failures. High Kafka consumer lag combined with ingestion warnings/failures indicates metadata events are queued but not successfully persisting to primary or search storage.
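Because neither signal alone is conclusive, the two can be combined: high consumer lag together with recent ingestion failures suggests events are queued but not persisting. A hedged sketch; the field names and lag threshold are illustrative assumptions:

```python
def writes_likely_not_persisting(consumer_lag: int, recent_failures: int,
                                 lag_threshold: int = 10_000) -> bool:
    """Flag when events appear queued (high lag) AND the pipeline is
    reporting failures, i.e. writes are likely not reaching storage."""
    return consumer_lag > lag_threshold and recent_failures > 0
```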