DataHub, Apache Kafka

JVM GC Pause Cascading to Kafka Consumer Lag

warning
reliability
Updated Oct 8, 2025

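Prolonged JVM garbage collection pauses in DataHub consumers cause Kafka consumer lag to spike. While the JVM is paused, the consumer cannot poll or process messages, so lag grows; a long enough pause can also exceed the consumer's session timeout or max.poll.interval.ms and trigger a group rebalance. The result is a cascading failure: metadata ingestion stalls, catalog data goes stale, and data quality checks fail.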

How to detect:

Monitor the duration and frequency of jvm_gc_pause. When a GC pause exceeds 1 s, check kafka_consumer_lag for a spike in the same time window. Corroborate the link by checking whether jvm_memory_used is approaching the heap limit and whether messaging_process_time latency is rising.
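The correlation described above can be sketched in a few lines of Python. This is a minimal illustration, not DataHub tooling: the `Sample` record, the `find_gc_lag_incidents` helper, the lag growth factor of 2x, and the two-sample lookahead are all assumptions made for the example. Only the 1 s pause threshold comes from the guidance above. It assumes the three metrics have already been exported as evenly spaced samples.

```python
from dataclasses import dataclass
from typing import List

GC_PAUSE_THRESHOLD_S = 1.0  # pause length that warrants a lag check (from the guidance above)


@dataclass
class Sample:
    """One scrape interval; the field layout is hypothetical."""
    timestamp: int      # seconds since epoch
    gc_pause_s: float   # longest jvm_gc_pause seen in the interval
    consumer_lag: int   # kafka_consumer_lag at the end of the interval


def find_gc_lag_incidents(
    samples: List[Sample],
    lag_growth_factor: float = 2.0,  # assumed definition of a "spike"
    lookahead: int = 2,              # assumed number of later samples to inspect
) -> List[int]:
    """Return timestamps where a long GC pause is followed by a lag spike."""
    incidents = []
    for i, sample in enumerate(samples):
        if sample.gc_pause_s <= GC_PAUSE_THRESHOLD_S:
            continue
        # Lag just before the pause is the baseline to compare against.
        baseline = samples[i - 1].consumer_lag if i > 0 else sample.consumer_lag
        window = samples[i : i + lookahead + 1]
        peak = max(s.consumer_lag for s in window)
        if peak >= max(baseline, 1) * lag_growth_factor:
            incidents.append(sample.timestamp)
    return incidents


if __name__ == "__main__":
    history = [
        Sample(0, 0.1, 100),
        Sample(60, 1.5, 120),   # long pause, lag climbs afterwards
        Sample(120, 0.2, 400),
        Sample(180, 0.1, 150),
        Sample(240, 2.0, 160),  # long pause, lag stays flat
        Sample(300, 0.1, 170),
    ]
    print(find_gc_lag_incidents(history))
```

A pause that is not followed by lag growth (the sample at 240 in the usage data) is deliberately not reported, since the failure pattern here is the combination of the two signals rather than a long pause alone.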

Recommended action:

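Tune JVM heap settings (for example, JAVA_OPTS: '-Xms4g -Xmx6g -XX:+UseG1GC'). Size the maximum heap at about 50% of container memory, so -Xmx6g assumes a container limit of roughly 12 GB; the remainder is headroom for off-heap memory such as metaspace, thread stacks, and direct buffers. Scale consumer replicas to distribute load. Enable ENTITY_SERVICE_ENABLE_CACHE to reduce memory pressure. If heap pressure persists, consider increasing the container memory limit and raising the heap with it.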
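The 50% sizing rule is simple arithmetic, and a small helper can keep JAVA_OPTS in step with the container limit. The sketch below is illustrative only: `build_java_opts` and its parameters are hypothetical names, and it assumes the container limit is known in MiB. The JVM flags it emits (-Xms, -Xmx, -XX:+UseG1GC) are the ones recommended above.

```python
from typing import Optional


def build_java_opts(
    container_limit_mib: int,
    heap_fraction: float = 0.5,             # 50% rule from the recommendation above
    initial_heap_mib: Optional[int] = None,  # defaults to the maximum heap
) -> str:
    """Build a JAVA_OPTS string whose max heap is a fraction of container memory."""
    if container_limit_mib <= 0:
        raise ValueError("container_limit_mib must be positive")
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")

    max_heap_mib = int(container_limit_mib * heap_fraction)
    if initial_heap_mib is None:
        initial_heap_mib = max_heap_mib
    if initial_heap_mib > max_heap_mib:
        raise ValueError("initial heap cannot exceed maximum heap")

    return f"-Xms{initial_heap_mib}m -Xmx{max_heap_mib}m -XX:+UseG1GC"


if __name__ == "__main__":
    # A 12 GiB container reproduces the 4g/6g settings, expressed in MiB.
    print(build_java_opts(12 * 1024, initial_heap_mib=4 * 1024))
```

On recent JDKs, the -XX:MaxRAMPercentage=50.0 flag expresses the same ratio without hard-coding sizes; whether it suits a given DataHub deployment depends on the JDK version in the image, so treat it as an option to verify rather than a drop-in replacement.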