DataHub

Entity Cache Miss Storm on Cold Start

warning
latency
Updated Oct 8, 2025

DataHub experiences severe latency spikes immediately after a pod restart, while the entity cache is still cold. Until the cache fills, every GraphQL query hits the database directly, which exhausts the connection pool and causes cascading timeouts.

How to detect:

Monitor http.server.request.duration for spikes within 5 minutes of a pod restart (correlate with the process start time). A low hit rate on the entity cache (sized by CACHE_ENTITY_CACHE_SIZE) combined with high database query latency indicates a cold cache. Check jvm_threads_live for thread pool saturation.
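The correlation between latency spikes and process start time can be sketched as below. This is a minimal illustration, not part of DataHub: the Sample type, the cold_start_spikes function, the baseline value, and the 3x spike factor are all assumptions, and the samples would in practice come from whatever backend stores http.server.request.duration.

```python
from dataclasses import dataclass

WARMUP_WINDOW_S = 300  # the 5-minute window from the detection guidance


@dataclass
class Sample:
    timestamp: float   # Unix seconds when the request was observed
    duration_s: float  # request duration in seconds


def cold_start_spikes(samples, process_start, baseline_s, factor=3.0):
    """Return samples inside the warm-up window that exceed factor * baseline.

    The factor is a hypothetical threshold; tune it to the service's
    normal latency distribution.
    """
    limit = baseline_s * factor
    return [
        s for s in samples
        if 0 <= s.timestamp - process_start <= WARMUP_WINDOW_S
        and s.duration_s > limit
    ]


# Made-up data: the process restarted at t=1050.
samples = [
    Sample(1000.0, 0.12),  # before the restart, ignored
    Sample(1060.0, 2.40),  # 10 s after restart, slow
    Sample(1200.0, 0.15),  # inside the window, normal latency
    Sample(1500.0, 1.90),  # slow, but 450 s after restart
]
spikes = cold_start_spikes(samples, process_start=1050.0, baseline_s=0.2)
```

Slow requests that fall outside the window, like the last sample, point to a cause other than a cold cache.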

Recommended action:

Implement cache warming on startup by preloading frequently accessed entities. Increase CACHE_ENTITY_CACHE_SIZE from the default of 10,000 to 50,000 or more for large catalogs. Enable ENTITY_SERVICE_ENABLE_CACHE and RELATIONSHIP_SERVICE_ENABLE_CACHE. Use rolling deployments with readiness probes that hold traffic back until cache warming has finished.
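The warm-then-ready pattern can be sketched as follows. This is a generic illustration under stated assumptions, not DataHub's implementation: WarmableCache, load_from_db, and the example URNs are hypothetical, and the sketch is single-threaded. The idea is that a readiness endpoint would report healthy only once is_ready() returns True.

```python
from collections import OrderedDict


class WarmableCache:
    """Small LRU cache that reports ready only after warm() has finished."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._data = OrderedDict()
        self._ready = False

    def _put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)

    def get(self, key, loader):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        value = loader(key)  # cache miss: fall through to the database
        self._put(key, value)
        return value

    def warm(self, hot_keys, loader):
        """Preload the given keys, then flip the readiness flag."""
        for key in hot_keys:
            self._put(key, loader(key))
        self._ready = True

    def is_ready(self):
        return self._ready


db_calls = []  # records which keys actually reached the "database"


def load_from_db(urn):
    # Hypothetical stand-in for a real entity lookup.
    db_calls.append(urn)
    return {"urn": urn}


cache = WarmableCache(max_size=50_000)
ready_before = cache.is_ready()
cache.warm(["urn:li:corpuser:alice", "urn:li:corpuser:bob"], load_from_db)
cache.get("urn:li:corpuser:alice", load_from_db)  # served from cache
```

After warming, the lookup for a preloaded key does not reach the database, which is the behavior the readiness probe is meant to guarantee before traffic arrives.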