CoreDNS running on shared CPU instances experiences intermittent slowness due to CPU steal, particularly during neighbor workload spikes, causing unpredictable DNS latency that's difficult to attribute.
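One way to attribute this is to watch the steal counter directly. A minimal PromQL sketch, assuming node_exporter is scraped:

```promql
# Per-node fraction of CPU time stolen by the hypervisor; sustained values
# above roughly 0.05 (5%) on CoreDNS nodes point to noisy-neighbor contention.
avg by (instance) (rate(node_cpu_seconds_total{mode="steal"}[5m]))
```

Correlating spikes in this series with CoreDNS latency percentiles makes the steal attribution concrete rather than anecdotal.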
High cilium_fqdn_active_names or cilium_fqdn_active_ips counts indicate DNS-based policy consuming significant memory. When combined with a low cilium_fqdn_gc_deletions_total rate, stale entries accumulate, causing endpoint regeneration delays and potential OOM conditions.
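A sketch of queries to surface this pattern, assuming the Cilium agent metrics are scraped by Prometheus:

```promql
# DNS-policy name cardinality per agent pod; steady growth is the warning sign
sum by (pod) (cilium_fqdn_active_names)

# Garbage-collection progress; a near-zero deletion rate while active names
# keep growing suggests stale FQDN entries are accumulating
sum(rate(cilium_fqdn_gc_deletions_total[15m]))
```

Alerting on the ratio of these two series catches accumulation before it turns into regeneration delays or OOM kills.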
Low cache hit rates indicate CoreDNS is frequently querying upstream servers, increasing latency and resource consumption. This often manifests as high query response times and elevated CPU usage.
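Cache behavior is controlled by the CoreDNS cache plugin. A minimal Corefile sketch; the capacities and TTLs here are illustrative starting points, not recommendations:

```
.:53 {
    errors
    cache 30 {
        # CAPACITY [TTL] for positive and negative answers
        success 9984 30
        denial 9984 5
    }
    forward . /etc/resolv.conf
}
```

Watch the coredns_cache_hits_total and coredns_cache_misses_total metrics after tuning to confirm the hit rate actually improves.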
Insufficient CPU or memory allocation causes CoreDNS to become unresponsive or crash under high query loads, manifesting as OOMKilled events and DNS resolution timeouts.
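A fragment of the CoreDNS Deployment's container spec; the values below are the common kubeadm defaults and should be raised based on observed usage:

```yaml
resources:
  requests:
    cpu: 100m
    memory: 70Mi
  limits:
    memory: 170Mi
```

Leaving the CPU limit unset is a deliberate choice here: a memory limit guards against runaway growth, while a CPU limit would throttle CoreDNS exactly when query load spikes.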
High concurrent query volume exceeds the forward plugin's max_concurrent setting, causing queries to queue and response times to spike, particularly impacting applications with aggressive retry logic.
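The ceiling is set in the forward block of the Corefile; a minimal sketch with an illustrative value:

```
forward . /etc/resolv.conf {
    # Upper bound on in-flight upstream queries for this block.
    # Queries beyond the limit receive a REFUSED response.
    max_concurrent 1000
}
```

Because over-limit queries are refused rather than queued indefinitely, clients with aggressive retry logic can amplify the problem, so size this against measured peak concurrency.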
Poorly configured prefetch settings fail to proactively refresh popular records before expiry, causing cache misses during high-traffic periods and increasing latency for frequently accessed services.
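Prefetch is a sub-option of the cache plugin. A sketch with illustrative thresholds:

```
cache 30 {
    # Records queried at least 10 times in the last 1m are re-fetched
    # from upstream when 10% of their TTL remains, so hot names never
    # expire out of the cache.
    prefetch 10 1m 10%
}
```

The coredns_cache_prefetch_total metric confirms whether prefetching is actually firing for your traffic pattern.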
CoreDNS cannot reach upstream DNS servers, causing external domain resolution failures while internal cluster DNS continues to work. This indicates network connectivity or upstream DNS server issues.
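A quick way to separate the two failure domains, assuming kubectl access; substitute your actual configured upstream for the public resolver used here:

```shell
# From a throwaway pod: does cluster DNS resolve external names?
kubectl run -it --rm dnstest --image=busybox:1.36 --restart=Never -- \
  nslookup example.com

# From a CoreDNS pod's node: is the upstream reachable directly,
# bypassing CoreDNS entirely?
dig @8.8.8.8 example.com +time=2 +tries=1
```

If the direct query succeeds while the in-cluster lookup fails, the problem sits between CoreDNS and the upstream (egress rules, security groups, forward config) rather than at the upstream itself.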
A high ndots setting (Kubernetes defaults to 5) causes excessive DNS queries: each unqualified name is tried against every search domain before the absolute lookup, multiplying query volume, triggering DNS cache stampedes, and causing intermittent failures during pod scaling events.
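For workloads that mostly resolve external names, ndots can be lowered per pod. A minimal sketch; the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx  # placeholder
  dnsConfig:
    options:
      - name: ndots
        value: "1"  # skip search-domain expansion for dotted names
```

An alternative that needs no spec change is using fully qualified names with a trailing dot (e.g. api.example.com.), which bypasses the search path entirely.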
Multiple CoreDNS pods scheduled on the same node create single points of failure and uneven query load distribution, reducing reliability and causing localized performance issues.
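Spreading replicas across nodes is typically done with pod anti-affinity on the CoreDNS Deployment's pod template; the label below matches the standard k8s-app: kube-dns selector:

```yaml
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            k8s-app: kube-dns
        topologyKey: kubernetes.io/hostname
```

On small clusters where replicas can outnumber nodes, the preferredDuringSchedulingIgnoredDuringExecution variant avoids leaving replicas unschedulable.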
Overly restrictive NetworkPolicies prevent pods from reaching CoreDNS service, causing 'connection refused' or timeout errors that appear as application-level DNS failures rather than network issues.
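Namespaces with default-deny egress need an explicit carve-out for DNS. A sketch assuming CoreDNS runs in kube-system with the standard k8s-app: kube-dns label:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
spec:
  podSelector: {}          # applies to all pods in the namespace
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

Allowing TCP 53 alongside UDP matters: large responses and retries fall back to TCP, and blocking it produces exactly the intermittent, hard-to-reproduce failures described above.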
Kong resolves upstream hostnames on the request path; with short or zero record TTLs this can mean a fresh lookup for nearly every request, adding 20-100ms of latency each time. This DNS lookup overhead becomes a severe bottleneck at high request rates.
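Kong's DNS caching behavior is tunable in kong.conf (or the equivalent KONG_DNS_* environment variables). A hedged sketch; the values are illustrative, not recommendations:

```
# kong.conf fragment
dns_valid_ttl = 300   # cache resolved records for 300s, overriding record TTLs
dns_stale_ttl = 60    # serve stale records for up to 60s while re-resolving
                      # in the background, keeping lookups off the hot path
```

Overriding TTLs trades resolution freshness for latency, so pair this with reasonably stable upstream records or a service-discovery layer that tolerates the lag.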
AWS enforces a hard limit of 1024 packets per second per ENI toward the Route 53 Resolver; exceeding it causes DNS throttling, leading to intermittent resolution failures that are difficult to diagnose without monitoring ENI-level metrics.
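On instances using the ENA driver, throttled packets are visible in the interface statistics; the interface name below may differ on your instance type:

```shell
# Count of packets dropped because the per-ENI PPS allowance toward
# link-local services (the VPC DNS resolver is the usual consumer)
# was exceeded. A nonzero, growing value confirms DNS throttling.
ethtool -S eth0 | grep linklocal_allowance_exceeded
```

A steadily increasing counter here, correlated with resolution failures, is the direct evidence that is otherwise missing from application-level logs.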