Cilium

Endpoint Connectivity Timeout Without Health Daemon

critical
connectivityUpdated Mar 27, 2024

When endpoints on remote nodes fail to respond to ICMP/HTTP health probes, but host-level connectivity succeeds, this indicates datapath or policy issues preventing traffic from reaching the endpoint namespace. This pattern isolates the failure to pod networking rather than node-to-node connectivity.

How to detect:

Monitor cilium-health status output. If host connectivity shows 'OK' but endpoint connectivity shows 'Connection timed out' for remote nodes, the issue is between the host network and pod namespace. This can be detected by parsing cilium-health status or observing endpoint health check failures while node-level checks succeed.

Recommended action:

Check netfilter FORWARD chain policy (should not be DROP), verify endpoint status with 'cilium endpoint list', inspect endpoint-specific logs with 'cilium endpoint get <id>', and monitor packet drops with 'cilium monitor --type drop'. Verify that BPF programs are correctly attached to the endpoint interface.