Trino

NodeStateManager shutdown blocked by discovery announcement retries

warning
availability
Updated Jun 24, 2025 (via Exa)
How to detect:

During a worker shutdown initiated by changing the node state to SHUTTING_DOWN, NodeStateManager keeps attempting service announcements to the discovery server with no retry limit. If the coordinator is unavailable, io.airlift.discovery.client.Announcer and CachingServiceSelector log DiscoveryException errors in a loop that has no timeout, and this blocks shutdown even when no active tasks remain.
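One way to spot the loop is to count discovery failures in a recent slice of the worker log. The following is a minimal sketch, not part of Trino or Airlift: the class DiscoveryLoopDetector, its method names, and the threshold parameter are hypothetical, and it assumes each failure appears on a single log line that contains both the logger name and the text DiscoveryException. Actual log layouts may differ.

```java
import java.util.List;

// Hypothetical helper (not part of Trino or Airlift).
// Assumption: one log line per failure, containing both the logger name
// and the exception name.
final class DiscoveryLoopDetector {
    private DiscoveryLoopDetector() {}

    // Counts lines that mention DiscoveryException and come from one of the
    // two components named in the description above.
    static long countDiscoveryFailures(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains("DiscoveryException"))
                .filter(line -> line.contains("Announcer")
                        || line.contains("CachingServiceSelector"))
                .count();
    }

    // Flags a likely retry loop once the failure count reaches the threshold.
    // The threshold is a tuning choice, not a value taken from Trino.
    static boolean looksLikeRetryLoop(List<String> logLines, long threshold) {
        return countDiscoveryFailures(logLines) >= threshold;
    }
}
```

Feeding it only the lines logged after the shutdown request keeps ordinary, transient discovery errors from earlier in the worker's life out of the count.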

Recommended action:

Monitor logs from io.trino.server.NodeStateManager for 'Waiting for <N> active tasks to finish' messages. If a worker hangs even though no active tasks remain, the announcement retry loop is the likely cause. Consider proposing retry limits or timeouts for discovery announcements during shutdown. Also check that NodeStateManager still proceeds to shutdownAction.onShutdown() (System.exit(0)) if lifeCycleManager.stop() times out.
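The two ideas in the recommendation, a bounded announcement retry and a stop step that cannot block the final exit, could look roughly like the sketch below. This is an illustration under stated assumptions, not Trino or Airlift code: the class BoundedShutdown and the methods announceWithLimit and stopThenExit are hypothetical, the Callable stands in for a discovery announcement, and the two Runnable arguments stand in for lifeCycleManager.stop() and shutdownAction.onShutdown().

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch (not Trino or Airlift code).
final class BoundedShutdown {
    private BoundedShutdown() {}

    // Tries the announcement at most maxAttempts times and gives up once the
    // deadline has passed. Returns true if any attempt succeeded.
    static boolean announceWithLimit(Callable<Void> announce, int maxAttempts,
            Duration deadline, Duration pause) {
        long deadlineNanos = System.nanoTime() + deadline.toNanos();
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (System.nanoTime() - deadlineNanos >= 0) {
                break; // out of time, stop retrying
            }
            try {
                announce.call();
                return true;
            }
            catch (Exception e) {
                // Any failure (standing in for DiscoveryException) triggers a retry.
                if (e instanceof InterruptedException) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pause.toMillis());
                }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    // Runs stop on a daemon thread, waits at most timeout, then always runs
    // onShutdown. Returns true if stop completed within the timeout.
    static boolean stopThenExit(Runnable stop, Duration timeout, Runnable onShutdown) {
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "shutdown-stop");
            thread.setDaemon(true); // a hung stop must not keep the JVM alive
            return thread;
        });
        boolean stopped = false;
        try {
            Future<?> future = executor.submit(stop);
            future.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            stopped = true;
        }
        catch (TimeoutException | ExecutionException e) {
            // stop hung or failed; fall through to the exit action
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        finally {
            executor.shutdownNow();
            onShutdown.run(); // e.g. System.exit(0) in a real process
        }
        return stopped;
    }
}
```

Putting the exit action in the finally block means it runs whether the stop step finishes, fails, or times out, which is the behavior the last check above is meant to confirm. Where such limits would actually belong in Trino or Airlift is a design question for the respective projects.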