
trino.cluster.active_workers

Active worker nodes
Dimensions: None
Available on: Native (1), Prometheus (1)
Interface Metrics (2)
Native
Number of active nodes in the Trino cluster
Dimensions: None
Prometheus
Number of active worker nodes in the cluster
Dimensions: None

Technical Annotations (51)

Configuration Parameters (14)
task.concurrency [recommended: 4]
Controls the number of concurrent tasks per worker; set to this value in the reported configuration
task.max-drivers-per-task [recommended: 8]
Limits the number of drivers per task; may reduce parallelism when threads are blocked
task.http-timeout-threads [recommended: 15-20]
Size of the HTTP timeout thread pool; exhaustion can cause worker communication issues
task.http-response-threads [recommended: 300]
Size of the HTTP response thread pool; too few threads can block worker communication
retry-policy [recommended: TASK]
Enables task-level fault tolerance; the query is still not retried when it fails with NO_NODES_AVAILABLE after worker pod termination
query.remote-task.max-error-duration [recommended: 1m (increase recommended)]
Time allowed for task errors before the task is failed; may need to be increased to leave room for rescheduling
exchange.compression-enabled [recommended: true]
Enables compression of intermediate data handled by the exchange manager
query.low-memory-killer.delay [recommended: 0s]
Delay before queries are killed because of low memory
-Xmx110G [recommended: 110G]
Maximum JVM heap on a 125G worker; may need tuning under off-heap memory pressure
-XX:ReservedCodeCacheSize [recommended: 2G]
Code cache for JIT-compiled PageFilter classes; may fill under sustained load
node.internal-address [recommended: routable IP or hostname (not localhost)]
Address must be reachable from the coordinator for node discovery and communication
discovery.uri [recommended: http://service-xdata-trino:8086/]
Coordinator discovery endpoint used for worker registration
http-server.http.port [recommended: 8086]
Port must be accessible from the coordinator for worker communication
query.client.timeout [recommended: 5m]
Time the cluster keeps a query running without contact from the client before abandoning and cancelling it
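The registration-related parameters above (node.internal-address, discovery.uri) lend themselves to a simple pre-flight check. The following is a minimal sketch, assuming the worker settings are available as plain key=value text; the helper names and the sample settings are hypothetical and are not part of Trino.

```python
# Hosts that only resolve on the worker itself (assumed list for this sketch).
LOOPBACK_HOSTS = {"localhost", "127.0.0.1", "::1"}


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_worker_config(props):
    """Return human-readable findings for settings that affect registration."""
    findings = []
    address = props.get("node.internal-address")
    if address is not None and address.lower() in LOOPBACK_HOSTS:
        findings.append(
            "node.internal-address is a loopback address; the coordinator cannot reach it"
        )
    if "discovery.uri" not in props:
        findings.append(
            "discovery.uri is not set; the worker cannot register with the coordinator"
        )
    return findings


# Hypothetical worker settings that reproduce the localhost misconfiguration.
sample = """
# worker settings (illustrative only)
coordinator=false
http-server.http.port=8086
discovery.uri=http://service-xdata-trino:8086/
node.internal-address=localhost
"""

findings = check_worker_config(parse_properties(sample))
for finding in findings:
    print(finding)
```

A check like this could run before a worker pod starts; it only inspects the text it is given and does not contact the cluster.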
Error Signatures (13)
No nodes available to run query [exception]
io.trino.spi.TrinoException: No nodes available to run query [exception]
INTERNAL ERROR - NO_NODES_AVAILABLE [error code]
Previously active node is missing [log pattern]
Server refused connection [log pattern]
Failed communicating with server [log pattern]
Error getting task status [log pattern]
Error fetching memory info [log pattern]
Error fetching node state [log pattern]
Insufficient active worker nodes. Waited 5.00m for at least 1 workers, but only 0 workers are active [log pattern]
Previously active node is missing: .* (last seen at localhost) [log pattern]
Waited 5.00m for at least 1 workers [log pattern]
last seen at localhost [log pattern]
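The signatures above can be matched mechanically when scanning coordinator logs. The sketch below is one possible way to do that, assuming each log entry is a single line of text; the category names, the classify helper, and the sample node identifier are hypothetical, and the regular expressions are derived only from the patterns listed here.

```python
import re

# Ordered list of (category, pattern); the first matching pattern wins.
SIGNATURES = [
    ("no_nodes_available",
     re.compile(r"No nodes available to run query|NO_NODES_AVAILABLE")),
    ("insufficient_workers",
     re.compile(r"Waited (?P<waited>[\d.]+[a-z]+) for at least (?P<required>\d+) workers, "
                r"but only (?P<active>\d+) workers are active")),
    ("node_missing",
     re.compile(r"Previously active node is missing: (?P<node>\S+) "
                r"\(last seen at (?P<address>[^)]+)\)")),
    ("connection_problem",
     re.compile(r"Server refused connection|Failed communicating with server")),
    ("status_fetch_error",
     re.compile(r"Error (?:getting task status|fetching memory info|fetching node state)")),
]


def classify(line):
    """Return (category, captured fields) for a log line, or (None, {}) if nothing matches."""
    for name, pattern in SIGNATURES:
        match = pattern.search(line)
        if match:
            return name, match.groupdict()
    return None, {}


category, fields = classify(
    "Insufficient active worker nodes. Waited 5.00m for at least 1 workers, "
    "but only 0 workers are active"
)
print(category, fields)

# "worker-abc" is a made-up node identifier used only for illustration.
category, fields = classify(
    "Previously active node is missing: worker-abc (last seen at localhost)"
)
print(category, fields)
```

Capturing the "last seen at" address makes the localhost misconfiguration easy to spot, since a loopback value there means the coordinator recorded an address it cannot route to.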
CLI Commands (7)
kubectl get pods -w [monitoring]
kubectl scale deployment my-trino-cluster-worker --replicas=N [remediation]
kubectl exec -it <pod-name> -- cat /etc/trino/config.properties [diagnostic]
top -H -p <pid> [diagnostic]
jstack -l <pid> [diagnostic]
EXPLAIN ANALYZE VERBOSE [diagnostic]
kubectl get pods --kubeconfig /root/.kube/docker2.config | grep trino [diagnostic]
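Output from jstack -l <pid> can be summarized by thread state to see whether most threads on a worker sit in WAITING or TIMED_WAITING. The following is a minimal sketch, assuming the dump has been saved as text and uses the standard "java.lang.Thread.State:" lines; the function names, thread names, and sample dump are hypothetical.

```python
import re
from collections import Counter

# Each thread in a jstack dump reports its state on a line of this form.
STATE_LINE = re.compile(r"java\.lang\.Thread\.State: ([A-Z_]+)")


def count_thread_states(dump):
    """Count how many threads are in each JVM thread state."""
    return Counter(STATE_LINE.findall(dump))


def waiting_ratio(counts):
    """Fraction of threads in WAITING or TIMED_WAITING; 0.0 for an empty dump."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["WAITING"] + counts["TIMED_WAITING"]) / total


# Abbreviated, made-up dump for illustration only.
sample_dump = '''
"example-thread-1" #41 prio=5 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"example-thread-2" #42 prio=5 waiting on condition
   java.lang.Thread.State: WAITING (parking)
"example-thread-3" #43 prio=5 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (sleeping)
"example-thread-4" #44 prio=5 runnable
   java.lang.Thread.State: RUNNABLE
'''

counts = count_thread_states(sample_dump)
print(dict(counts), waiting_ratio(counts))
```

A high ratio is not a problem by itself, since idle pool threads normally wait; it is more informative when compared across several dumps taken while queries are stalled.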
Technical References (17)
worker parallelism [concept]
query splits [concept]
WAITING [concept]
TIMED_WAITING [concept]
BinPackingNodeAllocatorService [component]
EventDrivenFaultTolerantQueryScheduler [component]
exchange manager [component]
ContinuousTaskStatusFetcher [component]
TaskInfoFetcher [component]
RemoteNodeMemory [component]
RemoteNodeState [component]
io.trino.$gen.PageFilter [component]
C2 CompilerThread [component]
io.trino.metadata.DiscoveryNodeManager [component]
node.internal-address [configuration parameter]
node-state-poller [component]
DiscoveryNodeManager [component]
Related Insights (8)
Small-memory servers have inefficient memory overhead ratio [info]
Worker parallelism drops to zero with threads stuck in WAITING state [critical]
Fault-tolerant query fails with NO_NODES_AVAILABLE on worker pod termination [critical]
Connection refusal pattern precedes worker node removal from cluster [info]
Single worker CPU saturation causes cluster-wide query timeouts [critical]
Worker nodes fail to register with coordinator due to localhost internal address [critical]
Query execution blocked waiting 5 minutes for worker node availability [critical]
Worker nodes fail to register with coordinator due to localhost misconfiguration [critical]