Trino Metric

trino.cluster.active_workers

Active worker nodes

Dimensions: None

Available on: Native (1), Prometheus (1)

Interface Metrics (2)

Native

trino.discovery.DiscoveryNodeManager.ActiveNodeCount

Number of active nodes in the Trino cluster

Dimensions: None

Prometheus

trino_active_workers

Number of active worker nodes in the cluster

Dimensions: None
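As a rough illustration of consuming the Prometheus form of this metric, the sketch below reads an unlabeled `trino_active_workers` sample from Prometheus text exposition output. The helper names `read_gauge` and `workers_alert`, and the minimum of one worker, are assumptions made for the example; they are not part of Trino or of any exporter.

```python
def read_gauge(exposition_text, metric_name="trino_active_workers"):
    """Return the value of an unlabeled gauge from Prometheus text format, or None."""
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        # A sample line has the form: <name> <value> [timestamp]
        if len(parts) >= 2 and parts[0] == metric_name:
            return float(parts[1])
    return None


def workers_alert(active, minimum=1):
    """Hypothetical rule: alert when the gauge is missing or below the minimum."""
    return active is None or active < minimum
```

Because the metric has no dimensions, the sample line carries no label set, which keeps the parsing to a plain whitespace split.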

Sources

trino.discovery.DiscoveryNodeManager.ActiveNodeCount (trino.io)

Technical Annotations (51)

Configuration Parameters (14)

task.concurrency (recommended: 4)

Default local concurrency for parallel operators such as joins and aggregations; set to 4 in the reported configuration

task.max-drivers-per-task (recommended: 8)

Limits the number of drivers per task; may reduce parallelism when threads are blocked

task.http-timeout-threads (recommended: 15-20)

HTTP timeout thread pool size; exhaustion can cause worker communication issues

task.http-response-threads (recommended: 300)

HTTP response thread pool size; too few threads can block worker communication

retry-policy (recommended: TASK)

Enables task-level fault tolerance; in the reported scenario, the query still failed with NO_NODES_AVAILABLE on worker pod termination instead of being retried

query.remote-task.max-error-duration (recommended: 1m, increase recommended)

Time that task errors are tolerated before the query fails; may need to be increased to allow rescheduling

exchange.compression-enabled (recommended: true)

Enables compression for exchange manager intermediate data

query.low-memory-killer.delay (recommended: 0s)

Delay before killing queries due to low memory

-Xmx110G (recommended: 110G)

Maximum JVM heap on a 125G worker; may need tuning under off-heap memory pressure

-XX:ReservedCodeCacheSize (recommended: 2G)

Code cache for JIT-compiled PageFilter classes; may fill under sustained load

node.internal-address (recommended: routable IP or hostname, not localhost)

Address must be reachable from coordinator for node discovery and communication

discovery.uri (recommended: http://service-xdata-trino:8086/)

Coordinator discovery endpoint for worker registration

http-server.http.port (recommended: 8086)

Port must be accessible from coordinator for worker communication

query.client.timeout (recommended: 5m)

How long the cluster keeps running a query without contact from the client application before it abandons and cancels the work
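The registration-related parameters above (`node.internal-address`, `discovery.uri`) lend themselves to a simple pre-flight check. The following is a minimal sketch, assuming the worker's properties are available as plain `key=value` text; `parse_properties`, `check_worker_config`, and the `LOOPBACK` set are hypothetical names introduced for this example, not Trino APIs.

```python
from urllib.parse import urlparse

# Assumption: these values are treated as "not routable" for a worker.
LOOPBACK = {"localhost", "127.0.0.1", "::1"}


def parse_properties(text):
    """Parse simple key=value lines, ignoring blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_worker_config(props):
    """Return a list of human-readable problems found in a worker's properties."""
    problems = []
    addr = props.get("node.internal-address")
    if addr is not None and addr.lower() in LOOPBACK:
        problems.append("node.internal-address is a loopback address")
    uri = props.get("discovery.uri")
    if uri is None:
        problems.append("discovery.uri is not set")
    else:
        host = urlparse(uri).hostname
        if host is None or host.lower() in LOOPBACK:
            problems.append("discovery.uri does not point at a routable host")
    return problems
```

The check is meant for worker nodes only; a coordinator may legitimately refer to itself by a local address.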

Error Signatures (13)

No nodes available to run query (exception)

io.trino.spi.TrinoException: No nodes available to run query (exception)

INTERNAL ERROR - NO_NODES_AVAILABLE (error code)

Previously active node is missing (log pattern)

Server refused connection (log pattern)

Failed communicating with server (log pattern)

Error getting task status (log pattern)

Error fetching memory info (log pattern)

Error fetching node state (log pattern)

Insufficient active worker nodes. Waited 5.00m for at least 1 workers, but only 0 workers are active (log pattern)

Previously active node is missing: .* (last seen at localhost) (log pattern)

Waited 5.00m for at least 1 workers (log pattern)

last seen at localhost (log pattern)
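The signatures above can be matched against coordinator logs mechanically. The sketch below is one possible way to do that, assuming one log message per line. The category labels and the `classify` helper are made up for this example, and the literal parentheses in the "last seen at localhost" pattern are escaped here so they match as plain text.

```python
import re

# Ordered from most to least specific; the first matching pattern wins.
SIGNATURES = [
    ("worker_missing_localhost",
     re.compile(r"Previously active node is missing: .* \(last seen at localhost\)")),
    ("worker_missing", re.compile(r"Previously active node is missing")),
    ("no_workers_after_wait",
     re.compile(r"Waited [\d.]+[a-z]+ for at least \d+ workers")),
    ("no_nodes_available",
     re.compile(r"No nodes available to run query|NO_NODES_AVAILABLE")),
    ("connection_refused", re.compile(r"Server refused connection")),
    ("communication_failure", re.compile(r"Failed communicating with server")),
    ("status_fetch_error",
     re.compile(r"Error (getting task status|fetching memory info|fetching node state)")),
]


def classify(log_line):
    """Return the label of the first signature found in the line, or None."""
    for label, pattern in SIGNATURES:
        if pattern.search(log_line):
            return label
    return None
```

Checking the localhost variant first separates the registration misconfiguration case from an ordinary worker loss.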

CLI Commands (7)

kubectl get pods -w (monitoring)

kubectl scale deployment my-trino-cluster-worker --replicas=N (remediation)

kubectl exec -it <pod-name> -- cat /etc/trino/config.properties (diagnostic)

top -H -p <pid> (diagnostic)

jstack -l <pid> (diagnostic)

EXPLAIN ANALYZE VERBOSE (diagnostic)

kubectl get pods --kubeconfig /root/.kube/docker2.config | grep trino (diagnostic)
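When `jstack -l <pid>` is used to investigate stalled workers, a quick tally of thread states shows how many threads sit in WAITING or TIMED_WAITING. This is a minimal sketch, assuming the standard `java.lang.Thread.State:` lines of a HotSpot thread dump; `count_thread_states` and `waiting_share` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)")


def count_thread_states(jstack_output):
    """Count threads per state in the text of a thread dump."""
    counts = Counter()
    for line in jstack_output.splitlines():
        match = STATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def waiting_share(counts):
    """Fraction of threads in WAITING or TIMED_WAITING; 0.0 for an empty dump."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (counts["WAITING"] + counts["TIMED_WAITING"]) / total
```

A high share is only a starting point: idle pool threads also park in WAITING, so the stack frames under each thread still need to be read.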

Technical References (17)

worker parallelism (concept)

query splits (concept)

WAITING (concept)

TIMED_WAITING (concept)

BinPackingNodeAllocatorService (component)

EventDrivenFaultTolerantQueryScheduler (component)

exchange manager (component)

ContinuousTaskStatusFetcher (component)

TaskInfoFetcher (component)

RemoteNodeMemory (component)

RemoteNodeState (component)

io.trino.$gen.PageFilter (component)

C2 CompilerThread (component)

io.trino.metadata.DiscoveryNodeManager (component)

node.internal-address (configuration parameter)

node-state-poller (component)

DiscoveryNodeManager (component)

Related Insights (8)

Small-memory servers have inefficient memory overhead ratio (info)

Worker parallelism drops to zero with threads stuck in WAITING state (critical)

Fault-tolerant query fails with NO_NODES_AVAILABLE on worker pod termination (critical)

Connection refusal pattern precedes worker node removal from cluster (info)

Single worker CPU saturation causes cluster-wide query timeouts (critical)

Worker nodes fail to register with coordinator due to localhost internal address (critical)

Query execution blocked waiting 5 minutes for worker node availability (critical)

Worker nodes fail to register with coordinator due to localhost misconfiguration (critical)