MySQL Metric

trino.execution.failed_queries.one_minute

Failed queries in the last minute

Dimensions: None

Technical Annotations (102)

Configuration Parameters (31)

exchange.deduplication-buffer-size (recommended: >32MB or use exchange manager)

Controls coordinator buffer for query stage output

exchange-manager.name (recommended: filesystem or hdfs)

Enables external storage for spooled data

exchange.base-directories (recommended: multiple comma-separated S3 URIs)

Distributes spooling I/O across buckets to avoid rate limits

retry-policy (recommended: NONE for unsupported connectors)

Controls query/task retry behavior; requires connector support

fault-tolerant-execution-standard-split-size (recommended: 64MB)

Standard split data size processed by tasks reading from source tables

fault-tolerant-execution-max-task-split-count (recommended: 2048)

Maximum splits per task; protects against incorrect split weights

fault-tolerant-execution-max-partition-count (recommended: 50)

Maximum partitions for distributed joins and aggregations with TASK policy

http.client.timeout (recommended: 10000ms)

Default timeout before PAGE_TRANSPORT_TIMEOUT occurs

hive.collect-column-statistics-on-write (recommended: evaluate disabling if data quality issues persist)

Auto-collects column stats that can become skewed with outliers

query.remote-task.max-error-duration (recommended: 1m; consider increasing)

Time allowed for task errors before failure; may need an increase to allow for rescheduling

exchange.compression-enabled (recommended: true)

Enables compression for exchange manager intermediate data

query.low-memory-killer.delay (recommended: 0s)

Delay before killing queries due to low memory

query.client.timeout (recommended: 5m)

Maximum time the cluster keeps working on a query without contact from the client before abandoning it

-Xmx (recommended: 105G)

JVM max heap size; must accommodate query.max-memory-per-node + memory.heap-headroom-per-node with buffer for GC

query.max-memory-per-node (recommended: 85GB)

Maximum query memory per node; tight allocation relative to heap can trigger excessive GC

memory.heap-headroom-per-node (recommended: 20GB)

Reserved heap space per node; 105GB heap vs 85GB+20GB=105GB leaves no buffer for GC overhead
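The heap arithmetic in the three memory entries above can be checked with a short Python sketch. The function name heap_slack_gb is hypothetical, and the inputs are simply the values from the listing. A result of zero means the heap is fully reserved and nothing remains for GC overhead.

```python
def heap_slack_gb(xmx_gb, max_memory_per_node_gb, heap_headroom_gb):
    """Return the heap (in GB) left after query memory and headroom are reserved."""
    return xmx_gb - (max_memory_per_node_gb + heap_headroom_gb)


# Values from the listing: -Xmx 105G, query.max-memory-per-node 85GB,
# memory.heap-headroom-per-node 20GB.
slack = heap_slack_gb(105, 85, 20)
print(f"unreserved heap: {slack} GB")
```

Running the same check with candidate values before changing -Xmx or query.max-memory-per-node is a cheap way to confirm that some heap stays unreserved.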

-XX:G1HeapRegionSize (recommended: 32M)

G1 GC region size; impacts GC pause behavior for large heaps

query.max-length (recommended: 1000000)

Maximum query text length in characters; 1,000,000 is the default and the limit can be raised to at most 1,000,000,000
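A client that generates SQL can guard against this limit before submitting a statement. The sketch below is illustrative only: exceeds_query_max_length is a hypothetical helper, and it assumes the limit is a plain character count, so results may differ slightly from the server for text outside the Basic Multilingual Plane.

```python
DEFAULT_QUERY_MAX_LENGTH = 1_000_000      # default from the listing
QUERY_MAX_LENGTH_CEILING = 1_000_000_000  # highest configurable value


def exceeds_query_max_length(sql, max_length=DEFAULT_QUERY_MAX_LENGTH):
    """Return True if the statement text is longer than the configured limit."""
    if not 0 < max_length <= QUERY_MAX_LENGTH_CEILING:
        raise ValueError("max_length must be between 1 and 1,000,000,000")
    # Assumption: the limit is compared against a simple character count.
    return len(sql) > max_length


# A generated IN-list is a typical way to produce very long query text.
generated_sql = "SELECT 1 WHERE 1 IN (" + ", ".join(["1"] * 400_000) + ")"
print(exceeds_query_max_length(generated_sql))
```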

query.max-stage-count (recommended: 150)

The default limit prevents instability; increasing it risks cluster-wide impact

optimizer.join-reordering-strategy (recommended: AUTOMATIC)

Enables automatic join reordering based on cost estimates

join_distribution_type (recommended: BROADCAST)

Forces broadcast joins for small dimension tables to eliminate shuffling

join_reordering_strategy (recommended: AUTOMATIC)

Session-level setting for join reordering optimization

exchange.http-client.max-content-length (recommended: 4GB)

Key setting: the default is too low for large data transfers between workers

query.remote-task.max-request-size (recommended: 10GB)

Maximum size for requests between coordinator and workers

exchange.http-client.request-timeout (recommended: 30s to 60s)

Maximum time for individual HTTP requests between workers

exchange.http-client.idle-timeout (recommended: 60s to 2m)

Keep-alive timeout for idle HTTP connections

exchange.http-client.max-connections-per-server (recommended: 1000)

Connection pool size per worker node

name (recommended: filesystem)

Exchange manager type for S3-based buffering

base-directories (recommended: s3://exchange-spooling-bucket)

S3 URI locations for exchange storage buffering

exchange.s3.region (recommended: us-west-1)

AWS region where exchange storage bucket is located
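The three exchange storage entries above belong together in one exchange manager configuration. The Python sketch below renders them as properties text. It assumes the bare name and base-directories entries correspond to exchange-manager.name and exchange.base-directories from earlier in this list. The helper name and the second bucket are hypothetical.

```python
def render_exchange_manager_properties(base_directories, region, name="filesystem"):
    """Render exchange manager settings as Java-properties text."""
    if not base_directories:
        raise ValueError("at least one base directory is required")
    lines = [
        f"exchange-manager.name={name}",
        # Several URIs are comma-separated so spooling I/O spreads across buckets.
        f"exchange.base-directories={','.join(base_directories)}",
        f"exchange.s3.region={region}",
    ]
    return "\n".join(lines) + "\n"


properties_text = render_exchange_manager_properties(
    # The second bucket name is a made-up example.
    ["s3://exchange-spooling-bucket", "s3://exchange-spooling-bucket-2"],
    "us-west-1",
)
print(properties_text)
```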

query.remote-task.enable-adaptive-request-size (recommended: true)

Enabled by default to prevent OOM with large schemas

Error Signatures (27)

Exchange manager must be configured for the failure recovery capabilities to be fully functional (exception)

software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate (exception)

This connector does not support query retries (exception)

PAGE_TRANSPORT_TIMEOUT (error code)

Failing abandoned task (log pattern)

Total timeout 10000 ms elapsed (exception)

Query exceeded distributed user memory limit of 40GB (log pattern)

Query exceeded per-node memory limit (exception)

io.trino.operator.PageTransportTimeoutException (exception)

java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed (exception)

io.trino.server.IoExceptionSuppressingWriterInterceptor Could not write to output: EofException(null) (log pattern)

No nodes available to run query (exception)

io.trino.spi.TrinoException: No nodes available to run query (exception)

INTERNAL ERROR - NO_NODES_AVAILABLE (error code)

Previously active node is missing (log pattern)

Waited 5.00m for at least 1 workers (log pattern)

class io.trino.plugin.jdbc.JdbcSplit cannot be cast to class io.trino.plugin.jdbc.JdbcSplit (exception)

SQL Error [65536] (error code)

java.lang.ClassCastException (exception)

Expected response code from http://.*:8080/v1/task/.*/status to be 200, but was 408 (http status)

Error 408 Timeout: Timed out (http status)

io.trino.spi.TrinoException (exception)

QUERY_TEXT_TOO_LARGE (error code)

QUERY_HAS_TOO_MANY_STAGES (error code)

REMOTE_TASK_ERROR (error code)

Max requests queued per destination exceeded for HttpDestination (log pattern)

Encountered too many errors talking to a worker node (log pattern)
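Signatures like these can be matched against log lines or error messages to bucket failures by likely cause. The Python sketch below covers only a handful of them. The label names, the classify helper, and the sample host and task id are hypothetical. The 408 pattern is copied from the list above.

```python
import re

# (pattern, label) pairs; labels are illustrative names, not Trino identifiers.
SIGNATURES = [
    (re.compile(r"Expected response code from http://.*:8080/v1/task/.*/status "
                r"to be 200, but was 408"), "task_status_408"),
    (re.compile(r"PAGE_TRANSPORT_TIMEOUT|PageTransportTimeoutException"),
     "page_transport_timeout"),
    (re.compile(r"No nodes available to run query|NO_NODES_AVAILABLE"),
     "no_nodes_available"),
    (re.compile(r"Please reduce your request rate"), "s3_throttling"),
    (re.compile(r"QUERY_TEXT_TOO_LARGE"), "query_text_too_large"),
    (re.compile(r"QUERY_HAS_TOO_MANY_STAGES"), "too_many_stages"),
]


def classify(line):
    """Return the label of the first matching signature, or None."""
    for pattern, label in SIGNATURES:
        if pattern.search(line):
            return label
    return None


# Host and task id below are made-up sample values.
sample = ("Expected response code from http://10.0.0.5:8080/v1/task/"
          "20240101_000000_00001_abcde.1.0.0/status to be 200, but was 408")
print(classify(sample))
```

The first match wins, so more specific patterns should be listed before broad ones such as io.trino.spi.TrinoException.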

CLI Commands (8)

EXPLAIN (diagnostic)

SHOW STATS (diagnostic)

kubectl get pods -w (monitoring)

kubectl scale deployment my-trino-cluster-worker --replicas=N (remediation)

kubectl exec -it <pod-name> -- cat /etc/trino/config.properties (diagnostic)

ANALYZE TABLE (diagnostic)

SELECT state, COUNT(*) AS count FROM trino_events.trino_queries WHERE create_time >= now() - interval '7' day GROUP BY state; (diagnostic)

SELECT error_code, error_name, COUNT(*) AS failures FROM trino_events.trino_queries WHERE state = 'FAILED' AND create_time >= now() - interval '30' day GROUP BY error_code, error_name ORDER BY failures DESC LIMIT 10; (diagnostic)
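The per-state counts returned by the first SELECT can be reduced to a single failure rate. The Python sketch below assumes the rows arrive as (state, count) pairs. The failure_rate helper and the sample counts are hypothetical, not real query output.

```python
def failure_rate(rows):
    """Return the fraction of queries in the FAILED state, from (state, count) rows."""
    counts = dict(rows)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no queries in the window, so no failures to report
    return counts.get("FAILED", 0) / total


# Made-up sample shaped like the output of the state-count query.
sample_rows = [("FINISHED", 950), ("FAILED", 50)]
print(f"failure rate: {failure_rate(sample_rows):.1%}")
```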

Technical References (36)

PLANNED (component)

Stage 1 (component)

HttpPageBufferClient (component)

Iceberg (component)

partition filter pushdown (concept)

day_partition (component)

broadcast join (concept)

cost-based optimizer (component)

Hive Metastore (component)

cardinality estimation (concept)

trino_events.trino_queries (component)

state (component)

/v1/task/{task-id}/results (component)

BinPackingNodeAllocatorService (component)

EventDrivenFaultTolerantQueryScheduler (component)

exchange manager (component)

node-state-poller (component)

io.trino.server.PluginClassLoader (component)

io.trino.plugin.jdbc.JdbcSplit (component)

G1 Old Generation (component)

/v1/task/.../status (file path)

coordinator (component)

remote task (component)

HttpDestination (component)

broadcast joins (concept)

exchange.http-client (component)

worker (component)

worker nodes (component)

exchange-manager.properties (file path)

config.properties (file path)

Delta Lake connector (component)

Hive connector (component)

Iceberg connector (component)

MySQL connector (component)

PostgreSQL connector (component)

SQL Server connector (component)

Related Insights (28)

Query fails when result set exceeds buffer size without exchange manager (critical)

S3 request rate throttling under I/O-intensive workloads (warning)

Connector incompatibility with fault-tolerant execution causes query failures (critical)

Improper task sizing prevents query completion (warning)

Partition count above 50 causes instability and poor performance (warning)

Worker resource exhaustion causes complete query failure (critical)

Long-running queries have increased failure probability without FTE (warning)

Worker stuck in PLANNED state after crash recovery (critical)

Undersized cluster capacity causes query execution failures (critical)

Memory limit exceeded after upgrade due to missing Iceberg partition filter pushdown (critical)

Skewed column statistics cause broadcast join memory overflow (critical)

Query failures and cancellations indicate platform or workload issues (warning)

PageTransportTimeoutException occurs without worker failure or obvious resource exhaustion (warning)

Fault-tolerant query fails with NO_NODES_AVAILABLE on worker pod termination (critical)

Query execution blocked waiting 5 minutes for worker node availability (critical)

Duplicate PluginClassLoader instances cause JDBC query failures (critical)

Long GC pauses cause HTTP 408 task timeout failures (critical)

Insufficient resources cause query failures (critical)

Query rejected with QUERY_TEXT_TOO_LARGE error (warning)

Remote task failures from coordinator communication timeout (warning)

Excessive query stages cause cluster instability and unrelated query failures (critical)

Stale table statistics cause suboptimal join ordering and query failures (warning)

PAGE_TRANSPORT_TIMEOUT causes query failures after 1 minute (critical)

Worker node failures cause query failures without fault-tolerant execution (critical)

Write operations fail on catalogs without fault-tolerant write support (critical)

High query failure rate indicates operational issues (warning)

Complex queries cause QUERY_HAS_TOO_MANY_STAGES and cluster instability (critical)

Out-of-memory errors with large schemas when adaptive requests disabled (warning)