MySQL metric

trino.execution.failed_queries.one_minute

Failed queries in the last minute
Dimensions: None
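To cross-check this metric against your own query history, you can count failures over a one-minute window. The sketch below assumes a hypothetical `QueryCompleted` event that carries each query's end time and final state; this is not a Trino API. It counts FAILED completions inside a strict 60-second sliding window. The exported one-minute metric may be computed differently, for example as an exponentially decaying count, so the two values need not match exactly.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryCompleted:
    """Hypothetical completion event: when a query ended and its final state."""

    end_time: float  # seconds since the epoch
    state: str  # final query state, e.g. "FINISHED" or "FAILED"


class FailedQueriesLastMinute:
    """Strict sliding-window count of FAILED queries.

    Assumes events are recorded in non-decreasing end_time order.
    """

    def __init__(self, window_seconds: float = 60.0) -> None:
        self.window_seconds = window_seconds
        self._failure_times: deque[float] = deque()

    def record(self, event: QueryCompleted) -> None:
        # Only failed queries count toward the metric.
        if event.state == "FAILED":
            self._failure_times.append(event.end_time)

    def value(self, now: float) -> int:
        # Evict failures that ended at or before the start of the window.
        cutoff = now - self.window_seconds
        while self._failure_times and self._failure_times[0] <= cutoff:
            self._failure_times.popleft()
        return len(self._failure_times)
```

Consider failures recorded at t=0s and t=45s. The window then reports 2 at t=50s and 1 at t=61s, because by then the first failure has aged out.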

Technical Annotations (102)

Configuration Parameters (31)
exchange.deduplication-buffer-size (recommended: >32MB, or configure an exchange manager)
Controls the coordinator buffer for query stage output
exchange-manager.name (recommended: filesystem or hdfs)
Enables external storage for spooled data
exchange.base-directories (recommended: multiple comma-separated S3 URIs)
Distributes spooling I/O across buckets to avoid rate limits
retry-policy (recommended: NONE for connectors without retry support)
Controls query/task retry behavior; requires connector support
fault-tolerant-execution-standard-split-size (recommended: 64MB)
Standard data size of the splits processed by tasks reading from source tables
fault-tolerant-execution-max-task-split-count (recommended: 2048)
Maximum splits per task; protects against incorrect split weights
fault-tolerant-execution-max-partition-count (recommended: 50)
Maximum partitions for distributed joins and aggregations under the TASK retry policy
http.client.timeout (recommended: 10000ms)
Default timeout before PAGE_TRANSPORT_TIMEOUT occurs
hive.collect-column-statistics-on-write (recommended: evaluate disabling if data quality issues persist)
Automatically collects column statistics on write; these can become skewed by outliers
query.remote-task.max-error-duration (recommended: 1m; consider increasing)
How long the coordinator tolerates errors communicating with a remote task before treating it as failed; may need to be increased to allow rescheduling
exchange.compression-enabled (recommended: true)
Enables compression of exchange manager intermediate data
query.low-memory-killer.delay (recommended: 0s)
Delay before queries are killed due to low memory
query.client.timeout (recommended: 5m)
How long the cluster keeps running a query without contact from the client before abandoning it
-Xmx (recommended: 105G)
JVM maximum heap size; must accommodate query.max-memory-per-node plus memory.heap-headroom-per-node, with a buffer left for GC
query.max-memory-per-node (recommended: 85GB)
Maximum user memory a query can use on a single node; sizing it too close to the heap can trigger excessive GC
memory.heap-headroom-per-node (recommended: 20GB)
Heap reserved per node for allocations Trino does not track; with a 105GB heap, 85GB + 20GB = 105GB leaves no buffer for GC overhead
-XX:G1HeapRegionSize (recommended: 32M)
G1 GC region size; affects GC pause behavior on large heaps
query.max-length (recommended: 1000000)
Default maximum length of the query text in characters; can be raised to at most 1,000,000,000
query.max-stage-count (recommended: 150)
Default limit on stages per query that prevents instability; raising it risks cluster-wide impact
optimizer.join-reordering-strategy (recommended: AUTOMATIC)
Enables automatic join reordering based on cost estimates
join_distribution_type (recommended: BROADCAST)
Session property that forces broadcast joins; suited to small dimension tables because it avoids shuffling data
join_reordering_strategy (recommended: AUTOMATIC)
Session-level equivalent of optimizer.join-reordering-strategy
exchange.http-client.max-content-length (recommended: 4GB)
Key setting; the default is too low for large data transfers between workers
query.remote-task.max-request-size (recommended: 10GB)
Maximum size of requests between the coordinator and workers
exchange.http-client.request-timeout (recommended: 30s to 60s)
Maximum time for an individual HTTP request between workers
exchange.http-client.idle-timeout (recommended: 60s to 2m)
Keep-alive timeout for idle HTTP connections
exchange.http-client.max-connections-per-server (recommended: 1000)
Connection pool size per worker node
name (recommended: filesystem)
Exchange manager type for S3-based buffering
base-directories (recommended: s3://exchange-spooling-bucket)
S3 URI locations for exchange storage buffering
exchange.s3.region (recommended: us-west-1)
AWS region where the exchange storage bucket is located
query.remote-task.enable-adaptive-request-size (recommended: true)
Enabled by default; prevents out-of-memory errors with large schemas
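The -Xmx, query.max-memory-per-node and memory.heap-headroom-per-node entries only work together if the heap leaves room for GC. Below is a minimal sketch of that check. It assumes a simplified properties format (key=value lines only) and binary size units (1G = 1GB = 1024³ bytes, which is how the JVM and Trino read them). The function names are hypothetical. -Xmx is passed in separately because it lives in the JVM options (jvm.config), not in config.properties.

```python
from __future__ import annotations

import re

# Binary multipliers: the JVM reads -Xmx105G and Trino reads 85GB as multiples of 1024.
_MULTIPLIER = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_SIZE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]*)\s*")


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Simplified: ignores the ':' separators and line continuations that Java
    properties files also allow.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def parse_size(value: str) -> int:
    """Convert '105G', '85GB', '32M' or '512B' to bytes."""
    match = _SIZE.fullmatch(value)
    if match is None:
        raise ValueError(f"not a data size: {value!r}")
    # 'GB' and 'G' (any case) both map to the 'G' multiplier; a bare 'B' means bytes.
    unit = match.group(2).upper().rstrip("B")
    if unit not in _MULTIPLIER:
        raise ValueError(f"unknown size unit in {value!r}")
    return int(float(match.group(1)) * _MULTIPLIER[unit])


def heap_slack(xmx: str, config: dict[str, str]) -> int:
    """Bytes of heap left for GC once query memory and headroom are reserved.

    A result <= 0 means query.max-memory-per-node plus
    memory.heap-headroom-per-node already consume the whole -Xmx.
    """
    reserved = parse_size(config["query.max-memory-per-node"]) + parse_size(
        config["memory.heap-headroom-per-node"]
    )
    return parse_size(xmx) - reserved
```

With the values listed above (105G heap, 85GB, 20GB), heap_slack returns 0. That is the no-buffer situation the memory.heap-headroom-per-node note describes.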
Error Signatures (27)
Exchange manager must be configured for the failure recovery capabilities to be fully functional (exception)
software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate (exception)
This connector does not support query retries (exception)
PAGE_TRANSPORT_TIMEOUT (error code)
Failing abandoned task (log pattern)
Total timeout 10000 ms elapsed (exception)
Query exceeded distributed user memory limit of 40GB (log pattern)
Query exceeded per-node memory limit (exception)
io.trino.operator.PageTransportTimeoutException (exception)
java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed (exception)
io.trino.server.IoExceptionSuppressingWriterInterceptor Could not write to output: EofException(null) (log pattern)
No nodes available to run query (exception)
io.trino.spi.TrinoException: No nodes available to run query (exception)
INTERNAL ERROR - NO_NODES_AVAILABLE (error code)
Previously active node is missing (log pattern)
Waited 5.00m for at least 1 workers (log pattern)
class io.trino.plugin.jdbc.JdbcSplit cannot be cast to class io.trino.plugin.jdbc.JdbcSplit (exception)
SQL Error [65536] (error code)
java.lang.ClassCastException (exception)
Expected response code from http://.*:8080/v1/task/.*/status to be 200, but was 408 (HTTP status)
Error 408 Timeout: Timed out (HTTP status)
io.trino.spi.TrinoException (exception)
QUERY_TEXT_TOO_LARGE (error code)
QUERY_HAS_TOO_MANY_STAGES (error code)
REMOTE_TASK_ERROR (error code)
Max requests queued per destination exceeded for HttpDestination (log pattern)
Encountered too many errors talking to a worker node (log pattern)
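Because these signatures are literal message fragments, a first pass over coordinator or worker logs can be a simple regex match. The sketch below covers only a subset of the 27 signatures. Where a Trino error code exists (PAGE_TRANSPORT_TIMEOUT, NO_NODES_AVAILABLE), the label reuses it; the other labels (MEMORY_LIMIT, S3_THROTTLING, and so on) are hypothetical grouping names, not Trino error codes. The first matching pattern wins, so order matters.

```python
from __future__ import annotations

import re

# Hypothetical triage labels; each regex matches a message fragment listed above.
SIGNATURES: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"PageTransportTimeoutException|PAGE_TRANSPORT_TIMEOUT"), "PAGE_TRANSPORT_TIMEOUT"),
    (re.compile(r"No nodes available to run query|NO_NODES_AVAILABLE"), "NO_NODES_AVAILABLE"),
    (re.compile(r"Query exceeded .*memory limit"), "MEMORY_LIMIT"),
    (re.compile(r"This connector does not support query retries"), "RETRIES_UNSUPPORTED"),
    (re.compile(r"Please reduce your request rate"), "S3_THROTTLING"),
    (re.compile(r"/v1/task/\S+/status to be 200, but was 408"), "TASK_STATUS_408"),
    (re.compile(r"JdbcSplit cannot be cast to class"), "PLUGIN_CLASSLOADER_CONFLICT"),
    (re.compile(r"QUERY_TEXT_TOO_LARGE|QUERY_HAS_TOO_MANY_STAGES"), "QUERY_LIMIT"),
]


def classify(line: str) -> str | None:
    """Return the label of the first signature found in a log line, else None."""
    for pattern, label in SIGNATURES:
        if pattern.search(line):
            return label
    return None
```

Counting labels per minute over a log stream gives a rough breakdown of what is driving the failed-query metric.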
CLI Commands (8)
EXPLAIN (diagnostic)
SHOW STATS FOR <table> (diagnostic)
kubectl get pods -w (monitoring)
kubectl scale deployment my-trino-cluster-worker --replicas=N (remediation)
kubectl exec -it <pod-name> -- cat /etc/trino/config.properties (diagnostic)
ANALYZE <table> (diagnostic)
SELECT state, COUNT(*) AS count FROM trino_events.trino_queries WHERE create_time >= now() - interval '7' day GROUP BY state; (diagnostic)
SELECT error_code, error_name, COUNT(*) AS failures FROM trino_events.trino_queries WHERE state = 'FAILED' AND create_time >= now() - interval '30' day GROUP BY error_code, error_name ORDER BY failures DESC LIMIT 10; (diagnostic)
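The seven-day state query returns one (state, count) row per final query state. The sketch below turns those rows into a failure ratio, which you can track alongside trino.execution.failed_queries.one_minute. Fetching the rows with a database client is left out. This page does not define what ratio counts as "high".

```python
from __future__ import annotations


def failure_ratio(rows: list[tuple[str, int]]) -> float:
    """Fraction of queries in the FAILED state, from (state, count) rows."""
    total = sum(count for _, count in rows)
    if total == 0:
        return 0.0  # no queries in the window: report no failures instead of dividing by zero
    failed = sum(count for state, count in rows if state == "FAILED")
    return failed / total
```

For example, rows of 90 FINISHED and 10 FAILED queries give a ratio of 0.1.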
Technical References (36)
PLANNED (component)
Stage 1 (component)
HttpPageBufferClient (component)
Iceberg (component)
partition filter pushdown (concept)
day_partition (component)
broadcast join (concept)
cost-based optimizer (component)
Hive Metastore (component)
cardinality estimation (concept)
trino_events.trino_queries (component)
state (component)
/v1/task/{task-id}/results (component)
BinPackingNodeAllocatorService (component)
EventDrivenFaultTolerantQueryScheduler (component)
exchange manager (component)
node-state-poller (component)
io.trino.server.PluginClassLoader (component)
io.trino.plugin.jdbc.JdbcSplit (component)
G1 Old Generation (component)
/v1/task/.../status (file path)
coordinator (component)
remote task (component)
HttpDestination (component)
broadcast joins (concept)
exchange.http-client (component)
worker (component)
worker nodes (component)
exchange-manager.properties (file path)
config.properties (file path)
Delta Lake connector (component)
Hive connector (component)
Iceberg connector (component)
MySQL connector (component)
PostgreSQL connector (component)
SQL Server connector (component)
Related Insights (28)
Query fails when the result set exceeds the buffer size and no exchange manager is configured (critical)
S3 request rate throttling under I/O-intensive workloads (warning)
Connector incompatibility with fault-tolerant execution causes query failures (critical)
Improper task sizing prevents query completion (warning)
Partition count above 50 causes instability and poor performance (warning)
Worker resource exhaustion causes complete query failure (critical)
Long-running queries have increased failure probability without FTE (warning)
Worker stuck in PLANNED state after crash recovery (critical)
Undersized cluster capacity causes query execution failures (critical)
Memory limit exceeded after upgrade due to missing Iceberg partition filter pushdown (critical)
Skewed column statistics cause broadcast join memory overflow (critical)
Query failures and cancellations indicate platform or workload issues (warning)
PageTransportTimeoutException occurs without worker failure or obvious resource exhaustion (warning)
Fault-tolerant query fails with NO_NODES_AVAILABLE on worker pod termination (critical)
Query execution blocked waiting 5 minutes for worker node availability (critical)
Duplicate PluginClassLoader instances cause JDBC query failures (critical)
Long GC pauses cause HTTP 408 task timeout failures (critical)
Insufficient resources cause query failures (critical)
Query rejected with QUERY_TEXT_TOO_LARGE error (warning)
Remote task failures caused by coordinator communication timeouts (warning)
Excessive query stages cause cluster instability and unrelated query failures (critical)
Stale table statistics cause suboptimal join ordering and query failures (warning)
PAGE_TRANSPORT_TIMEOUT causes query failures after 1 minute (critical)
Worker node failures cause query failures without fault-tolerant execution (critical)
Write operations fail on catalogs without fault-tolerant write support (critical)
High query failure rate indicates operational issues (warning)
Complex queries cause QUERY_HAS_TOO_MANY_STAGES and cluster instability (critical)
Out-of-memory errors with large schemas when adaptive request sizing is disabled (warning)
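Several of these insights come down to setting retry-policy to QUERY or TASK while some catalog uses a connector that cannot take part in fault-tolerant execution. Such queries then fail with "This connector does not support query retries". Below is a pre-flight check under stated assumptions. Catalogs are given as a mapping from catalog name to the catalog's connector.name. The supported set is only an example built from the six connectors named in the Technical References; verify it against the fault-tolerant execution documentation for your Trino version. The check does not distinguish read support from write support.

```python
from __future__ import annotations

# Illustrative only: built from the connectors named in this page's references.
EXAMPLE_FTE_CONNECTORS = frozenset(
    {"delta_lake", "hive", "iceberg", "mysql", "postgresql", "sqlserver"}
)


def catalogs_blocking_retries(
    retry_policy: str,
    catalogs: dict[str, str],
    supported: frozenset[str] = EXAMPLE_FTE_CONNECTORS,
) -> list[str]:
    """Return catalogs whose connector is not in `supported` when retries are on.

    `catalogs` maps catalog name -> connector.name from its catalog properties.
    With retry-policy NONE nothing is retried, so no catalog is flagged.
    """
    if retry_policy.strip().upper() == "NONE":
        return []
    return sorted(
        name for name, connector in catalogs.items() if connector not in supported
    )
```

Take a mapping with a catalog backed by a hypothetical my_custom_connector. Under TASK or QUERY that catalog is flagged because it is not in the example set; under NONE nothing is flagged.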