trino.execution.failed_queries.one_minute
Failed queries last minute
Dimensions: None
Technical Annotations (102)
Configuration Parameters (31)
exchange.deduplication-buffer-size (recommended: >32MB or use exchange manager)
exchange-manager.name (recommended: filesystem or hdfs)
exchange.base-directories (recommended: Multiple comma-separated S3 URIs)
retry-policy (recommended: NONE (for unsupported connectors))
fault-tolerant-execution-standard-split-size (recommended: 64MB)
fault-tolerant-execution-max-task-split-count (recommended: 2048)
fault-tolerant-execution-max-partition-count (recommended: 50)
http.client.timeout (recommended: 10000ms)
hive.collect-column-statistics-on-write (recommended: Evaluate disabling if data quality issues persist)
query.remote-task.max-error-duration (recommended: 1m (increase recommended))
exchange.compression-enabled (recommended: true)
query.low-memory-killer.delay (recommended: 0s)
query.client.timeout (recommended: 5m)
-Xmx (recommended: 105G)
query.max-memory-per-node (recommended: 85GB)
memory.heap-headroom-per-node (recommended: 20GB)
-XX:G1HeapRegionSize (recommended: 32M)
query.max-length (recommended: 1000000)
query.max-stage-count (recommended: 150)
optimizer.join-reordering-strategy (recommended: AUTOMATIC)
join_distribution_type (recommended: BROADCAST)
join_reordering_strategy (recommended: AUTOMATIC)
exchange.http-client.max-content-length (recommended: 4GB)
query.remote-task.max-request-size (recommended: 10GB)
exchange.http-client.request-timeout (recommended: 30s to 60s)
exchange.http-client.idle-timeout (recommended: 60s to 2m)
exchange.http-client.max-connections-per-server (recommended: 1000)
name (recommended: filesystem)
base-directories (recommended: s3://exchange-spooling-bucket)
exchange.s3.region (recommended: us-west-1)
query.remote-task.enable-adaptive-request-size (recommended: true)

Error Signatures (27)
Exchange manager must be configured for the failure recovery capabilities to be fully functional (exception)
software.amazon.awssdk.services.s3.model.S3Exception: Please reduce your request rate (exception)
This connector does not support query retries (exception)
PAGE_TRANSPORT_TIMEOUT (error code)
Failing abandoned task (log pattern)
Total timeout 10000 ms elapsed (exception)
Query exceeded distributed user memory limit of 40GB (log pattern)
Query exceeded per-node memory limit (exception)
io.trino.operator.PageTransportTimeoutException (exception)
java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed (exception)
io.trino.server.IoExceptionSuppressingWriterInterceptor Could not write to output: EofException(null) (log pattern)
No nodes available to run query (exception)
io.trino.spi.TrinoException: No nodes available to run query (exception)
INTERNAL ERROR - NO_NODES_AVAILABLE (error code)
Previously active node is missing (log pattern)
Waited 5.00m for at least 1 workers (log pattern)
class io.trino.plugin.jdbc.JdbcSplit cannot be cast to class io.trino.plugin.jdbc.JdbcSplit (exception)
SQL Error [65536] (error code)
java.lang.ClassCastException (exception)
Expected response code from http://.*:8080/v1/task/.*/status to be 200, but was 408 (http status)
Error 408 Timeout: Timed out (http status)
io.trino.spi.TrinoException (exception)
QUERY_TEXT_TOO_LARGE (error code)
QUERY_HAS_TOO_MANY_STAGES (error code)
REMOTE_TASK_ERROR (error code)
Max requests queued per destination exceeded for HttpDestination (log pattern)
Encountered too many errors talking to a worker node (log pattern)

CLI Commands (8)
EXPLAIN (diagnostic)
SHOW STATS (diagnostic)
kubectl get pods -w (monitoring)
kubectl scale deployment my-trino-cluster-worker --replicas=N (remediation)
kubectl exec -it <pod-name> -- cat /etc/trino/config.properties (diagnostic)
ANALYZE TABLE (diagnostic)
SELECT state, COUNT(*) AS count FROM trino_events.trino_queries WHERE create_time >= now() - interval '7' day GROUP BY state; (diagnostic)
SELECT error_code, error_name, COUNT(*) AS failures FROM trino_events.trino_queries WHERE state = 'FAILED' AND create_time >= now() - interval '30' day GROUP BY error_code, error_name ORDER BY failures DESC LIMIT 10; (diagnostic)

Technical References (36)
PLANNED (component)
Stage 1 (component)
HttpPageBufferClient (component)
Iceberg (component)
partition filter pushdown (concept)
day_partition (component)
broadcast join (concept)
cost-based optimizer (component)
Hive Metastore (component)
cardinality estimation (concept)
trino_events.trino_queries (component)
state (component)
/v1/task/{task-id}/results (component)
BinPackingNodeAllocatorService (component)
EventDrivenFaultTolerantQueryScheduler (component)
exchange manager (component)
node-state-poller (component)
io.trino.server.PluginClassLoader (component)
io.trino.plugin.jdbc.JdbcSplit (component)
G1 Old Generation (component)
/v1/task/.../status (file path)
coordinator (component)
remote task (component)
HttpDestination (component)
broadcast joins (concept)
exchange.http-client (component)
worker (component)
worker nodes (component)
exchange-manager.properties (file path)
config.properties (file path)
Delta Lake connector (component)
Hive connector (component)
Iceberg connector (component)
MySQL connector (component)
PostgreSQL connector (component)
SQL Server connector (component)

Related Insights (28)
Query fails when result set exceeds buffer size without exchange manager (critical)
S3 request rate throttling under I/O-intensive workloads (warning)
Connector incompatibility with fault-tolerant execution causes query failures (critical)
Improper task sizing prevents query completion (warning)
Partition count above 50 causes instability and poor performance (warning)
Worker resource exhaustion causes complete query failure (critical)
Long-running queries have increased failure probability without FTE (warning)
Worker stuck in PLANNED state after crash recovery (critical)
Undersized cluster capacity causes query execution failures (critical)
Memory limit exceeded after upgrade due to missing Iceberg partition filter pushdown (critical)
Skewed column statistics cause broadcast join memory overflow (critical)
Query failures and cancellations indicate platform or workload issues (warning)
PageTransportTimeoutException occurs without worker failure or obvious resource exhaustion (warning)
Fault-tolerant query fails with NO_NODES_AVAILABLE on worker pod termination (critical)
Query execution blocked waiting 5 minutes for worker node availability (critical)
Duplicate PluginClassLoader instances cause JDBC query failures (critical)
Long GC pauses cause HTTP 408 task timeout failures (critical)
Insufficient resources cause query failures (critical)
Query rejected with QUERY_TEXT_TOO_LARGE error (warning)
Remote task failures from coordinator communication timeout (warning)
Excessive query stages cause cluster instability and unrelated query failures (critical)
Stale table statistics cause suboptimal join ordering and query failures (warning)
PAGE_TRANSPORT_TIMEOUT causes query failures after 1 minute (critical)
Worker node failures cause query failures without fault-tolerant execution (critical)
Write operations fail on catalogs without fault-tolerant write support (critical)
High query failure rate indicates operational issues (warning)
Complex queries cause QUERY_HAS_TOO_MANY_STAGES and cluster instability (critical)
Out-of-memory errors with large schemas when adaptive requests disabled (warning)
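
The memory recommendations listed under Configuration Parameters are consistent with one another: 85GB for query.max-memory-per-node plus 20GB for memory.heap-headroom-per-node equals the 105G heap set with -Xmx. The Python sketch below checks that arithmetic for an arbitrary set of values. It assumes that the sum of the two memory settings must not exceed the JVM heap and that sizes use binary units; `parse_size` and `memory_fits` are hypothetical helper names, not part of Trino.

```python
import re

# Binary multipliers; treating "G" and "GB" alike is an assumption of this sketch.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Parse a size such as '105G', '85GB' or '32M' into bytes."""
    match = re.fullmatch(
        r"(\d+(?:\.\d+)?)\s*([KMGT]?)B?", text.strip(), re.IGNORECASE
    )
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    # An empty unit means plain bytes.
    return int(float(number) * _UNITS[unit.upper() or "B"])


def memory_fits(xmx: str, max_memory_per_node: str, heap_headroom: str) -> bool:
    """Return True when query memory plus headroom fits inside the heap."""
    needed = parse_size(max_memory_per_node) + parse_size(heap_headroom)
    return needed <= parse_size(xmx)


if __name__ == "__main__":
    # Values taken from the recommendations on this page.
    print(memory_fits("105G", "85GB", "20GB"))
```

Running the same check with a smaller heap, for example `memory_fits("100G", "85GB", "20GB")`, returns False, which is the kind of mismatch worth catching before changing any one of the three settings on its own.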
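
The first diagnostic query under CLI Commands returns one row per query state with a count. As a rough illustration of turning that result into a single failure rate, the Python sketch below takes the rows as a mapping from state to count. The function name `failure_rate` and the input shape are assumptions of this sketch; the `FAILED` label is the one used in the second diagnostic query's WHERE clause.

```python
def failure_rate(state_counts: dict[str, int]) -> float:
    """Return the fraction of queries in the FAILED state.

    state_counts maps a query state (for example 'FINISHED' or 'FAILED')
    to the number of queries observed in that state.
    """
    total = sum(state_counts.values())
    if total == 0:
        # No queries in the window: report zero rather than dividing by zero.
        return 0.0
    return state_counts.get("FAILED", 0) / total


if __name__ == "__main__":
    # Hypothetical counts, not real measurements.
    sample = {"FINISHED": 90, "FAILED": 10}
    print(f"{failure_rate(sample):.1%}")
```

What counts as a high failure rate depends on the workload, so the sketch reports the ratio and leaves any threshold to the reader.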