Network Timeout in Hive Metastore Communication
Warning: Presto queries fail sporadically with SocketTimeoutException when communicating with the Hive metastore, typically under high query concurrency or when the metastore is under-resourced relative to query load.
Monitor for a rising presto_execution_external_failures_one_minute_rate accompanied by java.net.SocketTimeoutException stack traces in the logs. Check hive_ms.log on the coordinator for metastore errors. Look for correlation between the failure rate and spikes in presto_execution_running_queries or presto_execution_started_queries_one_minute.
Increase the coordinator node size to provide more memory for the embedded metastore. Raise the hive.metastore-timeout, hive.s3.connect-timeout, and hive.s3.socket-timeout values (e.g., to 3m). Increase the metastore heap memory allocation. Consider connection pooling or metastore caching to reduce load on the metastore.
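As a rough sketch of the timeout and caching changes, the Hive catalog file could look something like the fragment below. The file path (etc/catalog/hive.properties) and the cache TTL and refresh values are assumptions for illustration, not recommendations from this entry; only the 3m timeout value comes from the text above. Property availability can differ between Presto versions, so check the Hive connector documentation for your release before applying.

```properties
# etc/catalog/hive.properties (assumed location of the Hive catalog config)

# Raise the metastore and S3 timeouts, using the 3m example value from above
hive.metastore-timeout=3m
hive.s3.connect-timeout=3m
hive.s3.socket-timeout=3m

# Optional metastore caching to reduce calls to the metastore.
# The values here are illustrative assumptions; tune them to how much
# metadata staleness your workloads can tolerate.
hive.metastore-cache-ttl=10m
hive.metastore-refresh-interval=5m
```

Catalog property changes generally need a restart of the Presto nodes to take effect. Caching trades metadata freshness for lower metastore load, so newly created tables or partitions may not be visible until the cache entry expires or is refreshed.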