Trino

Skewed column statistics cause broadcast join memory overflow

critical
performanceUpdated Jul 20, 2025
Technologies:
How to detect:

When column statistics contain extreme outlier values, Trino's cost-based optimizer underestimates post-filter cardinality and selects a broadcast join strategy. Workers receive billions of rows (1+ TB) instead of the expected small dataset, exceeding per-node memory limits and causing query failure.

Recommended action:

Use EXPLAIN to identify broadcast joins on unexpectedly large tables. Check column statistics with SHOW STATS for anomalous min/max ranges. Workaround: create boolean flag columns for range predicates instead of filtering directly on skewed columns. Long-term: fix upstream data quality by capping extreme values before statistics collection. Consider disabling auto-statistics collection if data quality cannot be guaranteed.