Boolean flag columns bypass skewed statistics in filter selectivity
infoperformanceUpdated Jul 20, 2025
Technologies:
How to detect:
When range predicates on columns with skewed statistics cause optimizer to misestimate cardinality, transforming the range check into a pre-computed boolean flag column allows the optimizer to use accurate statistics on the boolean column instead of the skewed numeric column.
Recommended action:
For columns with known statistical skew causing query plan issues, add computed boolean flag columns at data ingestion time (e.g., CASE WHEN duration BETWEEN 0 AND 1800000 THEN 1 ELSE 0 END AS is_valid_duration). Filter on the flag column (is_valid_duration = 1) instead of the raw column. This gives the optimizer accurate cardinality estimates for the boolean column's statistics.