Aggregate dynamic filters cause overhead when pushed to Parquet reader

warning

performance

Updated Feb 12, 2026 (via Exa)

Sources

[EPIC] Fix performance regressions when enabling parquet filter ... (github.com)

Technologies:

Apache DataFusion (subject)

How to detect:

Aggregate dynamic filters start as lit(true) for the first record batch, so every row qualifies initially. With the filter pushed down, files_ranges_pruned_statistics matches more files (31 matched vs 6 without the filter), and pushdown_rows_pruned shows 23.14M rows of overhead. The overhead compounds because the pruning predicates are rebuilt for every record batch.
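To make the symptom concrete, here is a minimal toy model of a dynamic filter that starts out always true and tightens as batches are consumed. It assumes a MIN(col) aggregate and uses hypothetical names throughout (MinDynamicFilter, keeps, update, rows_pruned); it is not DataFusion's implementation, only a sketch of why the first batch sees no pruning while the filter is still evaluated per row.

```rust
/// Hypothetical model of a dynamic filter for a `MIN(col)` aggregate.
/// `None` plays the role of `lit(true)`: no bound is known yet.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MinDynamicFilter {
    bound: Option<i64>,
}

impl MinDynamicFilter {
    fn new() -> Self {
        Self { bound: None }
    }

    /// A row can still lower the running minimum only if it is below the bound.
    fn keeps(&self, value: i64) -> bool {
        match self.bound {
            None => true, // always-true filter: every row qualifies
            Some(b) => value < b,
        }
    }

    /// Tighten the bound after the aggregate has consumed a batch.
    fn update(&mut self, batch: &[i64]) {
        if let Some(&m) = batch.iter().min() {
            self.bound = Some(match self.bound {
                Some(b) => b.min(m),
                None => m,
            });
        }
    }
}

/// Count the rows of `batch` that the filter would prune.
fn rows_pruned(filter: &MinDynamicFilter, batch: &[i64]) -> usize {
    batch.iter().filter(|v| !filter.keeps(**v)).count()
}
```

In this model the first batch is evaluated row by row yet prunes nothing, which is the cost pattern described above; only later batches benefit once a bound exists.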

Recommended action:

Exclude aggregate dynamic filters from row-level Parquet evaluation. Implement filter simplification to discard filters that are always true. Add caching logic to skip rebuilding pruning predicates when filter bounds haven't changed between batches. Consider skipping row group evaluation for aggregate queries entirely.
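Two of these recommendations, discarding always-true filters and caching the pruning predicate, can be sketched together. The example below is a rough illustration under stated assumptions: FilterExpr, PruningPredicate, build_predicate, and PredicateCache are hypothetical stand-ins written for this sketch, not DataFusion types, and change detection is done by plain equality on the filter value.

```rust
/// Hypothetical stand-in for a dynamic filter expression.
#[derive(Debug, Clone, PartialEq)]
enum FilterExpr {
    AlwaysTrue,
    LessThan { column: String, bound: i64 },
}

/// Hypothetical stand-in for a compiled pruning predicate.
#[derive(Debug, Clone, PartialEq)]
struct PruningPredicate {
    column: String,
    bound: i64,
}

/// Simplification step: an always-true filter prunes nothing, so drop it.
fn build_predicate(filter: &FilterExpr) -> Option<PruningPredicate> {
    match filter {
        FilterExpr::AlwaysTrue => None,
        FilterExpr::LessThan { column, bound } => Some(PruningPredicate {
            column: column.clone(),
            bound: *bound,
        }),
    }
}

/// Rebuilds the predicate only when the filter differs from the last one seen.
#[derive(Default)]
struct PredicateCache {
    last_filter: Option<FilterExpr>,
    predicate: Option<PruningPredicate>,
    rebuilds: usize, // counts how often the predicate was actually rebuilt
}

impl PredicateCache {
    fn get(&mut self, filter: &FilterExpr) -> Option<&PruningPredicate> {
        if self.last_filter.as_ref() != Some(filter) {
            self.predicate = build_predicate(filter);
            self.last_filter = Some(filter.clone());
            self.rebuilds += 1;
        }
        self.predicate.as_ref()
    }
}
```

Calling get once per record batch with an unchanged filter leaves rebuilds untouched, so the rebuild cost is paid only when the bound moves. A real implementation would likely need a cheaper change check than full expression equality, for example a version counter on the filter, which is an assumption not covered by the source.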