Apache DataFusion

Aggregation operations under-partition causing multi-fold performance degradation

warning
performanceUpdated Mar 10, 2026(via Exa)
How to detect:

Aggregation operations default to Final (single partition) mode when FinalPartitioned would provide better performance. TPC-DS query 4 shows 1.95x speedup and TPC-H query 3 shows 1.10x speedup when forced to partitioned mode. Issue becomes significant when accumulating more than 200k groups.

Recommended action:

Monitor datafusion.aggregate.groups metric. When group count exceeds 200k, consider forcing partitioned aggregation by reviewing optimizer settings. Examine query plans for Final mode aggregations on high-cardinality group-by operations. Dynamic switching proposed but not yet implemented.