Hash collisions can trigger aggregation correctness issues during testing
infoconfigurationUpdated Mar 5, 2026(via Exa)
Technologies:
How to detect:
The force_hash_collisions feature forces all values to hash to the same value, which helps reproduce rare data correctness bugs that occur during spillable aggregations, particularly those involving hash collisions in production workloads.
Recommended action:
Enable the force_hash_collisions feature during testing to reproduce hash collision scenarios: set force_hash_collisions = ["datafusion-physical-plan/force_hash_collisions", "datafusion-common/force_hash_collisions"] in datafusion/core/Cargo.toml. Use this to validate fixes for aggregation correctness bugs before deploying to production.