datafusion.aggregate.groups
Distinct aggregation groups
Dimensions: None
Technical Annotations (50)
Configuration Parameters (6)
datafusion.execution.skip_partial_aggregation_probe_ratio_threshold [recommended: 0.8]
datafusion.execution.skip_partial_aggregation_probe_rows_threshold [recommended: 100000]
datafusion.execution.target_partitions [recommended: 1-4]
memory_limit [recommended: Set 500MB below actual available memory]
MEMORY_FRACTION [recommended: 1.0]
batch_size [recommended: Reduce from default (e.g., from 8192 to lower value)]
Error Signatures (3)
memory allocation of 25690112 bytes failed [exception]
DatafusionError/ResourcesExhausted: Failed to allocate additional [exception]
Aborted (core dumped) [exit code]
CLI Commands (4)
set datafusion.execution.target_partitions = 1; [diagnostic]
explain SELECT "WatchID", "ClientIP", COUNT(*) AS c FROM hits GROUP BY "WatchID", "ClientIP"; [diagnostic]
ulimit -v 1152000 [diagnostic]
writer.write(&batch) [diagnostic]
Technical References (37)
AggregateMode::Partial [component]
AggregateMode::Final [component]
RepartitionExec [component]
hash value reuse [concept]
SinglePartitioned [component]
arrow::Row [component]
RowConverter [component]
cardinality [concept]
partial aggregate skipping [concept]
AggregateExec: mode=Partial [component]
AggregateExec: mode=FinalPartitioned [component]
Top K optimization [concept]
SortPreservingMergeExec [component]
GlobalLimitExec [component]
GroupedHashAggregateStream [component]
arrow_row::variable::encode [component]
hash seed [concept]
ClickBench [component]
bucket distribution [concept]
cache locality [concept]
GroupValuesColumn [component]
vectorized_intern [component]
GroupOrdering::Full [component]
MemoryPool [component]
group_aggregate_batch() [component]
Vec::grow_amortized() [component]
datafusion/physical-plan/src/aggregates/row_hash.rs [file path]
IPCWriter [component]
emit [component]
AggregateExec [component]
total_byte_size [concept]
join_selection.rs [file path]
physical-plan/aggregates/mod.rs [file path]
physical-optimizer/join_selection.rs [file path]
hash_join/exec.rs [file path]
Final [component]
FinalPartitioned [component]
Related Insights (13)
Partial aggregation continues despite low aggregation ratio, wasting resources [warning]
High cardinality aggregations incur triple hashing overhead in multi-phase repartition plans [warning]
Single-mode aggregation outperforms partial/final for high cardinality by avoiding row conversions [warning]
Low cardinality aggregates benefit from partial/final mode while high cardinality suffers [info]
High cardinality aggregations cause memory usage to scale linearly with partition count [critical]
GROUP BY with ORDER BY and LIMIT still allocates memory for all groups [warning]
RowConverter consumes 75% of aggregation time on high-cardinality group by operations [warning]
Hash seed reuse prevents rehashing during aggregation merge phase [info]
Spillable aggregation produces duplicate group keys due to internal state mismatch [critical]
GroupedHashAggregateStream OOM from Vec exponential growth during group-by with large strings [critical]
Large single-batch spill files cause merge failures [warning]
Missing byte-size statistics after aggregation causes incorrect join build-side selection [warning]
Aggregation operations under-partition causing multi-fold performance degradation [warning]
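
Illustrative sketch (not DataFusion code): the two skip_partial_aggregation_probe_* parameters and the "partial aggregate skipping" concept listed above suggest a probe that compares distinct groups to input rows. The Python below is a toy model of such a probe, assuming the decision is made once, as soon as the rows threshold is reached, and that skipping is chosen when groups / rows meets the ratio threshold. The names SkipProbe and simulate are hypothetical and exist only for this example; the actual DataFusion logic may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class SkipProbe:
    """Toy model of a partial-aggregation skip probe (hypothetical, not a DataFusion API)."""
    rows_threshold: int = 100_000   # mirrors ...skip_partial_aggregation_probe_rows_threshold
    ratio_threshold: float = 0.8    # mirrors ...skip_partial_aggregation_probe_ratio_threshold
    input_rows: int = 0
    decided: bool = False
    skip: bool = False

    def update(self, batch_rows: int, total_groups: int) -> bool:
        """Record one input batch; return True once skipping has been chosen."""
        if self.decided:
            return self.skip
        self.input_rows += batch_rows
        if self.input_rows >= self.rows_threshold:
            # Assumption: the decision is taken once and then locked.
            ratio = total_groups / self.input_rows
            self.skip = ratio >= self.ratio_threshold
            self.decided = True
        return self.skip


def simulate(keys, batch_size=8192, probe=None):
    """Feed group keys in batches; return True if the probe would skip partial aggregation."""
    if probe is None:
        probe = SkipProbe()
    seen = set()
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        seen.update(batch)  # distinct groups observed so far
        if probe.update(len(batch), len(seen)):
            return True
    return False


if __name__ == "__main__":
    print(simulate(list(range(200_000))))              # every key distinct
    print(simulate([i % 100 for i in range(200_000)]))  # only 100 distinct keys
```

In this model a stream where nearly every row is its own group crosses the 0.8 ratio and the probe opts out of partial aggregation, while a stream with only 100 distinct keys keeps it. An input shorter than the rows threshold never triggers a decision.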