Apache DataFusion Metric

datafusion.operator.memory_used

Memory used by operator
Dimensions: None
Available on: Native (1)
Interface Metrics (1)
Native: Current memory usage by the operator, in bytes
Dimensions: None

Technical Annotations (75)

Configuration Parameters (11)
datafusion.execution.sort_spill_reservation_bytes (recommended: 10485760)
Default of 10 MB, reserved for the in-memory merge during sort spilling
datafusion.execution.batch_size (recommended: 8192)
Default size of batches buffered in memory; increase it if operators are producing tiny batches
datafusion.execution.coalesce_batches (recommended: true)
Automatically coalesces small batches between operators
datafusion.execution.parquet.maximum_parallel_row_group_writers (recommended: 1)
Default of 1 minimizes memory; increase it to use idle cores when writing large files
datafusion.execution.parquet.maximum_buffered_record_batches_per_stream (recommended: 2)
Default of 2 minimizes memory; increase it together with the row group writers for better throughput
datafusion.execution.target_partitions (recommended: 1-4)
Lower values reduce memory duplication in high-cardinality GROUP BY queries
memory_limit (recommended: set 500 MB below the memory actually available)
Leaves headroom for Vec exponential-growth overhead that MemoryPool does not track
MEMORY_FRACTION (recommended: 1.0)
Memory fraction used in RuntimeConfig, as shown in the reproducer setup
batch_size (recommended: reduce from the default, e.g. from 8192 to a lower value)
Smaller batches allow finer-grained memory accounting and reduce per-batch allocation spikes
memory_pool.limit (recommended: increase from the 1600-byte minimum)
A memory pool of 1600 bytes is insufficient and triggers premature spilling in GroupedHashAggregateStream
datafusion.memory_pool.limit
Set an explicit limit to fail fast rather than exhaust system memory
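The memory_limit recommendation in the list above is plain arithmetic. The following is a minimal Python sketch, assuming that "500MB" is read as 500 MiB and that the memory available to the process is known in bytes. The helper name pool_limit_bytes is hypothetical and is not part of DataFusion.

```python
MIB = 1024 * 1024

# The 10 MB default quoted for sort_spill_reservation_bytes, in bytes.
SORT_SPILL_RESERVATION_BYTES = 10 * MIB  # 10485760

# Assumption: the recommended "500MB" headroom is interpreted as 500 MiB.
HEADROOM_BYTES = 500 * MIB


def pool_limit_bytes(available_bytes: int, headroom_bytes: int = HEADROOM_BYTES) -> int:
    """Hypothetical helper: a memory limit that leaves untracked headroom.

    The headroom is meant to absorb allocations the memory pool does not
    account for, such as Vec growth overhead.
    """
    if available_bytes <= headroom_bytes:
        raise ValueError("available memory must exceed the headroom")
    return available_bytes - headroom_bytes


# Illustrative input only: a process with 4 GiB available.
example_limit = pool_limit_bytes(4 * 1024 * MIB)
print(example_limit // MIB, "MiB")
```

The resulting byte count is what would be passed as the memory limit; how it is passed depends on the embedding application and is not shown here.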
Error Signatures (11)
overflow (exception)
panicked at /Users/lili/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.3.0/src/interleave.rs:180:41 (exception)
memory allocation of 25690112 bytes failed (exception)
DatafusionError/ResourcesExhausted: Failed to allocate additional (exception)
Aborted (core dumped) (exit code)
ArrowError(InvalidArgumentError("number of columns(3) must match number of fields(2) in schema"), None) (exception)
number of columns must match number of fields in schema (log pattern)
ResourcesExhausted (exception)
intermediate_batch.num_rows() = 335544320 (log pattern)
intermediate_batch.get_array_memory_size() = 5368709312 (log pattern)
OOM (error code)
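To put the numbers in these signatures in perspective, the short Python sketch below converts them into more readable units. It uses only values quoted on this page (the two intermediate_batch log patterns, the failed allocation size, and the 8192 batch size default) and makes no claim about which operator produced them.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Default datafusion.execution.batch_size quoted on this page.
DEFAULT_BATCH_SIZE = 8192

# Values copied verbatim from the signatures above.
num_rows = 335_544_320
array_memory_bytes = 5_368_709_312
failed_allocation_bytes = 25_690_112

# How many default-sized batches fit into the single logged batch.
default_batches_in_one = num_rows // DEFAULT_BATCH_SIZE

# Average in-memory footprint per row of the logged batch.
bytes_per_row = array_memory_bytes / num_rows

# Sizes in binary units.
batch_gib = array_memory_bytes / GIB
failed_mib = failed_allocation_bytes / MIB

print(f"rows equal {default_batches_in_one} default batches")
print(f"about {bytes_per_row:.1f} bytes per row, {batch_gib:.2f} GiB in total")
print(f"failed allocation: {failed_mib} MiB")
```

The logged batch holds as many rows as tens of thousands of default-sized batches, which is why a single intermediate batch of this kind can exhaust memory on its own.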
CLI Commands (3)
set datafusion.execution.target_partitions = 1; (diagnostic)
explain SELECT "WatchID", "ClientIP", COUNT(*) AS c FROM hits GROUP BY "WatchID", "ClientIP"; (diagnostic)
ulimit -v 1152000 (diagnostic)
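The ulimit value above is easier to compare with the byte counts elsewhere on this page once it is converted. This is a small Python sketch, assuming the command runs in bash, where ulimit -v takes its argument in 1024-byte units. The helper name is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def ulimit_v_to_bytes(value_kib: int) -> int:
    """Hypothetical helper: convert a bash `ulimit -v` argument to bytes.

    Assumes the value is expressed in 1024-byte units, as in bash.
    """
    return value_kib * KIB


limit_bytes = ulimit_v_to_bytes(1_152_000)
print(limit_bytes, "bytes =", limit_bytes / MIB, "MiB")
```

Note that ulimit -v caps virtual address space rather than resident memory, so the process can hit the cap before its tracked memory usage reaches the same figure.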
Technical References (50)
fair pool (component)
spillable consumer (concept)
TopKHeap::emit_with_state() (component)
interleave_record_batch() (component)
arrow-select (component)
Utf8 (component)
i32::MAX (concept)
RepartitionExec (component)
pull_from_input (component)
output_channels (component)
RoundRobinBatch (component)
RowFormat (component)
dictionary interning (concept)
AggregateExec: mode=Partial (component)
AggregateExec: mode=FinalPartitioned (component)
Top K optimization (concept)
SortPreservingMergeExec (component)
GlobalLimitExec (component)
GroupedHashAggregateStream (component)
MemoryPool (component)
group_aggregate_batch() (component)
Vec::grow_amortized() (component)
datafusion/physical-plan/src/aggregates/row_hash.rs (file path)
AggregateExec (component)
FairSpillPool (component)
NestedLoopJoinExec (component)
record batch (concept)
probe-side (concept)
build-side (concept)
SortMergeJoin (component)
partition (concept)
MemoryReservation (component)
partitioned hash join (concept)
TPC-H (concept)
external join (component)
TopK (component)
Sort (component)
unnest (component)
GROUP BY (component)
array_agg (component)
streaming execution (concept)
Cartesian product (concept)
build_batch_from_indices (component)
apply_join_filter_to_indices (component)
nested loop join (component)
buffered_left_batches (component)
RecordBatchStream (component)
ExecutionPlan (component)
nested_loop_join.rs (file path)
batch_transformer (component)
Related Insights (23)
Fair pool unfairly allocates memory between spillable and non-spillable operators (warning)
TopK operator panics on Utf8 string column overflow beyond i32::MAX (critical)
RepartitionExec unbounded buffering causes memory spikes with unbalanced partition processing (warning)
Sort operations run out of memory when sort_spill_reservation_bytes is insufficient (critical)
Tiny output batches cause excessive metadata memory consumption (warning)
Writing large parquet files with default parallelism settings underutilizes available cores (info)
Memory explosion from dictionary interning in row format optimization (warning)
High cardinality aggregations cause memory usage to scale linearly with partition count (critical)
GROUP BY with ORDER BY and LIMIT still allocates memory for all groups (warning)
GroupedHashAggregateStream OOM from Vec exponential growth during group-by with large strings (critical)
Aggregate memory accounting updates only after full batch processing (warning)
Schema mismatch causes GroupedHashAggregateStream spill failure with multiple aggregations (critical)
RepartitionExec memory exhaustion during aggregation spill (warning)
Nested loop join creates excessive memory usage through oversized record batches (warning)
SortMergeJoin memory usage exceeds HashJoin with high partition counts (warning)
Partitioned hash join memory coordination failure with shared pool (warning)
TPC-H queries fail under fuzzed memory limits with external joins (critical)
TopK optimization not applied when limit pushdown fails with complex operators (warning)
Unnest with GROUP BY causes unbounded memory growth despite streaming (critical)
NestedLoopJoinExec creates extremely large intermediate batches causing memory exhaustion (critical)
NestedLoopJoin filter evaluation creates oversized intermediate batches (warning)
Nested loop join buffers entire left side causing OOM under memory constraints (critical)
Nested loop join produces massive intermediate result sets consuming memory (warning)
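The TopK insight in the list above mentions Utf8 columns overflowing i32::MAX. As background that comes from the Arrow format rather than from this page: a Utf8 array addresses its string data with signed 32-bit offsets, so the combined byte length of all values in one array cannot exceed i32::MAX. The Python sketch below illustrates that bound; the helper name is hypothetical and it does not reproduce the TopK code path.

```python
I32_MAX = 2**31 - 1  # 2147483647


def utf8_offsets_overflow(value_byte_lengths) -> bool:
    """Hypothetical helper: would these values overflow 32-bit offsets?

    Returns True as soon as the running total of value byte lengths
    exceeds what a signed 32-bit offset can address in a single array.
    """
    total = 0
    for length in value_byte_lengths:
        total += length
        if total > I32_MAX:
            return True
    return False


# Two values of 1 GiB each add up to 2**31 bytes, one past the limit.
print(utf8_offsets_overflow([2**30, 2**30]))
# One byte less in total still fits.
print(utf8_offsets_overflow([2**30, 2**30 - 1]))
```

A check of this kind has to consider the combined size of everything gathered into one output array, not the size of any single string.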