Apache DataFusion [Metric]

datafusion.operator.memory_used

Memory used by operator

Dimensions: None

Available on: Native (1)

Interface Metrics (1)

Native

mem_used

Current memory usage by the operator in bytes

Dimensions: None

Sources

mem_used [github.com]

Technical Annotations (75)

Configuration Parameters (11)

datafusion.execution.sort_spill_reservation_bytes [recommended: 10485760]

Default is 10 MiB (10485760 bytes), reserved for the in-memory merge during sort spilling

datafusion.execution.batch_size [recommended: 8192]

Default batch size for batches buffered in memory; increase it if tiny batches are being created

datafusion.execution.coalesce_batches [recommended: true]

Automatically coalesce small batches between operators

datafusion.execution.parquet.maximum_parallel_row_group_writers [recommended: 1]

Default is 1 to minimize memory use; increase it to use idle cores when writing large files

datafusion.execution.parquet.maximum_buffered_record_batches_per_stream [recommended: 2]

Default is 2 to minimize memory use; increase it together with the number of row group writers for better throughput

datafusion.execution.target_partitions [recommended: 1-4]

Lower values reduce memory duplication in high cardinality GROUP BY queries

memory_limit [recommended: set 500MB below actual available memory]

Leave headroom for the overhead of Vec exponential growth, which MemoryPool does not track

MEMORY_FRACTION [recommended: 1.0]

Memory fraction used in RuntimeConfig, shown in reproducer setup

batch_size [recommended: reduce from the default, e.g. from 8192 to a lower value]

Smaller batches enable more fine-grained memory accounting and reduce per-batch allocation spikes

memory_pool.limit [recommended: increase above 1600 bytes]

Memory pool size of 1600 bytes is insufficient and triggers premature spilling in GroupedHashAggregateStream

datafusion.memory_pool.limit

Set explicit limit to fail fast rather than exhaust system memory
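The headroom guidance and the recommended values above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, "500MB" is read as 500 MiB (an assumption), and the rendered statements reuse the `set key = value;` form shown under CLI Commands. Only parameters listed with a single concrete value are included.

```python
MIB = 1024 * 1024

# Assumption: the "500MB" headroom in the guidance above is read as 500 MiB.
DEFAULT_HEADROOM_BYTES = 500 * MIB

# Values copied from the parameter list above (entries with one concrete value only).
RECOMMENDED_SETTINGS = {
    "datafusion.execution.sort_spill_reservation_bytes": 10485760,
    "datafusion.execution.batch_size": 8192,
    "datafusion.execution.coalesce_batches": "true",
    "datafusion.execution.parquet.maximum_parallel_row_group_writers": 1,
    "datafusion.execution.parquet.maximum_buffered_record_batches_per_stream": 2,
}


def memory_limit_with_headroom(available_bytes, headroom_bytes=DEFAULT_HEADROOM_BYTES):
    """Return a pool limit that leaves headroom for allocations the pool does not track."""
    if available_bytes <= headroom_bytes:
        raise ValueError("available memory does not exceed the requested headroom")
    return available_bytes - headroom_bytes


def render_set_statements(settings):
    """Render one `set key = value;` statement per setting, in insertion order."""
    return [f"set {key} = {value};" for key, value in settings.items()]


if __name__ == "__main__":
    # Example: a host with 4 GiB available to the process.
    print(memory_limit_with_headroom(4 * 1024 * MIB))
    for statement in render_set_statements(RECOMMENDED_SETTINGS):
        print(statement)
```

The helper refuses to return a zero or negative limit, so a host with less memory than the headroom fails loudly instead of silently running with no usable pool.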

Error Signatures (11)

overflow [exception]

panicked at /Users/lili/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/arrow-select-57.3.0/src/interleave.rs:180:41 [exception]

memory allocation of 25690112 bytes failed [exception]

DatafusionError/ResourcesExhausted: Failed to allocate additional [exception]

Aborted (core dumped) [exit code]

ArrowError(InvalidArgumentError("number of columns(3) must match number of fields(2) in schema"), None) [exception]

number of columns must match number of fields in schema [log pattern]

ResourcesExhausted [exception]

intermediate_batch.num_rows() = 335544320 [log pattern]

intermediate_batch.get_array_memory_size() = 5368709312 [log pattern]

OOM [error code]
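The raw byte and row counts in the signatures above are easier to interpret once converted. The sketch below only does arithmetic on the three numbers quoted in the signatures; the variable and function names are illustrative.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# Numbers copied from the error signatures above.
failed_allocation_bytes = 25690112
intermediate_batch_rows = 335544320
intermediate_batch_bytes = 5368709312


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


# Size of the allocation that failed, in MiB.
failed_allocation_mib = to_mib(failed_allocation_bytes)

# Whole bytes per row in the oversized intermediate batch.
bytes_per_row = intermediate_batch_bytes // intermediate_batch_rows

# How far the batch size is above an exact 5 GiB.
bytes_over_5_gib = intermediate_batch_bytes - 5 * GIB

if __name__ == "__main__":
    print(failed_allocation_mib, bytes_per_row, bytes_over_5_gib)
```

The failed allocation is 24.5 MiB, while the logged intermediate batch holds 320 Mi rows at 16 bytes per row, which is 5 GiB plus 192 bytes in a single batch.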

CLI Commands (3)

set datafusion.execution.target_partitions = 1; [diagnostic]

explain SELECT "WatchID", "ClientIP", COUNT(*) AS c FROM hits GROUP BY "WatchID", "ClientIP"; [diagnostic]

ulimit -v 1152000 [diagnostic]
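For reproducers driven from a script rather than a shell, the `ulimit -v` cap above can be approximated with Python's standard `resource` module. This is a sketch under assumptions: it takes `ulimit -v` to be in 1024-byte units as in bash, it maps the cap to `RLIMIT_AS` as on Linux, and both helper names are hypothetical.

```python
KIB = 1024


def ulimit_kib_to_bytes(limit_kib):
    """Convert a `ulimit -v` value (assumed to be in KiB) to bytes."""
    return limit_kib * KIB


def apply_address_space_limit(limit_kib):
    """Lower the soft address-space limit of the current process (Unix only).

    Unlike a plain `ulimit -v`, this leaves the hard limit unchanged.
    setrlimit raises ValueError if the requested soft limit exceeds the hard limit.
    """
    import resource  # imported here because the module is unavailable on Windows

    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (ulimit_kib_to_bytes(limit_kib), hard))


if __name__ == "__main__":
    # The value from the command above, expressed in bytes.
    print(ulimit_kib_to_bytes(1152000))
```

Under that reading, 1152000 KiB is 1125 MiB of virtual address space. Calling `apply_address_space_limit(1152000)` before starting the query workload in the same process, or in a child via a pre-exec hook, would mirror the shell command.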

Technical References (50)

fair pool [component]
spillable consumer [concept]
TopKHeap::emit_with_state() [component]
interleave_record_batch() [component]
arrow-select [component]
Utf8 [component]
i32::MAX [concept]
RepartitionExec [component]
pull_from_input [component]
output_channels [component]
RoundRobinBatch [component]
RowFormat [component]
dictionary interning [concept]
AggregateExec: mode=Partial [component]
AggregateExec: mode=FinalPartitioned [component]
Top K optimization [concept]
SortPreservingMergeExec [component]
GlobalLimitExec [component]
GroupedHashAggregateStream [component]
MemoryPool [component]
group_aggregate_batch() [component]
Vec::grow_amortized() [component]
datafusion/physical-plan/src/aggregates/row_hash.rs [file path]
AggregateExec [component]
FairSpillPool [component]
NestedLoopJoinExec [component]
record batch [concept]
probe-side [concept]
build-side [concept]
SortMergeJoin [component]
partition [concept]
MemoryReservation [component]
partitioned hash join [concept]
TPC-H [concept]
external join [component]
TopK [component]
Sort [component]
unnest [component]
GROUP BY [component]
array_agg [component]
streaming execution [concept]
Cartesian product [concept]
build_batch_from_indices [component]
apply_join_filter_to_indices [component]
nested loop join [component]
buffered_left_batches [component]
RecordBatchStream [component]
ExecutionPlan [component]
nested_loop_join.rs [file path]
batch_transformer [component]

Related Insights (23)

Fair pool unfairly allocates memory between spillable and non-spillable operators [warning]

TopK operator panics on Utf8 string column overflow beyond i32::MAX [critical]

RepartitionExec unbounded buffering causes memory spikes with unbalanced partition processing [warning]

Sort operations run out of memory when sort_spill_reservation_bytes is insufficient [critical]

Tiny output batches cause excessive metadata memory consumption [warning]

Writing large parquet files with default parallelism settings underutilizes available cores [info]

Memory explosion from dictionary interning in row format optimization [warning]

High cardinality aggregations cause memory usage to scale linearly with partition count [critical]

GROUP BY with ORDER BY and LIMIT still allocates memory for all groups [warning]

GroupedHashAggregateStream OOM from Vec exponential growth during group-by with large strings [critical]

Aggregate memory accounting updates only after full batch processing [warning]

Schema mismatch causes GroupedHashAggregateStream spill failure with multiple aggregations [critical]

RepartitionExec memory exhaustion during aggregation spill [warning]

Nested loop join creates excessive memory usage through oversized record batches [warning]

SortMergeJoin memory usage exceeds HashJoin with high partition counts [warning]

Partitioned hash join memory coordination failure with shared pool [warning]

TPC-H queries fail under fuzzed memory limits with external joins [critical]

TopK optimization not applied when limit pushdown fails with complex operators [warning]

Unnest with GROUP BY causes unbounded memory growth despite streaming [critical]

NestedLoopJoinExec creates extremely large intermediate batches causing memory exhaustion [critical]

NestedLoopJoin filter evaluation creates oversized intermediate batches [warning]

Nested loop join buffers entire left side causing OOM under memory constraints [critical]

Nested loop join produces massive intermediate result sets consuming memory [warning]