Apache Spark Metric

datafusion.memory_pool.used

Memory actively used
Dimensions: None

Technical Annotations (83)

Configuration Parameters (19)
memory_pool.soft_limit
recommended: for spillable operators only
Proposed soft limit; non-spillable operators may exceed it without error.
memory_pool.hard_limit
recommended: soft_limit * 120%
Proposed hard limit on total memory, including non-spillable operators.
datafusion.optimizer.allow_symmetric_joins_without_pruning
recommended: false
Disable for unbounded streams to prevent OOM; the default (true) allows such joins but risks exhausting memory.
with_disk_manager_os
recommended: enabled
Enables OS-managed temporary disk storage for query spilling.
with_fair_spill_pool
recommended: 100000000
Sets the spill pool size in bytes (100000000 bytes, i.e. 100MB, in this example).
memory_limit
recommended: set 500MB below actual available memory
Leaves headroom for Vec exponential-growth overhead that MemoryPool does not track.
MEMORY_FRACTION
recommended: 1.0
Memory fraction used in RuntimeConfig, as shown in the reproducer setup.
batch_size
recommended: reduce from the default (e.g., from 8192 to a lower value)
Smaller batches enable finer-grained memory accounting and reduce per-batch allocation spikes.
memory_pool.limit
recommended: increase from 1600 bytes minimum
A memory pool size of 1600 bytes is insufficient and triggers premature spilling in GroupedHashAggregateStream.
RuntimeConfig.memory_pool
recommended: FairSpillPool with 2-3x expected memory
Provides headroom for the sort_batch memory spike during spill.
MemoryConsumer.with_can_spill
recommended: true
Enables a spillable consumer for GroupedHashAggregateStream.
enforce_batch_size_in_joins
recommended: enabled
Restricts the maximum output batch size of join operators to batch_size when OOM occurs.
sort_spill_reservation_bytes
Controls memory reservation for spill operations during joins
datafusion.optimizer.prefer_hash_join
recommended: false
Set to false for memory-constrained workloads to use SortMergeJoin instead.
datafusion.memory_pool.limit
Set explicit limit to fail fast rather than exhaust system memory
prefer_hash_join
Controls the default join algorithm selection between hash join and sort-merge join.
hash_join_single_partition_threshold
Threshold for switching to a single-partition hash join.
hash_join_single_partition_threshold_rows
Row-count threshold for selecting a single-partition hash join.
collect_left_threshold
Memory limit threshold to determine if both join sides can fit in memory for dynamic reordering
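The soft_limit and hard_limit entries above are marked as proposed, so the following Python sketch models the idea and is not a DataFusion API. The class DualWatermarkPool, its try_grow and shrink methods, and the helper memory_limit_with_headroom are hypothetical names introduced for illustration. Only the 120% ratio and the 500MB headroom come from the recommendations above, and 1MB is assumed to mean 10^6 bytes, in line with the 100000000-byte "100MB" example.

```python
MB = 1_000_000  # assumption: decimal megabytes, as in the 100000000-byte example


class DualWatermarkPool:
    """Toy model of the proposed soft/hard limit scheme (hypothetical, not DataFusion code)."""

    def __init__(self, soft_limit: int) -> None:
        self.soft_limit = soft_limit
        # Recommendation from the list above: hard_limit = soft_limit * 120%.
        self.hard_limit = soft_limit * 120 // 100
        self.used = 0

    def try_grow(self, nbytes: int, can_spill: bool) -> bool:
        """Reserve nbytes if the limit that applies to this consumer allows it."""
        # Spillable consumers are held to the soft limit; non-spillable
        # consumers may exceed it, but never the hard limit.
        limit = self.soft_limit if can_spill else self.hard_limit
        if self.used + nbytes > limit:
            return False  # caller would spill, or surface a ResourcesExhausted-style error
        self.used += nbytes
        return True

    def shrink(self, nbytes: int) -> None:
        """Release up to nbytes of previously reserved memory."""
        self.used -= min(nbytes, self.used)


def memory_limit_with_headroom(available_bytes: int, headroom_bytes: int = 500 * MB) -> int:
    """Return a memory_limit set below available memory by the given headroom."""
    if available_bytes <= headroom_bytes:
        raise ValueError("available memory does not exceed the requested headroom")
    return available_bytes - headroom_bytes
```

In this model a spillable operator is refused once the soft limit is reached, which is its cue to spill, while a non-spillable operator can keep growing until the hard limit.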
Error Signatures (11)
exception: ResourcesExhausted("Additional allocation failed
log pattern: Failed to allocate additional
exception: memory allocation of 25690112 bytes failed
exception: DatafusionError/ResourcesExhausted: Failed to allocate additional
exit code: Aborted (core dumped)
exception: ArrowError(InvalidArgumentError("number of columns(3) must match number of fields(2) in schema"), None)
log pattern: number of columns must match number of fields in schema
exception: ResourcesExhausted
log pattern: intermediate_batch.num_rows() = 335544320
log pattern: intermediate_batch.get_array_memory_size() = 5368709312
error code: OOM
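The two intermediate_batch log patterns above are easier to judge as derived figures. The Python sketch below parses them and computes bytes per row, total size in GiB, and the number of batches the same rows would fill at the default batch_size of 8192. The regular expression and the helper name parse_batch_stats are illustrative assumptions, not part of any DataFusion tooling.

```python
import re

# Log lines copied from the signatures above.
LOG_LINES = [
    "intermediate_batch.num_rows() = 335544320",
    "intermediate_batch.get_array_memory_size() = 5368709312",
]

PATTERN = re.compile(
    r"intermediate_batch\.(num_rows|get_array_memory_size)\(\) = (\d+)"
)


def parse_batch_stats(lines):
    """Collect the reported row count and array memory size from log lines."""
    stats = {}
    for line in lines:
        match = PATTERN.search(line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


stats = parse_batch_stats(LOG_LINES)
rows = stats["num_rows"]
size_bytes = stats["get_array_memory_size"]

bytes_per_row = size_bytes / rows        # roughly 16 bytes per row
size_gib = size_bytes / 2**30            # roughly 5 GiB in a single batch
batches_at_default = rows // 8192        # batches needed at batch_size = 8192
```

A single intermediate batch of about 5 GiB holds as many rows as 40960 default-sized batches, which is why these patterns appear alongside the oversized-batch insights.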
CLI Commands (1)
diagnostic: ulimit -v 1152000
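For reproducing the limit from a test harness rather than a shell, a Python sketch along these lines may help. It assumes a Unix system and assumes the command is run in bash, where ulimit -v takes its value in 1024-byte units; the helper names ulimit_v_to_bytes and run_with_address_space_limit are hypothetical.

```python
import resource
import subprocess


def ulimit_v_to_bytes(kib: int) -> int:
    """Convert a bash `ulimit -v` value (1024-byte units) to bytes."""
    return kib * 1024


def run_with_address_space_limit(cmd, limit_kib: int):
    """Run cmd in a child process whose virtual address space is capped."""
    limit = ulimit_v_to_bytes(limit_kib)

    def apply_limit():
        # Runs in the child just before exec; RLIMIT_AS is expressed in bytes.
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))

    return subprocess.run(cmd, preexec_fn=apply_limit)


# 1152000 KiB is 1125 MiB of virtual address space.
LIMIT_BYTES = ulimit_v_to_bytes(1152000)
```

Calling run_with_address_space_limit with a reproducer command and 1152000 would apply the same cap as the diagnostic above to that process only, leaving the parent unaffected.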
Technical References (52)
file path: multi_level_merge.rs
component: num_spill
component: memory_reservation_bytes
concept: deadlock
component: RepartitionExec
component: ExternalSorter
component: FairSpillPool
component: try_grow
component: MemoryPool
component: Prometheus
component: DataDog
component: Grafana
component: RuntimeConfig
component: GroupedHashAggregateStream
component: group_aggregate_batch()
component: Vec::grow_amortized()
file path: datafusion/physical-plan/src/aggregates/row_hash.rs
component: AggregateExec
component: HashAggregationExec
component: sort_batch
component: BatchSplitter
component: MemoryReservation
component: RecordBatch
concept: memory reservations
component: JoinLeftData
component: hash join
concept: spilling
component: memory_pool
component: HashJoin
component: SortMergeJoin
component: memory pool
component: unnest
component: GROUP BY
component: array_agg
concept: streaming execution
component: NestedLoopJoinExec
concept: Cartesian product
component: build_batch_from_indices
component: nested loop join
component: buffered_left_batches
component: RecordBatchStream
component: ExecutionPlan
component: hash map indices
concept: CPU cache
component: next list
component: JoinPlanner trait
component: sort-merge join
concept: TPC-H query 7
concept: TPC-H query 21
concept: TPC-H query 4
concept: build side
concept: probe side
Related Insights (25)
critical: Multi-partition sorting hits memory bugs causing spill coordination failures
warning: Memory pool lacks dual watermark causing premature OOM errors
warning: Blocking memory allocation causes deadlock risk when waiting for memory
critical: ExternalSort fails when non-spillable input operators exhaust memory pool
warning: FairSpillPool allows premature OOM failures on non-spillable operators
warning: Resource exhaustion undetected without metric tracking
critical: Symmetric hash joins cause out-of-memory errors on unbounded streams without pruning
warning: Missing disk spill manager prevents large query execution
critical: GroupedHashAggregateStream OOM from Vec exponential growth during group-by with large strings
critical: Schema mismatch causes GroupedHashAggregateStream spill failure with multiple aggregations
critical: Hash aggregation spill doubles memory usage due to sort_batch copy
critical: Cascaded joins produce oversized RecordBatches causing OOM
warning: Join operations create unaccounted large output batches causing memory pressure
critical: BatchSplitter does not prevent memory issues from oversized join batches
warning: Hash join build-side key memory not accounted leads to OOM risk
critical: Out of memory on large table joins with 16GB RAM
critical: Hash join fails when memory limit exceeded without spilling support
critical: Memory exhaustion from insufficient memory pool limits on resource-constrained systems
critical: Unnest with GROUP BY causes unbounded memory growth despite streaming
critical: NestedLoopJoinExec creates extremely large intermediate batches causing memory exhaustion
critical: Nested loop join buffers entire left side causing OOM under memory constraints
info: Join hash map memory footprint can be reduced with u32 indices
warning: Join algorithm selection impacts memory usage vs CPU performance tradeoffs
warning: Suboptimal hash join build-side selection causes performance degradation
warning: Peak memory consumption during hash joins with poor cardinality estimates
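Several insights above trace OOM to Vec exponential growth that the memory pool does not see. The Python sketch below is a simplified model of amortized doubling, not the Rust implementation. It assumes capacity at least doubles whenever the buffer is full, that the minimum non-zero capacity is 4 elements, and that the old and new buffers coexist during a reallocation. The function names are illustrative.

```python
def grow_amortized(capacity: int, required: int, min_non_zero_cap: int = 4) -> int:
    """Simplified doubling rule: at least double, and at least what is required."""
    return max(capacity * 2, required, min_non_zero_cap)


def simulate_pushes(n: int):
    """Push n elements one at a time.

    Returns (final_capacity, peak_elements_allocated), where the peak counts
    the old and new buffers together at the moment of a reallocation.
    """
    length = capacity = peak = 0
    for _ in range(n):
        if length == capacity:
            new_capacity = grow_amortized(capacity, length + 1)
            # Assumption: both buffers are live while elements are copied over.
            peak = max(peak, capacity + new_capacity)
            capacity = new_capacity
        length += 1
    return capacity, peak


capacity, peak = simulate_pushes(1025)
```

Under these assumptions, pushing 1025 elements ends with capacity for 2048 and a transient peak of 3072 element slots, roughly three times the data actually stored. This kind of gap is the reason for the headroom recommendations in the configuration list.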