Apache DataFusion (Metric)

datafusion.operator.input_batches

Input batches to operator
Dimensions: None
Available on: Native (1)
Interface Metrics (1)
Native
Number of RecordBatch inputs processed by the operator
Dimensions: None
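
As an illustration of what this metric measures, the sketch below wraps an operator's input stream and increments a counter once per batch pulled through it, regardless of how many rows the batch holds. It is a plain-Python model, not DataFusion code: `InputBatchCounter` and `sum_operator` are hypothetical names, and batches are modeled as lists of integers rather than Arrow RecordBatches.

```python
from typing import Iterable, Iterator, List


class InputBatchCounter:
    """Wraps an operator's input stream and counts the batches pulled through it."""

    def __init__(self, batches: Iterable[List[int]]) -> None:
        self._batches = batches
        # Hypothetical counter that mirrors the input_batches metric.
        self.input_batches = 0

    def __iter__(self) -> Iterator[List[int]]:
        for batch in self._batches:
            # One increment per batch, independent of the batch's row count.
            self.input_batches += 1
            yield batch


def sum_operator(source: InputBatchCounter) -> int:
    """Stand-in operator that consumes every input batch and sums its rows."""
    return sum(sum(batch) for batch in source)


if __name__ == "__main__":
    counter = InputBatchCounter([[1, 2, 3], [4], [5, 6]])
    total = sum_operator(counter)
    print(f"total={total}, input_batches={counter.input_batches}")
```

Because the counter ignores row counts, the same data split into many small batches produces a higher value than the same data delivered in a few large ones.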

Technical Annotations (22)

Configuration Parameters (4)
batch_size (recommended: 8192)
Default value; non-default values cause crashes until Arrow-rs #9506 is fixed
target_batch_size (recommended: 8192)
Default for CoalesceBatchesExec; may need tuning based on cardinality
allow_symmetric_joins_without_pruning
Controls whether symmetric joins are permitted without partition pruning
repartition_joins
Enables repartitioning of join inputs on the join keys so joins can run in parallel across target_partitions
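
To show how batch_size relates to this metric, the sketch below estimates how many input batches an operator would see for a given row count. It assumes the upstream operator emits full batches of batch_size rows with one partial batch at the end; filters, joins, and repartitioning can all produce smaller batches, so treat the result as a lower bound rather than a prediction. `expected_input_batches` is a hypothetical helper, not a DataFusion API.

```python
def expected_input_batches(num_rows: int, batch_size: int = 8192) -> int:
    """Lower-bound estimate of input batches, assuming full upstream batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if num_rows < 0:
        raise ValueError("num_rows must be non-negative")
    # Integer ceiling division: the final batch may be partially filled.
    return (num_rows + batch_size - 1) // batch_size


if __name__ == "__main__":
    for size in (1024, 8192):
        count = expected_input_batches(1_000_000, size)
        print(f"batch_size={size}: at least {count} input batches")
```

An observed value far above this estimate suggests the operator is receiving fragmented batches.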
Error Signatures (2)
assertion `left == right` failed (exception)
task 17 panicked with message "assertion `left == right` failed (exception)
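
One way to use these signatures is to scan log output for them. The sketch below is a minimal matcher built on two assumptions: the task number (17 in the listed signature) varies between runs, and the signatures are matched as prefixes because the remainder of the panic message is not recorded here. `classify_log_line` and its return labels are hypothetical.

```python
import re
from typing import Optional

# Signature text taken from the entries above.
ASSERTION_SIGNATURE = "assertion `left == right` failed"

# Assumption: the task number differs between runs, so match any integer.
TASK_PANIC_PATTERN = re.compile(
    r'task \d+ panicked with message "' + re.escape(ASSERTION_SIGNATURE)
)


def classify_log_line(line: str) -> Optional[str]:
    """Return a label for a log line matching a known signature, else None."""
    # Check the more specific task-panic form first; it contains the plain one.
    if TASK_PANIC_PATTERN.search(line):
        return "task_panic"
    if ASSERTION_SIGNATURE in line:
        return "assertion_failure"
    return None


if __name__ == "__main__":
    sample = 'task 17 panicked with message "assertion `left == right` failed'
    print(classify_log_line(sample))
```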
CLI Commands (1)
cargo test (diagnostic)
Technical References (15)
arrow-select (component)
coalesce/primitive.rs (file path)
tokio-runtime-worker (component)
metrics collection (component)
data batches (concept)
data blocks (concept)
fragments (concept)
RepartitionExec (component)
CoalesceBatchesExec (component)
intern() (component)
FilterExec (component)
target_partitions (configuration parameter)
symmetric join (concept)
partition pruning (concept)
concat_batches (component)
Related Insights (9)
DataFusion crashes with assertion failures when batch_size differs from default 8192 (critical)
Metrics collection overhead degrades query performance on small batches (warning)
Fragmented small data blocks cause processing inefficiency (warning)
Small batch size causes performance degradation through excessive allocations (warning)
RepartitionExec and CoalesceBatchesExec overhead reduces aggregate performance (info)
Aggregate memory accounting updates only after full batch processing (warning)
CoalesceBatchesExec placement after joins misses optimization opportunities (info)
Symmetric joins without pruning can cause performance degradation (warning)
concat_batches overhead causes 33x query performance degradation (critical)
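
Several of these insights concern fragmented small batches and the cost of merging them. The sketch below illustrates the general idea of coalescing: buffer incoming rows and emit a batch once target_batch_size rows are available. It is a simplified plain-Python model, not the actual CoalesceBatchesExec or concat_batches implementation; `coalesce_batches` is a hypothetical function and batches are modeled as lists of integers.

```python
from typing import Iterable, Iterator, List


def coalesce_batches(
    batches: Iterable[List[int]], target_batch_size: int = 8192
) -> Iterator[List[int]]:
    """Merge small input batches into batches of target_batch_size rows."""
    if target_batch_size <= 0:
        raise ValueError("target_batch_size must be positive")
    buffer: List[int] = []
    for batch in batches:
        # Copying rows into the buffer is the cost coalescing pays per input batch.
        buffer.extend(batch)
        while len(buffer) >= target_batch_size:
            yield buffer[:target_batch_size]
            buffer = buffer[target_batch_size:]
    if buffer:
        # Flush the remaining rows as a final, possibly smaller, batch.
        yield buffer


if __name__ == "__main__":
    fragmented = [[1, 2], [3], [4, 5, 6], [7]]
    merged = list(coalesce_batches(fragmented, target_batch_size=3))
    print(f"{len(fragmented)} input batches -> {len(merged)} output batches")
```

In this model a downstream operator reports fewer input batches after coalescing, at the price of the copy performed while buffering, which is the trade-off the insights above describe.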