datafusion.operator.input_batches

Input batches to operator

Dimensions: None

Available on: Native (1)

Interface Metrics (1)

Native

input_batches

Number of RecordBatch inputs processed by the operator

Dimensions: None
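To make the counting semantics concrete, the following is a minimal toy model, not DataFusion's actual implementation. It assumes a batch is a plain vector of integers standing in for an Arrow RecordBatch; the names OperatorMetrics, CountingFilter, and run_counting_filter are hypothetical and exist only for illustration. The counter goes up once per batch received, regardless of how many rows the batch holds.

```rust
// Toy stand-in for an Arrow RecordBatch: just a vector of row values.
type Batch = Vec<i64>;

/// Hypothetical per-operator metrics; only `input_batches` mirrors the
/// metric described above, `input_rows` is added for contrast.
#[derive(Debug, Default)]
struct OperatorMetrics {
    input_batches: u64,
    input_rows: u64,
}

/// Hypothetical filter-like operator that counts every batch it receives.
struct CountingFilter {
    metrics: OperatorMetrics,
}

impl CountingFilter {
    fn new() -> Self {
        Self { metrics: OperatorMetrics::default() }
    }

    /// Process one input batch: bump the counters, then keep even values only.
    fn process(&mut self, batch: &Batch) -> Batch {
        self.metrics.input_batches += 1;
        self.metrics.input_rows += batch.len() as u64;
        batch.iter().copied().filter(|v| v % 2 == 0).collect()
    }
}

/// Feed a sequence of batches through the operator and return its metrics.
fn run_counting_filter(batches: &[Batch]) -> OperatorMetrics {
    let mut op = CountingFilter::new();
    for b in batches {
        let _ = op.process(b);
    }
    op.metrics
}
```

In this sketch an empty batch still counts as one input batch, so a high input_batches value paired with a low row count would point to many small or empty batches reaching the operator.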

Sources

input_batches (github.com)

Technical Annotations (22)

Configuration Parameters (4)

batch_size (recommended: 8192)

Default value; non-default values cause crashes until Arrow-rs #9506 is fixed

target_batch_size (recommended: 8192)

Default for CoalesceBatchesExec, may need tuning based on cardinality

allow_symmetric_joins_without_pruning

Controls whether symmetric joins are permitted without partition pruning

repartition_joins

Enables join repartitioning for distributed execution
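The effect of a target batch size on the number of batches a downstream operator receives can be sketched with a toy coalescer. This is an illustration under stated assumptions, not the logic of CoalesceBatchesExec: batches are plain integer vectors, the function name coalesce_toy is hypothetical, and every emitted batch except the last holds exactly target_batch_size rows.

```rust
/// Toy coalescer (hypothetical, for illustration only): buffers incoming rows
/// and emits a batch each time `target_batch_size` rows are available.
/// Any leftover rows are flushed as one final, smaller batch.
fn coalesce_toy(input: &[Vec<i64>], target_batch_size: usize) -> Vec<Vec<i64>> {
    assert!(target_batch_size > 0, "target_batch_size must be positive");
    let mut out: Vec<Vec<i64>> = Vec::new();
    let mut buf: Vec<i64> = Vec::new();
    for batch in input {
        buf.extend_from_slice(batch);
        // Emit full batches of exactly `target_batch_size` rows.
        while buf.len() >= target_batch_size {
            // `split_off` keeps the first `target_batch_size` rows in `buf`
            // and returns the remainder, which becomes the new buffer.
            let rest = buf.split_off(target_batch_size);
            out.push(std::mem::replace(&mut buf, rest));
        }
    }
    // Flush whatever is left once the input is exhausted.
    if !buf.is_empty() {
        out.push(buf);
    }
    out
}
```

With ten input batches of 100 rows and a target of 256, this sketch emits four batches, so an operator placed after it would report an input_batches count of 4 instead of 10. A larger target reduces the per-batch overhead downstream at the cost of buffering more rows before anything is emitted.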

Error Signatures (2)

assertion `left == right` failed (exception)

task 17 panicked with message "assertion `left == right` failed" (exception)

CLI Commands (1)

cargo test (diagnostic)

Technical References (15)

arrow-select (component)

coalesce/primitive.rs (file path)

tokio-runtime-worker (component)

metrics collection (component)

data batches (concept)

data blocks (concept)

fragments (concept)

RepartitionExec (component)

CoalesceBatchesExec (component)

intern() (component)

FilterExec (component)

target_partitions (configuration parameter)

symmetric join (concept)

partition pruning (concept)

concat_batches (component)

Related Insights (9)

DataFusion crashes with assertion failures when batch_size differs from default 8192 (critical)

Metrics collection overhead degrades query performance on small batches (warning)

Fragmented small data blocks cause processing inefficiency (warning)

Small batch size causes performance degradation through excessive allocations (warning)

RepartitionExec and CoalesceBatchesExec overhead reduces aggregate performance (info)

Aggregate memory accounting updates only after full batch processing (warning)

CoalesceBatchesExec placement after joins misses optimization opportunities (info)

Symmetric joins without pruning can cause performance degradation (warning)

concat_batches overhead causes 33x query performance degradation (critical)