datafusion.operator.elapsed_time
Operator execution time
Dimensions: None
Interface Metrics (3)
Dimensions: None
Dimensions: None
Sources
Technical Annotations (45)
Configuration Parameters (3)
datafusion.execution.parquet.maximum_parallel_row_group_writers (recommended: 1)
datafusion.execution.parquet.maximum_buffered_record_batches_per_stream (recommended: 2)
target_batch_size (recommended: 8192)
CLI Commands (2)
EXPLAIN ANALYZE <query> (diagnostic)
c.evaluate(batch)?.into_array(batch.num_rows()) (diagnostic)
Technical References (40)
deadlock (concept)
NestedLoopJoinExec (component)
PR #16996 (component)
Tokio (component)
morsel-driven parallelism (concept)
AggregateMode::Partial (component)
AggregateMode::Final (component)
RepartitionExec (component)
hash value reuse (concept)
SinglePartitioned (component)
arrow::Row (component)
RowConverter (component)
CoalesceBatchesExec (component)
intern() (component)
dynamic partitioning (concept)
skipped_aggregation_rows metric (component)
GroupedHashAggregateStream (component)
arrow_row::variable::encode (component)
hash seed (concept)
ClickBench (component)
bucket distribution (concept)
cache locality (concept)
UTF-8 boundary checks (concept)
ASCII fast path (concept)
date_trunc (component)
timezone offset (concept)
quadratic complexity (concept)
hash collision (concept)
FilterExec (component)
target_partitions (configuration parameter)
HashJoinExec (component)
hash masking (concept)
update_hash (component)
collect_left_input (component)
UnionArray (component)
build_row_join_batch (component)
ScalarValue::to_array_of_size (component)
Large Union Type (component)
EBV (component)
coalesce (component)
Related Insights (21)
Blocking memory allocation causes deadlock risk when waiting for memory (warning)
Nested loop join with tiny left input and massive right input causes CPU saturation without progress (critical)
Undefined pipeline success rate and duration thresholds delay detection of data issues (warning)
Writing large parquet files with default parallelism settings underutilizes available cores (info)
Small batch size causes performance degradation through excessive allocations (warning)
Tokio async scheduler performs equivalently to custom push-based scheduler (info)
High cardinality aggregations incur triple hashing overhead in multi-phase repartition plans (warning)
Single-mode aggregation outperforms partial/final for high cardinality by avoiding row conversions (warning)
RepartitionExec and CoalesceBatchesExec overhead reduces aggregate performance (info)
Partial aggregation inefficiency with high cardinality causes performance degradation (warning)
RowConverter consumes 75% of aggregation time on high-cardinality group by operations (warning)
Hash seed reuse prevents rehashing during aggregation merge phase (info)
ASCII fast path bypassing improves string function performance up to 5x (info)
Timezone specialization for common cases improves date_trunc by 7x (info)
Same hash seed between HashMaps can cause quadratic complexity (warning)
Native DataFusion scan performance optimization opportunities identified (info)
CoalesceBatchesExec placement after joins misses optimization opportunities (info)
RepartitionExec double hashing causes unnecessary overhead (warning)
Duplicate expression evaluation wastes CPU during hash join build (info)
Nested Loop Join performance degrades 45x with Union Array types in DataFusion 50 (critical)
Complex filter expressions on Union types cause excessive evaluation overhead in Nested Loop Joins (warning)
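
The insight "Same hash seed between HashMaps can cause quadratic complexity" can be illustrated with a toy model. The sketch below is not DataFusion's or hashbrown's table: it assumes a hypothetical open-addressing `Table` with linear probing, a 0.5 maximum load factor, bucket indices taken from the high bits of a seeded hash, and a stand-in mixer function. Under those assumptions, draining one table in slot order into a growing second table that shares the seed feeds keys in roughly ascending hash order, so they pile into the low buckets of the smaller table and probe sequences grow long. The helper `merge_probes` (also hypothetical) counts the slots inspected.

```rust
// Toy model only: linear probing, power-of-two capacity, high-bit bucket index.

/// Seeded 64-bit mixer (splitmix64-style finalizer); a stand-in for a real hasher.
fn hash(key: u64, seed: u64) -> u64 {
    let mut z = key ^ seed.wrapping_mul(0x9E3779B97F4A7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D049BB133111EB);
    z ^ (z >> 31)
}

struct Table {
    slots: Vec<Option<u64>>,
    len: usize,
    seed: u64,
    probes: u64, // slots inspected by all placements, including rehashes on growth
}

impl Table {
    fn new(seed: u64) -> Self {
        Table { slots: vec![None; 8], len: 0, seed, probes: 0 }
    }

    /// Bucket index from the top log2(capacity) bits of the hash.
    fn bucket(&self, key: u64) -> usize {
        let bits = self.slots.len().trailing_zeros();
        (hash(key, self.seed) >> (64 - bits)) as usize
    }

    /// Linear probing from the home bucket to the first empty slot.
    fn place(&mut self, key: u64) {
        let mask = self.slots.len() - 1;
        let mut i = self.bucket(key);
        loop {
            self.probes += 1;
            if self.slots[i].is_none() {
                self.slots[i] = Some(key);
                return;
            }
            i = (i + 1) & mask;
        }
    }

    /// Insert a key assumed to be absent; double the table above load factor 0.5.
    fn insert(&mut self, key: u64) {
        if (self.len + 1) * 2 > self.slots.len() {
            let new_cap = self.slots.len() * 2;
            let old = std::mem::replace(&mut self.slots, vec![None; new_cap]);
            for k in old.into_iter().flatten() {
                self.place(k);
            }
        }
        self.place(key);
        self.len += 1;
    }

    fn keys_in_slot_order(&self) -> Vec<u64> {
        self.slots.iter().flatten().copied().collect()
    }
}

/// Build a source table, drain it in slot order into a fresh destination,
/// and return the number of slots the destination inspected.
fn merge_probes(n: u64, src_seed: u64, dst_seed: u64) -> u64 {
    let mut src = Table::new(src_seed);
    for k in 0..n {
        src.insert(k);
    }
    let mut dst = Table::new(dst_seed);
    for k in src.keys_in_slot_order() {
        dst.insert(k);
    }
    dst.probes
}

fn main() {
    let n = 4000;
    let same = merge_probes(n, 1, 1);
    let diff = merge_probes(n, 1, 2);
    println!("same seed: {same} probes, different seed: {diff} probes");
}
```

In this model, giving the destination a different seed decorrelates insertion order from bucket position, which keeps probe sequences short. Whether and how this applies to a real engine depends on its actual table layout and hashing scheme.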