
Apache DataFusion Metric

datafusion.operator.elapsed_time

Operator execution time

Dimensions: None

Available on:

Prometheus (1), Native (1), OpenTelemetry (1)

Interface Metrics (3)

Prometheus

datafusion_physical_plan_elapsed_compute_seconds

CPU time spent computing in a physical plan operator

Dimensions: None

Native

elapsed_compute

Time spent in actual computation excluding IO waits

Dimensions: None

OpenTelemetry

datafusion.operator.elapsed_time

Cumulative time spent executing a specific physical plan operator

Dimensions: None
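The Prometheus name ends in _seconds and the OpenTelemetry description speaks of cumulative time, so a reader comparing the three interfaces may need to roll several raw readings into one value. A minimal sketch of that roll-up, assuming per-partition elapsed_compute readings are available as nanosecond counts (the function name and sample values are hypothetical, not part of any DataFusion API):

```python
NANOS_PER_SECOND = 1_000_000_000


def cumulative_elapsed_seconds(per_partition_nanos):
    """Sum per-partition elapsed_compute readings (assumed nanoseconds) into seconds."""
    return sum(per_partition_nanos) / NANOS_PER_SECOND


# Hypothetical readings for one operator running on four partitions.
sample_nanos = [1_250_000, 980_000, 1_100_000, 1_670_000]
total_seconds = cumulative_elapsed_seconds(sample_nanos)
print(f"{total_seconds:.6f}")
```

Because the result is a sum across partitions, it can exceed the wall-clock duration of the query when partitions run in parallel.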

Sources

datafusion_physical_plan_elapsed_compute_seconds (github.com)

elapsed_compute (github.com)

datafusion.operator.elapsed_time (github.com)

Technical Annotations (45)

Configuration Parameters (3)

datafusion.execution.parquet.maximum_parallel_row_group_writers (recommended: 1)

Defaults to 1 to minimize memory use; increase it to put idle cores to work when writing large files.

datafusion.execution.parquet.maximum_buffered_record_batches_per_stream (recommended: 2)

Defaults to 2 to minimize memory use; increase it together with the number of row group writers for better throughput.

target_batch_size (recommended: 8192)

Default for CoalesceBatchesExec; may need tuning based on cardinality.
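The two Parquet writer options above are meant to be raised together when cores sit idle. As a minimal sketch, the helper below renders them as SQL SET statements of the kind DataFusion's SQL interface accepts; the values 4 and 8 are hypothetical tuning choices for illustration, not recommendations from this page, and render_set_statements is an illustrative name:

```python
# Hypothetical tuning values; the defaults listed above are 1 and 2.
parquet_writer_settings = {
    "datafusion.execution.parquet.maximum_parallel_row_group_writers": 4,
    "datafusion.execution.parquet.maximum_buffered_record_batches_per_stream": 8,
}


def render_set_statements(options):
    """Turn a {option_name: value} mapping into one SQL SET statement per option."""
    return [f"SET {key} = {value};" for key, value in options.items()]


for statement in render_set_statements(parquet_writer_settings):
    print(statement)
```

Both options trade memory for throughput, so it is worth re-checking memory use after changing them.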

CLI Commands (2)

EXPLAIN ANALYZE <query> (diagnostic)

c.evaluate(batch)?.into_array(batch.num_rows()) (diagnostic)
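EXPLAIN ANALYZE annotates each operator in the plan with its metrics, including elapsed_compute. The sketch below ranks operators by that value so the most expensive one surfaces first. It assumes each plan line looks like "<Operator>: <details>, metrics=[..., elapsed_compute=<number><unit>]" with units ns, µs, ms, or s; the sample plan text is hypothetical and real output may differ between DataFusion versions:

```python
import re

# Assumed line shape: "<Operator>: ..., metrics=[..., elapsed_compute=<number><unit>, ...]"
LINE_RE = re.compile(r"^\s*(\w+):.*?elapsed_compute=([\d.]+)(ns|µs|us|ms|s)\b")
UNIT_SECONDS = {"ns": 1e-9, "µs": 1e-6, "us": 1e-6, "ms": 1e-3, "s": 1.0}


def rank_operators(plan_text):
    """Return (operator, seconds) pairs sorted from slowest to fastest."""
    rows = []
    for line in plan_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            name, number, unit = match.groups()
            rows.append((name, float(number) * UNIT_SECONDS[unit]))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical EXPLAIN ANALYZE output, for illustration only.
SAMPLE_PLAN = """\
CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=3, elapsed_compute=12.5µs]
  FilterExec: a@0 > 1, metrics=[output_rows=3, elapsed_compute=1.2ms]
    NestedLoopJoinExec: join_type=Inner, metrics=[output_rows=10, elapsed_compute=2.5s]
"""

for operator, seconds in rank_operators(SAMPLE_PLAN):
    print(f"{operator}: {seconds:.6f} s")
```

Lines without an elapsed_compute entry are skipped rather than treated as zero, so operators that report only other timers do not distort the ranking.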

Technical References (40)

deadlock (concept)
NestedLoopJoinExec (component)
PR #16996 (component)
Tokio (component)
morsel-driven parallelism (concept)
AggregateMode::Partial (component)
AggregateMode::Final (component)
RepartitionExec (component)
hash value reuse (concept)
SinglePartitioned (component)
arrow::Row (component)
RowConverter (component)
CoalesceBatchesExec (component)
intern() (component)
dynamic partitioning (concept)
skipped_aggregation_rows metric (component)
GroupedHashAggregateStream (component)
arrow_row::variable::encode (component)
hash seed (concept)
ClickBench (component)
bucket distribution (concept)
cache locality (concept)
UTF-8 boundary checks (concept)
ASCII fast path (concept)
date_trunc (component)
timezone offset (concept)
quadratic complexity (concept)
hash collision (concept)
FilterExec (component)
target_partitions (configuration parameter)
HashJoinExec (component)
hash masking (concept)
update_hash (component)
collect_left_input (component)
UnionArray (component)
build_row_join_batch (component)
ScalarValue::to_array_of_size (component)
Large Union Type (component)
EBV (component)
coalesce (component)

Related Insights (21)

Blocking memory allocation causes deadlock risk when waiting for memory (warning)

Nested loop join with tiny left input and massive right input causes CPU saturation without progress (critical)

Undefined pipeline success rate and duration thresholds delay detection of data issues (warning)

Writing large parquet files with default parallelism settings underutilizes available cores (info)

Small batch size causes performance degradation through excessive allocations (warning)

Tokio async scheduler performs equivalently to custom push-based scheduler (info)

High cardinality aggregations incur triple hashing overhead in multi-phase repartition plans (warning)

Single-mode aggregation outperforms partial/final for high cardinality by avoiding row conversions (warning)

RepartitionExec and CoalesceBatchesExec overhead reduces aggregate performance (info)

Partial aggregation inefficiency with high cardinality causes performance degradation (warning)

RowConverter consumes 75% of aggregation time on high-cardinality group by operations (warning)

Hash seed reuse prevents rehashing during aggregation merge phase (info)

ASCII fast path bypassing improves string function performance up to 5x (info)

Timezone specialization for common cases improves date_trunc by 7x (info)

Same hash seed between HashMaps can cause quadratic complexity (warning)

Native DataFusion scan performance optimization opportunities identified (info)

CoalesceBatchesExec placement after joins misses optimization opportunities (info)

RepartitionExec double hashing causes unnecessary overhead (warning)

Duplicate expression evaluation wastes CPU during hash join build (info)

Nested Loop Join performance degrades 45x with Union Array types in DataFusion 50 (critical)

Complex filter expressions on Union types cause excessive evaluation overhead in Nested Loop Joins (warning)