Apache DataFusion Metric

datafusion.query.execution_time

Total query execution time
Dimensions: None
Available on: Prometheus (1), OpenTelemetry (1)
Interface Metrics (2)
Prometheus
Total query execution time in seconds
Dimensions: None
OpenTelemetry
Total time spent executing the query physical plan
Dimensions: None

Technical Annotations (54)

Configuration Parameters (10)
datafusion.execution.parquet.pushdown_filters (recommended: false)
Left at the default to avoid regressions until the performance issues are resolved
datafusion.execution.target_partitions (recommended: 1)
Set to 1 during benchmarking to reduce variability
datafusion.execution.parquet.binary_as_string (recommended: true)
Required for ClickBench data processing
datafusion.optimizer.prefer_hash_join (recommended: true)
Used in the configuration that exposed the regression
datafusion.optimizer.top_down_join_key_reordering (recommended: true)
Controls join key ordering, which affects join strategy selection
datafusion.optimizer.filter_null_join_keys (recommended: true)
Affects join optimization behavior
datafusion.optimizer.max_passes (recommended: 3)
Number of optimizer passes, which may affect plan quality
DATAFUSION_OPTIMIZER_REPARTITION_JOINS (recommended: true)
Forces the optimizer to consider partitioned joins
DATAFUSION_OPTIMIZER_HASH_JOIN_SINGLE_PARTITION_THRESHOLD (recommended: 0)
Disables the single-partition size threshold to force partitioned joins
DATAFUSION_OPTIMIZER_HASH_JOIN_SINGLE_PARTITION_THRESHOLD_ROWS (recommended: 0)
Disables the row-count threshold to force partitioned joins
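The parameter list mixes dotted configuration keys with environment variable names. A minimal Python sketch of how the two forms relate, assuming DataFusion's documented convention that an environment variable is the dotted key upper-cased with `.` replaced by `_`, and that settings can also be applied per session with a SQL `SET` statement. The dotted forms of the three `DATAFUSION_OPTIMIZER_*` entries are inferred by reversing that convention, and the helper names are hypothetical, not part of any DataFusion API.

```python
# Recommended values copied from the parameter list in this entry.
# The last three dotted keys are inferred from the environment variable
# names shown there (assumption, not stated in the source).
RECOMMENDED = {
    "datafusion.execution.parquet.pushdown_filters": "false",
    "datafusion.execution.target_partitions": "1",
    "datafusion.execution.parquet.binary_as_string": "true",
    "datafusion.optimizer.prefer_hash_join": "true",
    "datafusion.optimizer.top_down_join_key_reordering": "true",
    "datafusion.optimizer.filter_null_join_keys": "true",
    "datafusion.optimizer.max_passes": "3",
    "datafusion.optimizer.repartition_joins": "true",
    "datafusion.optimizer.hash_join_single_partition_threshold": "0",
    "datafusion.optimizer.hash_join_single_partition_threshold_rows": "0",
}


def to_env_var(key: str) -> str:
    """Map a dotted config key to its environment variable name."""
    return key.upper().replace(".", "_")


def to_set_statement(key: str, value: str) -> str:
    """Render a SQL SET statement for one config key (value quoted as a string)."""
    return f"SET {key} = '{value}';"


if __name__ == "__main__":
    for key, value in RECOMMENDED.items():
        # One line per form: shell-style assignment, then the SQL equivalent.
        print(f"{to_env_var(key)}={value}")
        print(to_set_statement(key, value))
```

Environment variables have to be set before the session is created, while `SET` statements take effect for the current session only, so the choice depends on whether the setting should apply to a whole benchmark run or to a single interactive investigation.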
CLI Commands (3)
Diagnostic: SELECT * FROM lineitem, orders WHERE l_orderkey = o_orderkey AND o_orderkey = 1 AND l_quantity < (SELECT avg(l_quantity) FROM lineitem WHERE l_orderkey = o_orderkey);
Diagnostic: cargo run --profile release-nonlto --bin dfbench tpcds --query 99 --iterations 3 --path benchmarks/data/tpcds_sf1 --query_path datafusion/core/tests/tpc-ds --prefer_hash_join true
Diagnostic: datafusion-cli -c "select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < (select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey);"
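To see where execution time goes inside one of these diagnostic queries, the query can be prefixed with `EXPLAIN ANALYZE`, which runs it and reports per-operator metrics. A small Python sketch that builds such a `datafusion-cli -c` invocation; the helper names are hypothetical, and it assumes the tables referenced by the query (for example `lineitem` and `orders`) are already available to the CLI session.

```python
import shlex


def explain_analyze(query: str) -> str:
    """Prefix a SQL query with EXPLAIN ANALYZE, keeping one trailing semicolon."""
    body = query.strip().rstrip(";").strip()
    return f"EXPLAIN ANALYZE {body};"


def cli_command(query: str) -> list[str]:
    """Build the argv for running the wrapped query through datafusion-cli -c."""
    return ["datafusion-cli", "-c", explain_analyze(query)]


if __name__ == "__main__":
    query = (
        "SELECT * FROM lineitem, orders "
        "WHERE l_orderkey = o_orderkey AND o_orderkey = 1;"
    )
    # Print a shell-safe command line rather than executing it.
    print(shlex.join(cli_command(query)))
```

The sketch only prints the command; passing the same list to `subprocess.run` would execute it on a machine where `datafusion-cli` is installed.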
Technical References (41)
ArrowPredicate API (component)
late materialization (concept)
metrics collection (component)
data batches (concept)
array_has (component)
array_has_any (component)
branch-49 (component)
f43df3f2ae3aafb347996c58e852cc378807095b (component)
CrossJoin (component)
Inner Join (component)
logical_plan (component)
SessionConfig (component)
RuntimeConfig (component)
optd (component)
cardinality estimation (concept)
prost (component)
gogo/protobuf (component)
protobuf serialization (concept)
NestedLoopJoin (component)
HashJoin (component)
selectivity (concept)
IMDB benchmark (concept)
join parameterization (concept)
predicate pushdown (concept)
TPC-H (concept)
external join (component)
TreeNode API (component)
LogicalPlan (component)
CollectLeft (component)
Partitioned (component)
HashJoinExec (component)
AggregateExec (component)
Final (component)
FinalPartitioned (component)
GroupedHashAggregateStream (component)
concat_batches (component)
build side (concept)
probe side (concept)
star schema (concept)
right deep tree (concept)
EXPLAIN ANALYZE (component)
Related Insights (19)
Parquet filter pushdown causes query slowdowns for specific query patterns (warning)
Metrics collection overhead degrades query performance on small batches (warning)
Array membership filter performance degrades linearly with array size in DataFusion 50 (warning)
DataFusion 49 nested loop join underutilizes CPU for array membership queries (info)
Query optimizer regression causes cross join instead of inner join (warning)
Undefined pipeline success rate and duration thresholds delay detection of data issues (warning)
Default parallelism settings limit out-of-box query performance (warning)
Join order cardinality estimation failures cause query performance disasters (critical)
prost protobuf serialization bottleneck degrades throughput by 40%+ (warning)
Batch splitting in joins may cause performance regression (warning)
Nested loop join batch size fix causes performance regression on certain query patterns (warning)
Native DataFusion scan performance optimization opportunities identified (info)
Join parameterization missing causes full table scans on selective queries (critical)
TPC-H queries fail under fuzzed memory limits with external joins (critical)
Cardinality estimation errors cause suboptimal plans for queries with 3+ joins (warning)
Hash join optimizer selects non-partitioned mode causing 52x slower query execution (critical)
Aggregation operations under-partition causing multi-fold performance degradation (warning)
concat_batches overhead causes 33x query performance degradation (critical)
Suboptimal join order causes 60% query performance degradation on multi-table joins (warning)