Apache DataFusion Metric

datafusion.parquet.bytes_scanned

Bytes read from Parquet
Dimensions: None
Available on: Native (1), OpenTelemetry (1)
Interface Metrics (2)
Native
Total bytes scanned from data sources
Dimensions: None
OpenTelemetry
Number of bytes read from Parquet files during scanning
Dimensions: None

Technical Annotations (12)

Configuration Parameters (3)
datafusion.execution.parquet.pushdown_filters (recommended: true)
Enables predicate pushdown into the Parquet reader. The predicate cache regression occurs when this is enabled with parquet 56.1.0.
max_predicate_cache_size (recommended: 0)
Setting this to 0 disables the predicate cache introduced in parquet 56.1.0, which avoids the regression.
with_parquet_pruning (recommended: True)
Enables row group and page-level pruning for Parquet files.
CLI Commands (2)
ctx.sql("set datafusion.execution.parquet.pushdown_filters = true").await?.collect().await? (diagnostic)
ctx.sql("explain analyze select k from t where k = 123456").await?.show().await? (diagnostic)
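When comparing runs with pushdown_filters on and off, it can help to pull the bytes_scanned value out of the explain analyze text rather than reading it by eye. The following is a minimal sketch using only the Rust standard library. It assumes the metric is rendered as `bytes_scanned=<integer>` inside the plan text, which may differ between DataFusion versions. The helper name `total_bytes_scanned` is hypothetical and not part of DataFusion.

```rust
/// Sum every `bytes_scanned=<integer>` occurrence found in EXPLAIN ANALYZE text.
/// Assumption: the metric is printed as a plain integer right after the key.
/// Returns None when no parsable occurrence is present.
fn total_bytes_scanned(plan_text: &str) -> Option<u64> {
    const KEY: &str = "bytes_scanned=";
    let mut total: Option<u64> = None;
    let mut rest = plan_text;

    // Walk the text, jumping from one occurrence of the key to the next.
    while let Some(pos) = rest.find(KEY) {
        rest = &rest[pos + KEY.len()..];

        // Collect the run of ASCII digits that follows the key.
        let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();

        // Occurrences without a leading integer are skipped.
        if let Ok(n) = digits.parse::<u64>() {
            total = Some(total.unwrap_or(0) + n);
        }
    }
    total
}
```

Applying the helper to the plan text captured from each run gives two totals that can be compared directly. A noticeably larger total with pushdown enabled would be consistent with the predicate cache insight listed under Related Insights.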
Technical References (7)
predicate cache (component)
data page (component)
parquet/src/arrow/arrow_reader/selection.rs (file path)
page index pruning (concept)
Parquet (component)
row group pruning (concept)
projection pushdown (concept)
Related Insights (5)
Parquet 56.1.0 predicate cache fetches excessive data pages for small page sizes (warning)
Predicate cache toggle does not always prevent performance regressions (warning)
Disabled Parquet filter pushdown reduces query performance (warning)
Native DataFusion scan performance optimization opportunities identified (info)
Inefficient Parquet row group pruning when projection pushdown is not applied (warning)