Apache Spark

Executor Memory Pressure from Oversized Partitions

critical
Resource Contention
Updated Jan 5, 2026

Spark executors fail with OutOfMemoryError when individual partitions grow well beyond the recommended 200-500 MB range, exhausting executor heap memory and causing cascading task failures across the cluster.

How to detect:

Monitor partition size distributions and executor memory utilization. Flag the condition when spark_executor_memory_used approaches spark_executor_max_memory while spark_stage_disk_size_spilled rises, spark_executor_completed_tasks drops, and spark_stage_count_failed_tasks climbs.
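The detection pattern above can be sketched as a simple check over two consecutive metric snapshots. The metric names come from the text; the 90% memory-ratio threshold and the dict-based snapshot shape are illustrative assumptions, not Spark defaults.

```python
# Sketch of the detection heuristic: compare two consecutive metric snapshots.
# MEMORY_RATIO_THRESHOLD is an assumed alerting threshold, not a Spark default.
MEMORY_RATIO_THRESHOLD = 0.9

def memory_pressure_detected(prev: dict, curr: dict) -> bool:
    """Return True when the metric pattern matches oversized-partition pressure."""
    # Executor memory usage is approaching its configured maximum.
    near_limit = (curr["spark_executor_memory_used"]
                  >= MEMORY_RATIO_THRESHOLD * curr["spark_executor_max_memory"])
    # Disk spill is increasing between snapshots.
    spill_rising = (curr["spark_stage_disk_size_spilled"]
                    > prev["spark_stage_disk_size_spilled"])
    # Task completion rate is dropping while failures climb.
    throughput_dropping = (curr["spark_executor_completed_tasks"]
                           < prev["spark_executor_completed_tasks"])
    failures_rising = (curr["spark_stage_count_failed_tasks"]
                       > prev["spark_stage_count_failed_tasks"])
    return near_limit and spill_rising and throughput_dropping and failures_rising
```

In practice these snapshots would be scraped from the Spark metrics system on a fixed interval, with the completed-tasks figure expressed as a per-interval rate rather than a cumulative counter.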

Recommended action:

Repartition data to target 200-500 MB partitions before memory-intensive operations. Calculate the optimal partition count by dividing total dataset size by the target partition size and rounding up. Call repartition early in pipelines, and configure maxRecordsPerBatch for streaming workloads to cap partition sizes.
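The partition-count calculation can be sketched as a small helper. The 350 MB default target is an illustrative midpoint of the 200-500 MB range, not a Spark setting; the commented repartition call assumes a PySpark DataFrame named `df`.

```python
def optimal_partition_count(total_bytes: int, target_bytes: int = 350 * 1024**2) -> int:
    """Partition count that keeps partitions near the 200-500 MB target range.

    target_bytes defaults to ~350 MB, the midpoint of the recommended range
    (an illustrative choice, not a Spark default).
    """
    # Ceiling division so partitions never exceed the target size.
    return max(1, -(-total_bytes // target_bytes))

# Example: repartition a ~100 GB dataset before a memory-intensive stage.
# n = optimal_partition_count(100 * 1024**3)
# df = df.repartition(n)  # PySpark; `df` is a hypothetical DataFrame
```

Applying the repartition before wide transformations (joins, aggregations) matters most, since those are the stages where an oversized partition must fit entirely in executor memory.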