Apache Spark

Inefficient Caching Causing Memory Eviction

warning · configuration
Updated Jan 5, 2026

Indiscriminate caching of large datasets competes with execution memory, causing high cache eviction rates and forcing re-computation of expensive operations.

How to detect:

Monitor spark_executor_mem_used_on_heap_storage and spark_executor_mem_count_on_heap_storage for values approaching their configured limits. A high eviction rate indicates the storage pool is too small for the cache workload. Cross-reference spark_rdd_memory_used against spark_executor_memory_used to confirm that cached data, rather than execution, is driving memory pressure.
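As a minimal sketch of the detection step, the snippet below flags cache pressure when storage memory use crosses a threshold of its pool size. The metric names follow the text above; the values, the pool-size constant, and the 90% threshold are all hypothetical assumptions for illustration.

```python
# Hypothetical metric sample (bytes). Names follow the document's metrics;
# the values and the storage pool size are made up for this example.
metrics = {
    "spark_executor_mem_used_on_heap_storage": 3_800_000_000,
    "spark_rdd_memory_used": 3_500_000_000,
    "spark_executor_memory_used": 4_000_000_000,
}

ON_HEAP_STORAGE_LIMIT = 4_000_000_000  # assumed storage pool size (bytes)


def cache_pressure(used_bytes: int, limit_bytes: int, threshold: float = 0.9) -> bool:
    """Flag cache pressure when storage use reaches the threshold fraction."""
    return used_bytes / limit_bytes >= threshold


under_pressure = cache_pressure(
    metrics["spark_executor_mem_used_on_heap_storage"], ON_HEAP_STORAGE_LIMIT
)
print(under_pressure)  # True: 3.8 GB of a 4 GB pool is 95% utilized
```

In practice the same ratio could feed an alert rule, firing before evictions begin rather than after re-computation costs appear in job runtimes.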

Recommended action:

Cache only datasets that are accessed multiple times within a job, and keep cached lookup tables under 1 GB. Explicitly unpersist cached DataFrames as soon as they are no longer needed. Use the MEMORY_AND_DISK storage level so blocks spill to disk instead of being dropped when memory fills. Watch the Spark UI Storage tab for eviction rates, and trim the set of cached datasets if evictions stay high.
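The caching policy above can be sketched as a small decision helper. This is a pure-Python illustration, not Spark code: the `DatasetProfile` class, the reuse-count field, and the `should_cache` function are all hypothetical; only the "reused more than once" and "lookup tables under 1 GB" rules come from the text.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # one gibibyte in bytes


@dataclass
class DatasetProfile:
    """Hypothetical bookkeeping for a dataset being considered for caching."""
    name: str
    size_bytes: int
    reuse_count: int  # times the dataset is read within the job


def should_cache(ds: DatasetProfile, lookup_table: bool = False) -> bool:
    # Single-use data gains nothing from caching and evicts hotter entries.
    if ds.reuse_count < 2:
        return False
    # Apply the 1 GB guideline for lookup tables from the recommendation.
    if lookup_table and ds.size_bytes > GIB:
        return False
    return True


# A small, repeatedly-joined dimension table qualifies; a huge
# scanned-once fact table does not.
print(should_cache(DatasetProfile("dim_users", 200 * 1024**2, 3), lookup_table=True))
print(should_cache(DatasetProfile("raw_events", 50 * GIB, 1)))
```

For datasets that pass such a check, PySpark's real API is `df.persist(StorageLevel.MEMORY_AND_DISK)` to enable spill-to-disk, and `df.unpersist()` to release the blocks once the last consumer has run.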