JVM stack traces too long to identify failure location

Updated Feb 22, 2024 (via Exa)

Sources

Data Engineering Best Practices - #2. Metadata & Logging (www.startdataengineering.com)

Technologies:

Apache DataFusion (subject)

Apache Spark: symptoms of this issue are visible in Apache Spark metrics and logs

How to detect:

When Spark or another JVM-based system fails, the stack trace is often so long and hard to decipher that the actual failure point cannot be identified quickly without additional context from the logs.

Recommended action:

Add log statements immediately before critical code sections, especially in JVM/Spark applications. Each statement should say what the code is about to do (e.g., 'About to create datasets'). If the code then fails, the last log statement pinpoints the failure location, so there is no need to parse a long stack trace to find it. When an error does occur, still print the entire stack trace so that complete debugging information is available.
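
The recommendation above can be sketched in Python, assuming the pipeline is driven from Python code (as with PySpark) and uses the standard `logging` module. The names `run_step` and `create_datasets` are hypothetical helpers invented for illustration; they are not part of Spark or of the cited article. The sketch logs an "About to ..." message before a step runs and, on failure, logs the full stack trace before re-raising the error.

```python
import logging

logger = logging.getLogger("pipeline")


def run_step(description, func, *args, **kwargs):
    """Log what is about to happen, run the step, and log the full trace on failure."""
    # Logged before execution, so it is the last INFO line if the step fails.
    logger.info("About to %s", description)
    try:
        result = func(*args, **kwargs)
    except Exception:
        # logger.exception logs at ERROR level and appends the full traceback.
        logger.exception("Failed while trying to %s", description)
        raise
    logger.info("Finished: %s", description)
    return result


def create_datasets(rows):
    # Hypothetical stand-in for a Spark job; it fails on empty input.
    if not rows:
        raise ValueError("no input rows")
    return [{"id": i, "value": v} for i, v in enumerate(rows)]


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    run_step("create datasets", create_datasets, ["a", "b"])
```

Wrapping each critical step in a helper like this keeps the "about to" message and the stack-trace logging in one place, so individual steps cannot forget either one. Whether a wrapper or plain inline log calls fit better depends on how the pipeline is structured.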