Long GC pauses cause HTTP 408 task timeout failures
Severity: critical
G1 Old Generation garbage collection pauses become significantly longer while large queries execute, and requests to task status endpoints fail with HTTP 408 (Request Timeout) errors. Observed delays reach 349 ms or more past the scheduled time, and the query then fails. The same queries succeed when split into two smaller parts, each processing 50% of the data.
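One way to check whether G1 Old Generation collections coincide with the 408 errors is to sample the JVM's GarbageCollectorMXBean counters around the query window. The sketch below is a minimal, hypothetical helper (GcPauseMonitor, snapshot and deltaMillis are illustrative names, not part of any product). It reads the cumulative collection time of each collector in the running JVM and reports the difference between two snapshots. It assumes a HotSpot JVM where G1 registers a bean named "G1 Old Generation"; bean names can differ across JVM versions and vendors. Cumulative time shows how much time a collector consumed during the window, not the length of each individual pause; for per-pause durations, GC logging (for example -Xlog:gc on JDK 9 and later) is more direct. For a remote worker, the same beans could be read over JMX rather than in-process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: compares cumulative GC time per collector between two snapshots.
public class GcPauseMonitor {

    // Cumulative collection time (ms) for every collector registered in this JVM.
    static Map<String, Long> snapshot() {
        Map<String, Long> times = new HashMap<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            times.put(gc.getName(), gc.getCollectionTime()); // -1 if the collector does not report time
        }
        return times;
    }

    // Milliseconds spent in the named collector between two snapshots, or -1 if not reported.
    static long deltaMillis(Map<String, Long> before, Map<String, Long> after, String name) {
        long b = before.getOrDefault(name, 0L);
        long a = after.getOrDefault(name, 0L);
        if (a < 0 || b < 0) {
            return -1;
        }
        return a - b;
    }

    public static void main(String[] args) throws InterruptedException {
        String oldGen = "G1 Old Generation"; // assumed HotSpot G1 bean name
        Map<String, Long> before = snapshot();
        Thread.sleep(1000); // stand-in for the query window being observed
        Map<String, Long> after = snapshot();
        System.out.println(oldGen + " time in window: " + deltaMillis(before, after, oldGen) + " ms");
    }
}
```

A large delta for the old-generation collector during the same window as the 408 errors points toward the GC explanation. A value of 0 means no old-generation collection ran in that window.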
Monitor GC pause times during query execution. If G1 Old Generation pauses are causing the 408 timeouts:
(1) Review memory allocation: make sure the heap size (-Xmx) leaves adequate headroom beyond query.max-memory-per-node + memory.heap-headroom-per-node.
(2) Consider tuning G1 parameters, such as -XX:G1HeapRegionSize, for the workload.
(3) Split large queries into smaller batches to reduce memory pressure.
(4) Review, and if necessary increase, the HTTP timeout thresholds for task status endpoints.
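The sizing rule in step (1) is simple arithmetic: subtract the per-node query memory limit and the reserved heap headroom from the maximum heap, then see what is left. The sketch below is hypothetical (HeapHeadroomCheck and slackBytes are illustrative names) and uses made-up example values: -Xmx64G, a query.max-memory-per-node of 20 GB and a memory.heap-headroom-per-node of 10 GB. What counts as "adequate" slack depends on the workload, so the code only reports the figure and flags a zero or negative result.

```java
// Hypothetical helper: how much heap remains beyond
// query.max-memory-per-node + memory.heap-headroom-per-node?
public class HeapHeadroomCheck {
    static final long GB = 1024L * 1024 * 1024;

    // Heap left over after the per-node query limit and the reserved headroom.
    static long slackBytes(long maxHeap, long maxMemoryPerNode, long heapHeadroomPerNode) {
        return maxHeap - (maxMemoryPerNode + heapHeadroomPerNode);
    }

    public static void main(String[] args) {
        // Illustrative values only; substitute the node's real settings.
        // Inside a running JVM, Runtime.getRuntime().maxMemory() approximates the effective -Xmx.
        long maxHeap = 64 * GB;             // -Xmx64G
        long maxMemoryPerNode = 20 * GB;    // query.max-memory-per-node
        long heapHeadroomPerNode = 10 * GB; // memory.heap-headroom-per-node

        long slack = slackBytes(maxHeap, maxMemoryPerNode, heapHeadroomPerNode);
        System.out.printf("Slack beyond query limit + headroom: %d GB%n", slack / GB);
        if (slack <= 0) {
            System.out.println("No slack left: increase -Xmx or lower the per-node limits.");
        }
    }
}
```

With these example numbers the check reports 34 GB of slack (64 - 20 - 10). A zero or negative result means the query limit and headroom together already claim the whole heap, which leaves the collector little room and makes long old-generation pauses more likely.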