Gunicorn workers enter infinite timeout-SIGKILL cycle on Google App Engine

critical

availability

Updated Apr 8, 2024 (via Exa)

Sources

Gunicorn workers on Google App Engine randomly sending SIGKILL leading to TIMEOUT. What is going on? (stackoverflow.com)

Technologies:

Gunicorn: subject

NGINX: Symptoms of this issue are visible in NGINX metrics and logs

Google Cloud Run: Google Cloud Run metrics correlate with this issue and help confirm diagnosis

How to detect:

Gunicorn workers enter a repeating failure cycle: a worker boots, receives a request, times out after 60 seconds, is killed with SIGKILL, and is replaced by a new worker that repeats the pattern. The cycle lasts anywhere from 10 seconds to over an hour, and while it runs every request fails with a 502 error. It starts at random on both new and existing instances, affects any endpoint and method (GET or POST), and eventually resolves on its own. It does not occur when the application runs locally.
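The cycle described above can be spotted in Gunicorn's error log. The following is a minimal sketch, assuming the log uses Gunicorn's default error-log format, in which the master process reports a stuck worker as `[CRITICAL] WORKER TIMEOUT (pid:N)`. The function names and the threshold of three distinct workers are hypothetical choices for illustration, not part of Gunicorn or of the original report.

```python
import re

# Matches the master-process message Gunicorn logs when a worker exceeds `timeout`.
# Assumes the default error-log format; adjust if the platform rewrites log lines.
TIMEOUT_RE = re.compile(r"\[CRITICAL\] WORKER TIMEOUT \(pid:(\d+)\)")


def find_timeout_pids(log_lines):
    """Return the pids of workers reported as timed out, in log order."""
    pids = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            pids.append(int(match.group(1)))
    return pids


def looks_like_timeout_loop(log_lines, min_distinct_workers=3):
    """Heuristic: several different workers timing out suggests the cycle,
    rather than a single slow request. The threshold is an assumption."""
    return len(set(find_timeout_pids(log_lines))) >= min_distinct_workers
```

Counting distinct pids, not total timeouts, is deliberate: in the cycle each replacement worker gets a new pid, so a run of different pids timing out is a stronger signal than one worker timing out repeatedly.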

Recommended action:

Migrate to Google Cloud Run with a custom Dockerfile, and set preload_app = True in the Gunicorn config, to eliminate the platform containerization incompatibility. Verify actual memory usage in platform monitoring; do not assume an out-of-memory kill from the SIGKILL message alone. Monitor gunicorn.request.duration and gunicorn.log.critical for timeout patterns. Test whether identical requests succeed when the instance is not in the timeout loop.
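As a rough illustration of the configuration side of this recommendation, a Gunicorn config file (itself a Python module) might look like the sketch below. Only `preload_app = True` comes from the recommendation. The worker count and the `STATSD_HOST` environment variable are assumptions for illustration, and the 60-second timeout simply mirrors the value in the description above.

```python
# gunicorn.conf.py: a minimal sketch, not a definitive configuration.
import os

# Cloud Run passes the listening port in the PORT environment variable.
bind = "0.0.0.0:" + os.environ.get("PORT", "8080")

# Load the application in the master process before forking workers,
# as the recommendation above suggests.
preload_app = True

# Assumed values: tune for the actual service.
workers = 2
timeout = 60  # seconds; mirrors the timeout seen in the failure cycle

# Optional: have Gunicorn emit statsd metrics such as gunicorn.request.duration
# and gunicorn.log.critical. STATSD_HOST is a hypothetical variable name;
# expected form is "host:port". Leaving it unset keeps statsd disabled.
statsd_host = os.environ.get("STATSD_HOST")
```

Gunicorn can be pointed at this file with `gunicorn -c gunicorn.conf.py`, followed by the application's module and callable.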