High Latency from Slow API Server or Scheduler

warning

latency

Updated Feb 23, 2026

Rising latency in apiserver_request_duration_seconds and a growing share of error responses in apiserver_request_total indicate API server overload or scheduler bottlenecks. The result is slow pod scheduling, kubectl timeouts, and degraded cluster responsiveness.

Sources

Observability for GKE | Google Kubernetes Engine (GKE), docs.cloud.google.com

Technologies:

Google GKE: Symptoms of this issue are visible in Google GKE metrics and logs.

Kubernetes: The root cause of this issue originates in Kubernetes.

How to detect:

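Monitor the apiserver_request_duration_seconds histogram for rising p95/p99 latency, and monitor the apiserver_request_total counter for an elevated share of error responses. Correlate both with scheduler metrics (for example scheduler_pending_pods and scheduler_schedule_attempts_total) to determine whether scheduling latency is contributing to overall control plane delays.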
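
To make the detection step concrete, the following is a minimal Python sketch of how a p95/p99 estimate and an error ratio could be derived from already-scraped metric samples. It assumes the histogram arrives as cumulative (upper bound, count) buckets, as Prometheus-style histograms do, and it interpolates linearly inside a bucket in the spirit of PromQL's histogram_quantile. The function names, the sample numbers, and the choice to count HTTP 429 alongside 5xx as "errors" are illustrative assumptions, not part of GKE or Kubernetes.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate quantile q (0..1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count) pairs,
    including the +Inf bucket expressed as math.inf.
    Hypothetical helper: linear interpolation within the matching bucket.
    """
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile falls in the +Inf bucket: report the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


def error_ratio(counts_by_code):
    """Share of requests whose response code signals server-side trouble.

    counts_by_code: dict mapping the `code` label (string) to a request count.
    Assumption: 5xx and 429 (throttled) responses are treated as errors.
    """
    total = sum(counts_by_code.values())
    if total == 0:
        return 0.0
    errors = sum(
        n for code, n in counts_by_code.items()
        if code.startswith("5") or code == "429"
    )
    return errors / total


# Made-up sample data for illustration only.
sample_buckets = [(0.1, 50), (0.5, 90), (1.0, 98), (5.0, 100), (math.inf, 100)]
sample_codes = {"200": 950, "429": 30, "500": 20}

p95 = estimate_quantile(0.95, sample_buckets)
p99 = estimate_quantile(0.99, sample_buckets)
errors = error_ratio(sample_codes)
print(f"p95={p95:.3f}s p99={p99:.3f}s error_ratio={errors:.2%}")
```

In practice the counts would be rate-of-change values over a time window rather than raw cumulative totals; the sketch skips that step to keep the interpolation logic visible.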

Recommended action:

Review the control plane observability metrics in the Google Cloud Console, under the cluster's Observability tab. If the API server is overloaded, consider optimizing client request patterns, reducing watch/list operations, or scaling control plane resources (for larger clusters). If the scheduler is slow, investigate pod affinity rules and node selection constraints.
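
As a starting point for reducing watch/list operations, it can help to see which resources receive the most LIST and WATCH traffic. The sketch below aggregates apiserver_request_total samples by their verb and resource labels. It assumes the samples have already been fetched and are available as (labels, count) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter


def top_list_watch_targets(samples, n=3):
    """Return the n (verb, resource) pairs with the most LIST/WATCH requests.

    samples: iterable of (labels_dict, request_count) pairs, where labels_dict
    carries the `verb` and `resource` labels of apiserver_request_total.
    """
    totals = Counter()
    for labels, count in samples:
        verb = labels.get("verb", "").upper()
        if verb in {"LIST", "WATCH"}:
            totals[(verb, labels.get("resource", "unknown"))] += count
    return totals.most_common(n)


# Made-up sample data for illustration only.
sample_requests = [
    ({"verb": "LIST", "resource": "pods"}, 1200),
    ({"verb": "GET", "resource": "pods"}, 5000),
    ({"verb": "WATCH", "resource": "configmaps"}, 300),
    ({"verb": "LIST", "resource": "pods"}, 800),
    ({"verb": "LIST", "resource": "secrets"}, 150),
]

for (verb, resource), count in top_list_watch_targets(sample_requests):
    print(verb, resource, count)
```

Resources that dominate this ranking are reasonable first candidates for review, for example clients that repeatedly list a large collection where a single long-lived watch would do.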