Cross-Platform Metrics Mapping for Migration
Migration
Map equivalent metrics and understand observability differences when migrating between EKS, GKE, and AKS.
Prompt: “We're migrating from AWS EKS to Google GKE and need to understand how CloudWatch Container Insights metrics map to Cloud Monitoring — what are the equivalent metrics for CPU throttling, network traffic, and pod health checks?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When migrating from EKS to GKE, start by establishing baseline equivalence for core Kubernetes resource metrics (CPU, memory, network) between CloudWatch Container Insights and Cloud Monitoring, then validate platform-specific metrics like CPU throttling and networking behavior. Pay special attention to networking differences between AWS CNI and GKE VPC-native networking, as these architectural differences can significantly impact pod density and network performance.
1. Establish baseline metrics equivalence for core resources
Start by mapping the fundamental Kubernetes resource metrics that both platforms expose. In CloudWatch Container Insights, look for `pod_cpu_utilization`, `pod_memory_utilization`, and their node-level equivalents. These map directly to GKE's `kubernetes.io/container/cpu/core_usage_time` and `kubernetes.io/container/memory/used_bytes` in Cloud Monitoring. Compare `kubernetes_cpu_usage`, `kubernetes_memory_usage`, `kubernetes_cpu_requested`, and `kubernetes_memory_requested` side-by-side during your pilot phase to ensure you're capturing the same workload patterns. This baseline validation ensures your core capacity planning and autoscaling decisions will remain consistent post-migration.
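The baseline mapping above can be captured as a small lookup table for tooling during the pilot phase. This is an illustrative sketch: the dictionary holds only the Container Insights and Cloud Monitoring names discussed in this playbook, and the helper function is a hypothetical convenience, not part of either platform's API.

```python
# Sketch: CloudWatch Container Insights -> Cloud Monitoring metric mapping.
# Only the pairs discussed in this playbook are included; extend as needed.
EKS_TO_GKE_METRICS = {
    "pod_cpu_utilization": "kubernetes.io/container/cpu/core_usage_time",
    "pod_memory_utilization": "kubernetes.io/container/memory/used_bytes",
    "pod_network_rx_bytes": "kubernetes.io/pod/network/received_bytes_count",
    "pod_network_tx_bytes": "kubernetes.io/pod/network/sent_bytes_count",
}


def gke_equivalent(cloudwatch_metric: str) -> str:
    """Look up the Cloud Monitoring metric type for a Container Insights name."""
    if cloudwatch_metric not in EKS_TO_GKE_METRICS:
        raise KeyError(f"no known GKE mapping for {cloudwatch_metric!r}")
    return EKS_TO_GKE_METRICS[cloudwatch_metric]
```

Keeping the mapping in one place makes it easy to generate side-by-side dashboards and to fail loudly when a metric has no known equivalent.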
2. Map CPU throttling metrics and understand measurement differences
For CPU throttling, CloudWatch Container Insights provides `container_cpu_utilization_over_container_limit`, which you'll need to reconstruct in GKE using `kubernetes.io/container/cpu/limit_utilization` and by comparing `kubernetes_cpu_usage` against `kubernetes_cpu_limits`. The key difference is that GKE exposes raw cgroup throttling metrics (`cpu.cfs_throttled_seconds_total`) more directly, while CloudWatch abstracts this away. If you're seeing high `kubernetes_cpu_limits` utilization (>80%) but low node CPU usage, you likely have containers being throttled unnecessarily—this is more visible in GKE's metrics than in CloudWatch's aggregated view.
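The throttling heuristic above (hot against the limit, idle node) can be sketched as a simple check. The function name is illustrative; the 80% limit-utilization threshold comes from the step above, while the 50% node threshold is an assumption you should tune to your fleet.

```python
def likely_throttled(limit_utilization: float,
                     node_cpu_utilization: float,
                     limit_threshold: float = 0.80,   # from the step above
                     node_threshold: float = 0.50):   # assumed; tune per fleet
    """Flag a container running hot against its CPU limit while the node
    itself is mostly idle -- the signature of unnecessary CFS throttling."""
    return (limit_utilization > limit_threshold
            and node_cpu_utilization < node_threshold)
```

Feed it per-container limit utilization and the hosting node's CPU utilization from either platform's metrics to get a consistent throttling signal during the parallel run.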
3. Map network traffic metrics and validate throughput patterns
CloudWatch Container Insights tracks `pod_network_rx_bytes` and `pod_network_tx_bytes` which map to GKE's `kubernetes.io/pod/network/received_bytes_count` and `kubernetes.io/pod/network/sent_bytes_count`. Monitor `kubernetes_network_rx_size` and `kubernetes_network_transaction_size` during your parallel run to confirm baseline network patterns match. Pay attention to `kubernetes_network_errors`—GKE exposes more granular network error metrics through its VPC flow logs compared to CloudWatch, so you may catch issues that were previously invisible. If errors spike post-migration, it's often related to CNI differences rather than application issues.
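During the parallel run, a per-interval comparison of rx/tx byte counts is enough to confirm that baseline network patterns match. The sketch below is illustrative: the function name and the 10% default tolerance are assumptions, and the inputs are aligned samples exported from each platform.

```python
def throughput_matches(eks_bytes, gke_bytes, tolerance=0.10):
    """True when every aligned per-interval byte count from the parallel run
    agrees within the given relative tolerance (10% by default)."""
    return all(
        abs(gke - eks) <= tolerance * max(eks, 1)
        for eks, gke in zip(eks_bytes, gke_bytes)
    )
```

A mismatch here before cutover points at a CNI or sampling difference to investigate, not necessarily an application problem.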
4. Understand and plan for AWS CNI vs GKE VPC-native networking differences
This is critical: AWS CNI uses ENI-based networking with hard pod density limits based on instance type (e.g., 58 pods on c6g.2xlarge without prefix delegation), while GKE uses VPC-native networking with more flexible IP allocation. The insight on `aws-eni-limits-constrain-max-pod-density-on-smaller-instance-types` won't apply in GKE, but you need to ensure your node sizing accounts for the different networking model. If your EKS clusters were hitting pod density limits even with available CPU/memory, GKE will behave differently—you may be able to run more pods per node, which changes your capacity planning and cost model significantly.
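The ENI-based ceiling follows the standard AWS formula: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. The sketch below reproduces the c6g.2xlarge figure cited above (4 ENIs with 15 IPv4 addresses each, without prefix delegation); GKE Standard nodes instead default to a ceiling of 110 pods per node driven by the node's alias IP range, which is why capacity planning changes.

```python
def eks_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """AWS VPC CNI pod ceiling without prefix delegation:
    ENIs * (IPv4 addresses per ENI - 1) + 2."""
    return enis * (ipv4_per_eni - 1) + 2


# c6g.2xlarge: 4 ENIs, 15 IPv4 addresses per ENI -> the 58-pod limit cited above.
C6G_2XLARGE_MAX_PODS = eks_max_pods(4, 15)
```

Running this formula over your current node types shows exactly where EKS pod density was IP-bound rather than CPU/memory-bound, and therefore where GKE's VPC-native model changes your node sizing.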
5. Map pod health and readiness probe metrics
CloudWatch Container Insights exposes `pod_status_ready` and `pod_status_running` counts, which map to GKE's pod phase metrics under `kubernetes.io/pod/status/phase`; container restarts are tracked separately via `kubernetes.io/container/restart_count`. The main difference is that GKE's Cloud Monitoring (formerly Stackdriver) provides more granular liveness and readiness probe failure metrics. Cross-reference your current CloudWatch alarms on pod restarts and unavailability with GKE's pod lifecycle metrics during parallel monitoring—you'll likely need to adjust thresholds since GKE surfaces probe failures more explicitly than CloudWatch's aggregated pod status.
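For the alarm comparison, a minimal availability check like the following can be evaluated against both platforms' pod status counts. The name and the 90% floor are illustrative assumptions, not defaults from either platform.

```python
def ready_breach(ready_pods: int, desired_pods: int,
                 min_ready_ratio: float = 0.90):  # assumed alert floor
    """True when the fraction of Ready pods falls below the alert floor."""
    if desired_pods <= 0:
        return False  # nothing scheduled, nothing to alert on
    return ready_pods / desired_pods < min_ready_ratio
```

Evaluating the same rule against `pod_status_ready` on EKS and the phase/restart metrics on GKE during the parallel run reveals where GKE's more explicit probe failures would have tripped alarms that CloudWatch's aggregated view did not.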
6. Run parallel monitoring before cutover and validate SLI/SLO parity
Before migrating production traffic, run both EKS and GKE in parallel for at least one full business cycle, collecting all metrics side-by-side. Create dashboards that compare `kubernetes_cpu_usage`, `kubernetes_memory_usage`, `kubernetes_network_errors`, and your key application metrics across both platforms. Your SLIs (latency percentiles, error rates, throughput) should be nearly identical between platforms—if they're not, you've either missed a metrics mapping or uncovered a real platform difference that needs investigation. This parallel run also helps you calibrate alerting thresholds since GKE's metrics often have different collection intervals and aggregation methods than CloudWatch.
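The parity validation can be automated as a diff over the two platforms' SLI dashboards. In this sketch the function name, the example SLI keys, and the 5% relative tolerance are all assumptions; it simply returns the SLIs that diverge (or are missing on the GKE side) and therefore need investigation before cutover.

```python
def sli_mismatches(eks_slis: dict, gke_slis: dict,
                   tolerance: float = 0.05) -> list:
    """Return the names of SLIs that diverge between platforms by more than
    the relative tolerance, or that are missing from the GKE side entirely."""
    return [
        name for name, eks_value in eks_slis.items()
        if name not in gke_slis
        or abs(gke_slis[name] - eks_value) > tolerance * max(abs(eks_value), 1e-9)
    ]
```

An empty result after a full business cycle is your signal that the metrics mapping is complete and the platforms behave equivalently; anything else is either a missed mapping or a real platform difference.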
Related Insights
- AWS ENI Limits Constrain Max Pod Density on Smaller Instance Types (warning): AWS CNI enforces low max pod counts (e.g., 58 for c6g.2xlarge) on instances with fewer CPUs due to ENI limitations, preventing full utilization of node compute capacity unless prefix delegation is enabled.
- Memory pressure triggers random pod eviction in AKS clusters (critical)
- Azure AKS network connectivity issues cause daemon API call hangs (critical)