Cross-Platform Metrics Mapping for Migration
Migration
Map equivalent metrics and understand observability differences when migrating between EKS, GKE, and AKS.
Prompt: “We're migrating from AWS EKS to Google GKE and need to understand how CloudWatch Container Insights metrics map to Cloud Monitoring — what are the equivalent metrics for CPU throttling, network traffic, and pod health checks?”
Agent Playbook
When an agent encounters this scenario, Schema provides these diagnostic steps automatically.
When migrating from EKS to GKE, start by establishing baseline equivalence for core Kubernetes resource metrics (CPU, memory, network) between CloudWatch Container Insights and Cloud Monitoring, then validate platform-specific metrics like CPU throttling and networking behavior. Pay special attention to networking differences between AWS CNI and GKE VPC-native networking, as these architectural differences can significantly impact pod density and network performance.
1. Establish baseline metrics equivalence for core resources
Start by mapping the fundamental Kubernetes resource metrics that both platforms expose. In CloudWatch Container Insights, look for `pod_cpu_utilization`, `pod_memory_utilization`, and their node-level equivalents. These map directly to GKE's `kubernetes.io/container/cpu/core_usage_time` and `kubernetes.io/container/memory/used_bytes` in Cloud Monitoring. Compare `kubernetes_cpu_usage`, `kubernetes_memory_usage`, `kubernetes_cpu_requested`, and `kubernetes_memory_requested` side-by-side during your pilot phase to ensure you're capturing the same workload patterns. This baseline validation ensures your core capacity planning and autoscaling decisions will remain consistent post-migration.
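The baseline mapping above can be captured as a small lookup table for tooling during the pilot phase. This is an illustrative sketch: the dictionary holds only the Container Insights and Cloud Monitoring names discussed in this playbook, and the helper function is a hypothetical convenience, not part of either platform's API.

```python
# Sketch: CloudWatch Container Insights -> Cloud Monitoring metric mapping.
# Only the pairs discussed in this playbook are included; extend as needed.
EKS_TO_GKE_METRICS = {
    "pod_cpu_utilization": "kubernetes.io/container/cpu/core_usage_time",
    "pod_memory_utilization": "kubernetes.io/container/memory/used_bytes",
    "pod_network_rx_bytes": "kubernetes.io/pod/network/received_bytes_count",
    "pod_network_tx_bytes": "kubernetes.io/pod/network/sent_bytes_count",
}


def gke_equivalent(cloudwatch_metric: str) -> str:
    """Look up the Cloud Monitoring metric type for a Container Insights name."""
    if cloudwatch_metric not in EKS_TO_GKE_METRICS:
        raise KeyError(f"no known GKE mapping for {cloudwatch_metric!r}")
    return EKS_TO_GKE_METRICS[cloudwatch_metric]
```

Keeping the mapping in one place makes it easy to generate side-by-side dashboards and to fail loudly when a metric has no known equivalent.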
2. Map CPU throttling metrics and understand measurement differences
For CPU throttling, CloudWatch Container Insights provides `container_cpu_utilization_over_container_limit`, which you'll need to reconstruct in GKE using `kubernetes.io/container/cpu/limit_utilization` and by comparing `kubernetes_cpu_usage` against `kubernetes_cpu_limits`. The key difference is that GKE exposes raw cgroup throttling metrics (`cpu.cfs_throttled_seconds_total`) more directly, while CloudWatch abstracts this away. If you're seeing high `kubernetes_cpu_limits` utilization (>80%) but low node CPU usage, you likely have containers being throttled unnecessarily—this is more visible in GKE's metrics than in CloudWatch's aggregated view.
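The throttling heuristic above (hot against the limit, idle node) can be sketched as a simple check. The function name is illustrative; the 80% limit-utilization threshold comes from the step above, while the 50% node threshold is an assumption you should tune to your fleet.

```python
def likely_throttled(limit_utilization: float,
                     node_cpu_utilization: float,
                     limit_threshold: float = 0.80,   # from the step above
                     node_threshold: float = 0.50):   # assumed; tune per fleet
    """Flag a container running hot against its CPU limit while the node
    itself is mostly idle -- the signature of unnecessary CFS throttling."""
    return (limit_utilization > limit_threshold
            and node_cpu_utilization < node_threshold)
```

Feed it per-container limit utilization and the hosting node's CPU utilization from either platform's metrics to get a consistent throttling signal during the parallel run.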
3. Map network traffic metrics and validate throughput patterns
CloudWatch Container Insights tracks `pod_network_rx_bytes` and `pod_network_tx_bytes` which map to GKE's `kubernetes.io/pod/network/received_bytes_count` and `kubernetes.io/pod/network/sent_bytes_count`. Monitor `kubernetes_network_rx_size` and `kubernetes_network_transaction_size` during your parallel run to confirm baseline network patterns match. Pay attention to `kubernetes_network_errors`—GKE exposes more granular network error metrics through its VPC flow logs compared to CloudWatch, so you may catch issues that were previously invisible. If errors spike post-migration, it's often related to CNI differences rather than application issues.
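During the parallel run, a per-interval comparison of rx/tx byte counts is enough to confirm that baseline network patterns match. The sketch below is illustrative: the function name and the 10% default tolerance are assumptions, and the inputs are aligned samples exported from each platform.

```python
def throughput_matches(eks_bytes, gke_bytes, tolerance=0.10):
    """True when every aligned per-interval byte count from the parallel run
    agrees within the given relative tolerance (10% by default)."""
    return all(
        abs(gke - eks) <= tolerance * max(eks, 1)
        for eks, gke in zip(eks_bytes, gke_bytes)
    )
```

A mismatch here before cutover points at a CNI or sampling difference to investigate, not necessarily an application problem.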
4. Understand and plan for AWS CNI vs GKE VPC-native networking differences
This is critical: AWS CNI uses ENI-based networking with hard pod density limits based on instance type (e.g., 58 pods on c6g.2xlarge without prefix delegation), while GKE uses VPC-native networking with more flexible IP allocation. The insight on `aws-eni-limits-constrain-max-pod-density-on-smaller-instance-types` won't apply in GKE, but you need to ensure your node sizing accounts for the different networking model. If your EKS clusters were hitting pod density limits even with available CPU/memory, GKE will behave differently—you may be able to run more pods per node, which changes your capacity planning and cost model significantly.
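The ENI-based ceiling follows the standard AWS formula: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. The sketch below reproduces the c6g.2xlarge figure cited above (4 ENIs with 15 IPv4 addresses each, without prefix delegation); GKE Standard nodes instead default to a ceiling of 110 pods per node driven by the node's alias IP range, which is why capacity planning changes.

```python
def eks_max_pods(enis: int, ipv4_per_eni: int) -> int:
    """AWS VPC CNI pod ceiling without prefix delegation:
    ENIs * (IPv4 addresses per ENI - 1) + 2."""
    return enis * (ipv4_per_eni - 1) + 2


# c6g.2xlarge: 4 ENIs, 15 IPv4 addresses per ENI -> the 58-pod limit cited above.
C6G_2XLARGE_MAX_PODS = eks_max_pods(4, 15)
```

Running this formula over your current node types shows exactly where EKS pod density was IP-bound rather than CPU/memory-bound, and therefore where GKE's VPC-native model changes your node sizing.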
5. Map pod health and readiness probe metrics
CloudWatch Container Insights exposes `pod_status_ready` and `pod_status_running` counts, which map to GKE's pod phase metrics under `kubernetes.io/pod/status/phase`; container restarts are tracked separately via `kubernetes.io/container/restart_count`. The main difference is that GKE's Cloud Monitoring (formerly Stackdriver) provides more granular liveness and readiness probe failure metrics. Cross-reference your current CloudWatch alarms on pod restarts and unavailability with GKE's pod lifecycle metrics during parallel monitoring—you'll likely need to adjust thresholds since GKE surfaces probe failures more explicitly than CloudWatch's aggregated pod status.
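For the alarm comparison, a minimal availability check like the following can be evaluated against both platforms' pod status counts. The name and the 90% floor are illustrative assumptions, not defaults from either platform.

```python
def ready_breach(ready_pods: int, desired_pods: int,
                 min_ready_ratio: float = 0.90):  # assumed alert floor
    """True when the fraction of Ready pods falls below the alert floor."""
    if desired_pods <= 0:
        return False  # nothing scheduled, nothing to alert on
    return ready_pods / desired_pods < min_ready_ratio
```

Evaluating the same rule against `pod_status_ready` on EKS and the phase/restart metrics on GKE during the parallel run reveals where GKE's more explicit probe failures would have tripped alarms that CloudWatch's aggregated view did not.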
6. Run parallel monitoring before cutover and validate SLI/SLO parity
Before migrating production traffic, run both EKS and GKE in parallel for at least one full business cycle, collecting all metrics side-by-side. Create dashboards that compare `kubernetes_cpu_usage`, `kubernetes_memory_usage`, `kubernetes_network_errors`, and your key application metrics across both platforms. Your SLIs (latency percentiles, error rates, throughput) should be nearly identical between platforms—if they're not, you've either missed a metrics mapping or uncovered a real platform difference that needs investigation. This parallel run also helps you calibrate alerting thresholds since GKE's metrics often have different collection intervals and aggregation methods than CloudWatch.
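The parity validation can be automated as a diff over the two platforms' SLI dashboards. In this sketch the function name, the example SLI keys, and the 5% relative tolerance are all assumptions; it simply returns the SLIs that diverge (or are missing on the GKE side) and therefore need investigation before cutover.

```python
def sli_mismatches(eks_slis: dict, gke_slis: dict,
                   tolerance: float = 0.05) -> list:
    """Return the names of SLIs that diverge between platforms by more than
    the relative tolerance, or that are missing from the GKE side entirely."""
    return [
        name for name, eks_value in eks_slis.items()
        if name not in gke_slis
        or abs(gke_slis[name] - eks_value) > tolerance * max(abs(eks_value), 1e-9)
    ]
```

An empty result after a full business cycle is your signal that the metrics mapping is complete and the platforms behave equivalently; anything else is either a missed mapping or a real platform difference.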
Related Insights
- AWS ENI Limits Constrain Max Pod Density on Smaller Instance Types (warning): AWS CNI enforces low max pod counts (e.g., 58 for c6g.2xlarge) on instances with fewer CPUs due to ENI limitations, preventing full utilization of node compute capacity unless prefix delegation is enabled.
- Memory pressure triggers random pod eviction in AKS clusters (critical)
- Azure AKS network connectivity issues cause daemon API call hangs (critical)