Moving from AWS to GCP is rarely a technical improvement story. It is a business decision with technical implications: a commitment to BigQuery for analytics, a Vertex AI platform deal, a contract renegotiation, or an acquisition. The engineering team that executes the migration will encounter a compounding set of problems: service translation, identity redesign, VPC topology changes, data gravity, and a window where costs double before they improve. This post is the version written for people who have to do the work, not the one that sells the idea.
When not to move
This is the most important section and the one most migration guides skip. If there is no specific, named capability that AWS cannot provide, or that AWS prices significantly higher, a migration is a large engineering bill for a lateral move.
The clearest valid drivers for moving to GCP:
BigQuery or Vertex AI as the primary pull. If your workload is analytics-heavy or you are building on foundation model APIs, GCP's data and AI stack is genuinely differentiated. BigQuery's serverless pricing — per-byte scanned, or flat-rate slot reservations — often undercuts Redshift and Athena at scale, and Vertex AI provides tighter integration with first-party models than anything equivalent on AWS.
Compute pricing on sustained, non-bursty workloads. GCP's sustained use discounts apply automatically: on eligible machine families, an instance running the full month gets up to roughly a 30% discount with no reservation commitment. AWS's equivalent is a 1-year Standard Reserved Instance (around 40% discount, but committed upfront) or a Savings Plan. For organizations without sophisticated reservation management, GCP's automatic model is operationally simpler. For organizations that already optimize AWS commitments well, the delta shrinks.
A contractual or acquisition reason. If your company signed a committed-use deal with Google or was acquired by a GCP-native entity, the decision is already made.
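To make the sustained use discount figure concrete, here is a minimal sketch of how a tiered discount produces roughly 30% over a full month. The tier schedule is an assumption modeled on the N1-style schedule (each successive quarter of the month billed at 100%, 80%, 60%, then 40% of the base rate); other machine families use different or no tiers, so check current pricing before relying on it.

```python
# Assumed N1-style tiers: (fraction of the month, multiplier on the base rate).
SUD_TIERS = [(0.25, 1.00), (0.25, 0.80), (0.25, 0.60), (0.25, 0.40)]


def sustained_use_cost(base_monthly: float, fraction_of_month: float) -> float:
    """Cost of running one instance for a fraction of the month under tiered discounts."""
    remaining = max(0.0, min(1.0, fraction_of_month))
    cost = 0.0
    for width, multiplier in SUD_TIERS:
        used = min(width, remaining)  # portion of this tier actually consumed
        cost += base_monthly * used * multiplier
        remaining -= used
    return cost


def effective_discount(fraction_of_month: float) -> float:
    """Discount relative to paying the undiscounted rate for the same usage."""
    undiscounted = 100.0 * fraction_of_month
    if undiscounted == 0:
        return 0.0
    return 1.0 - sustained_use_cost(100.0, fraction_of_month) / undiscounted


if __name__ == "__main__":
    for fraction in (0.5, 0.75, 1.0):
        print(f"{fraction:.0%} of month -> {effective_discount(fraction):.0%} discount")
```

Under these assumed tiers, an instance that runs only half the month earns about 10%, which is why the discount favors steady workloads over bursty ones.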
Do not migrate for any of these reasons:
- "GCP has a better UI" — UI preferences are not worth a 3–6 month engineering pause
- "Costs are lower" — without a TCO analysis scoped to your specific SKUs and regions, this is unknowable; cross-cloud comparisons are workload-specific and vary by 40–60% depending on instance type
- "We want multi-cloud for resilience" — if resilience is the actual goal, active-active multi-cloud is a different and much harder problem from what a migration solves; migrating to GCP gives you one cloud, not two
The service-mapping table
Most of the migration is a translation problem. The closer each AWS service maps to a GCP equivalent, the less application-layer work is required. The table below captures the behavioral delta that actually matters during a migration.
| AWS service | GCP equivalent | Behavioral delta |
|---|---|---|
| EC2 | Compute Engine (GCE) | GCE uses custom machine types; AWS has fixed instance families |
| EKS | GKE | GKE Autopilot mode; Workload Identity is native, not an add-on |
| ECS Fargate | Cloud Run or GKE Autopilot | Cloud Run for stateless HTTP; Autopilot for full Kubernetes shape |
| Lambda | Cloud Functions / Cloud Run | Cloud Run scales to zero with lower cold-start penalty for HTTP |
| S3 | Cloud Storage (GCS) | Near-identical object API; lifecycle policy syntax differs |
| RDS (MySQL/PG) | Cloud SQL | Managed equivalents; GCP's DMS migration path is production-ready |
| DynamoDB | Firestore or Bigtable | No drop-in; data model redesign required |
| Redshift | BigQuery | Radically different billing and execution engine; non-trivial migration |
| ElastiCache Redis | Memorystore for Redis | Near-identical; minor configuration differences |
| SQS / SNS | Pub/Sub | Pub/Sub has no FIFO guarantee by default |
| Route 53 | Cloud DNS | Cloud DNS zones are global, not per-region |
| CloudFront | Cloud CDN | Similar capabilities; GCP pricing tiers affect routing decisions |
| VPC + Security Groups | VPC + Firewall Rules | GCP VPC is global; see networking section below |
| ALB / NLB | Cloud Load Balancing | GCP offers Premium vs Standard network tiers |
| IAM | IAM + Resource Hierarchy | Fundamentally different model; see IAM section below |
| CloudWatch | Cloud Monitoring + Cloud Logging | GCP splits this across more products |
| CloudFormation | Terraform or Deployment Manager | Most teams should use Terraform for both |
The tier-1 services — object storage, managed Redis, managed PostgreSQL, Kubernetes — are the easiest. The tier-2 services — DynamoDB equivalents, Redshift to BigQuery, Lambda to Cloud Functions — require real application-layer work and should be treated as refactors, not migrations. Scope them separately.
Identity is not portable: redesigning IAM for GCP
IAM is where most migrations underestimate complexity. The operational models are different enough that "porting" IAM is the wrong mental model. This is a redesign.
AWS IAM model. AWS organizes around accounts. Permissions attach to IAM users, roles, or groups. Cross-account access uses STS AssumeRole. A multi-account organization uses AWS Organizations to propagate service control policies. Service-to-service access is handled through IAM roles attached to EC2 instances, Lambda functions, or ECS tasks.
GCP IAM model. GCP organizes around a resource hierarchy: organization, then folders, then projects, then resources. IAM bindings can be placed at any level and are inherited downward. There are no separate accounts — a project is the billing and isolation unit. Permissions are expressed as roles bound to principals (users, groups, or service accounts) at a specific resource or level.
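The inheritance rule is easier to reason about with a toy model. The sketch below is illustrative only: the hierarchy, principals, and bindings are hypothetical, and it ignores IAM Conditions and deny policies. It shows the one property that matters during a redesign: the roles a principal holds on a resource are the union of bindings at that resource and at every ancestor.

```python
from typing import Dict, Optional, Set

# Hypothetical hierarchy: child resource -> parent resource.
PARENT: Dict[str, str] = {
    "projects/payments-prod": "folders/prod",
    "folders/prod": "organizations/example",
}

# Hypothetical bindings: resource -> principal -> roles granted at that level.
BINDINGS: Dict[str, Dict[str, Set[str]]] = {
    "organizations/example": {"group:sec@example.com": {"roles/logging.viewer"}},
    "folders/prod": {"group:sre@example.com": {"roles/container.admin"}},
    "projects/payments-prod": {
        "serviceAccount:my-app@my-project.iam.gserviceaccount.com": {"roles/cloudsql.client"}
    },
}


def effective_roles(resource: str, principal: str) -> Set[str]:
    """Union of roles bound at the resource and at each ancestor above it."""
    roles: Set[str] = set()
    node: Optional[str] = resource
    while node is not None:
        roles |= BINDINGS.get(node, {}).get(principal, set())
        node = PARENT.get(node)  # None once we walk past the organization
    return roles


if __name__ == "__main__":
    print(effective_roles("projects/payments-prod", "group:sre@example.com"))
```

The folder-level binding for the SRE group reaches every project under that folder, which is why a broad binding placed high in the hierarchy is hard to claw back later.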
The practical migration implications:
Workload Identity replaces long-lived key files. On AWS, an EC2 instance assumes an IAM role and credentials rotate automatically via the metadata service. The GCP equivalent for GKE is Workload Identity: a Kubernetes service account is bound to a GCP service account, and pods receive short-lived tokens via the metadata server. Service account JSON key files — technically valid, practically dangerous — should never be used for in-cluster workloads.
# Bind a Kubernetes SA to a GCP SA via Workload Identity
# GCP side: allow the k8s SA to impersonate the GCP SA
gcloud iam service-accounts add-iam-policy-binding \
  my-app@my-project.iam.gserviceaccount.com \
  --role="roles/iam.workloadIdentityUser" \
  --member="serviceAccount:my-project.svc.id.goog[my-namespace/my-ksa]"

# Kubernetes side: annotate the KSA
kubectl annotate serviceaccount my-ksa \
  --namespace my-namespace \
  iam.gke.io/gcp-service-account=my-app@my-project.iam.gserviceaccount.com

Predefined roles before custom roles. GCP ships predefined roles at fine granularity, such as roles/storage.objectViewer, roles/cloudsql.client, and roles/pubsub.subscriber. The migration instinct is to reach for roles/editor for convenience, or to translate AWS policies into custom GCP roles wholesale. This reproduces your existing permission sprawl. Audit your AWS IAM policies before migration, start with the tightest predefined role that covers the access pattern, and only create custom roles when no predefined role fits.
Enable audit logs at organization level on day one. GCP's Data Access audit logs are disabled by default for most services. Enabling them retroactively after an incident is a bad conversation. Wire them into Cloud Logging with log sinks to BigQuery or a SIEM on the first day the organization is created.
Networking: GCP's global VPC changes the architecture
AWS VPCs are per-region. You create a VPC in us-east-1 and a separate VPC in eu-west-1; subnets are per availability zone. Cross-region communication requires VPC peering, Transit Gateway, or the public internet.
GCP's VPC is global by default. A single VPC spans every region, and VM-to-VM communication within that VPC crosses regions over Google's private backbone — no peering or Transit Gateway equivalent required. This is architecturally cleaner for multi-region deployments, but it means a 1:1 copy of your AWS VPC layout is the wrong starting point.
Key translation points:
- Security Groups become firewall rules. GCP firewall rules apply based on network tags or service account identity, not per-instance security groups. Flattening a sprawling security group model into tag-based rules is necessary work that has to happen before workload migration.
- Shared VPC for multi-project environments. GCP's Shared VPC lets you maintain a single network while isolating billing and IAM into separate projects. This is the equivalent of AWS Transit Gateway plus multi-account Organizations architecture, but simpler to configure.
- Network Premium vs Standard tier. GCP routes traffic differently by tier. Premium routes over Google's backbone; Standard routes over the public internet. Premium is default and recommended for latency-sensitive workloads; Standard is appropriate for batch or dev traffic where egress cost matters more than latency.
On egress: AWS standard internet egress runs approximately $0.09/GB after the first 100 GB free monthly. GCP internet egress is approximately $0.085/GB in most regions. The per-GB delta is narrow. The more meaningful difference is inter-region: GCP's inter-region transfer pricing is roughly 50% lower than AWS for equivalent routes, which compounds quickly in multi-region architectures (source: lowcloud.io egress fee comparison, 2025).
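A rough way to turn those per-GB figures into a line item is sketched below. The rates are the approximate ones quoted above, applied as flat rates; real price sheets are tiered by volume and vary by region, and the function names are illustrative, so treat the output as an order-of-magnitude estimate.

```python
AWS_FREE_GB_PER_MONTH = 100   # free allowance quoted above
AWS_RATE_PER_GB = 0.09        # approximate, flat-rate assumption
GCP_RATE_PER_GB = 0.085       # approximate, flat-rate assumption


def aws_internet_egress(gb: float) -> float:
    """Estimated monthly AWS internet egress cost in USD."""
    return max(0.0, gb - AWS_FREE_GB_PER_MONTH) * AWS_RATE_PER_GB


def gcp_internet_egress(gb: float) -> float:
    """Estimated monthly GCP internet egress cost in USD."""
    return gb * GCP_RATE_PER_GB


if __name__ == "__main__":
    for gb in (1_000, 50_000):
        print(f"{gb:>6} GB  AWS ${aws_internet_egress(gb):,.2f}  GCP ${gcp_internet_egress(gb):,.2f}")
```

The same function doubles as a migration estimate: a one-time transfer of a 50 TB dataset out of AWS is a four-figure charge under these assumptions unless a waiver applies.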
Both AWS and GCP have announced egress fee waivers for customers actively switching providers. AWS provides credits for data transfer out during migration but requires a 60-day exit period and full workload removal. GCP announced its waiver earlier with fewer conditions. Either way: request the waiver before initiating large data transfers, and model egress as a real line item rather than a rounding error.
Egress charges are never the reason you migrate. They are often the reason the migration runs over budget.
GKE vs EKS: the Kubernetes layer is not free
If workloads are already containerized on EKS, the Kubernetes API itself is portable. Cluster YAML, Helm charts, and application configs need minimal changes. The control-plane pricing, node management model, and add-on ecosystem differ in ways that affect operating cost.
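Control-plane cost. Both EKS and GKE charge approximately $0.10/hr per cluster, about $73/month. GKE provides a $74.40/month free-tier credit per billing account, which makes one GKE cluster effectively free; at the time of writing the credit applies to zonal and Autopilot clusters, so confirm it covers the cluster type you plan to run. For engineering teams running multiple dev or staging clusters, control-plane fees add up on either cloud, and the credit offsets only the first cluster per billing account (source: Sedai EKS vs GKE cost comparison, 2026).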
- EKS control plane: $73/mo per cluster, no free tier
- GKE first cluster: ~$0/mo, after the $74.40/mo credit per billing account
- Kubernetes resource waste: 30–50% in typical over-provisioned clusters
Source: Sedai, EKS vs GKE cost comparison, 2026
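The control-plane arithmetic can be sketched as follows, assuming 730 billable hours per month and a single billing account, with the rate and credit taken from the figures above. The function names are illustrative.

```python
HOURS_PER_MONTH = 730
CONTROL_PLANE_RATE = 0.10   # USD per cluster-hour, both clouds (approximate)
GKE_MONTHLY_CREDIT = 74.40  # USD per billing account


def eks_control_plane(clusters: int) -> float:
    """Monthly EKS control-plane cost: every cluster is billed."""
    return clusters * CONTROL_PLANE_RATE * HOURS_PER_MONTH


def gke_control_plane(clusters: int) -> float:
    """Monthly GKE control-plane cost after the per-account credit."""
    gross = clusters * CONTROL_PLANE_RATE * HOURS_PER_MONTH
    return max(0.0, gross - GKE_MONTHLY_CREDIT)


if __name__ == "__main__":
    for n in (1, 5):
        print(f"{n} cluster(s): EKS ${eks_control_plane(n):.2f}, GKE ${gke_control_plane(n):.2f}")
```

The credit is a fixed amount, so its relative value shrinks as the cluster count grows: it removes the whole fee for one cluster but only about a fifth of it for five.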
GKE Autopilot vs EKS Fargate. Both are serverless Kubernetes modes: you pay per pod resource request, not per node. GKE Autopilot is the more complete offering. It enforces security hardening — read-only root filesystem, no privileged containers — automatically, and handles node provisioning, upgrades, and bin-packing without manual intervention. EKS Fargate provides tighter container isolation (each pod in its own microVM via Firecracker) but has no equivalent of Autopilot's automatic node pool management or built-in Workload Identity.
For most stateless workloads, Autopilot is the better operational choice. For workloads that require privileged containers or specific kernel features, GKE Standard with Workload Identity and node auto-provisioning is the right mode.
IRSA becomes Workload Identity. EKS uses IAM Roles for Service Accounts (IRSA) to provide AWS credentials to pods via annotation. GKE Autopilot enables Workload Identity by default. The mechanism differs (see IAM section), but the migration is mechanical: update SA annotations, update Terraform, validate that pods can reach the GCP services they need.
The 30–50% resource waste typical of over-provisioned clusters is cloud-agnostic: it exists on EKS and GKE equally. A migration is an opportunity to address it. GKE Autopilot enforces resource requests and limits and bills on those requests, which imposes the discipline that permissive EKS clusters often lack. Re-platforming onto Autopilot frequently produces a smaller, cheaper footprint than the EKS cluster it replaced, because idle node capacity is no longer something you provision and pay for.
Data has gravity — and an egress bill
Compute is stateless by design. Data is not. Each database, object store, and queue you move requires a migration strategy, a rollback plan, and a cutover window. These need to be scoped separately from the compute migration and planned in order: compute first, then data.
Object storage: S3 to Cloud Storage
GCS and S3 share a conceptually similar object model and most client SDKs abstract the difference. The actual transfer work is the constraint. For datasets under a few TB, gsutil with the -m flag for parallel operations works. For tens of TB or more, use GCP's Storage Transfer Service — it runs on Google's infrastructure, not your machine, and supports bandwidth scheduling to avoid competing with production traffic.
# GCP Storage Transfer Service: managed, server-side, resumable.
# Source and destination are positional URLs. AWS credentials are read from a
# local JSON file (the file name here is illustrative) rather than passed inline.
gcloud transfer jobs create \
  s3://my-source-bucket gs://my-dest-bucket \
  --source-creds-file=aws-creds.json \
  --schedule-repeats-every=1d

# Incremental sync for final delta before cutover (small datasets)
gsutil -m rsync -r -d s3://my-source-bucket gs://my-dest-bucket

Run the bulk transfer first, then schedule a final incremental delta immediately before the application cutover to minimize the gap. Verify checksums: GCS stores MD5 and CRC32c; S3 stores ETags (MD5 for non-multipart uploads). gsutil ls -L gs://my-bucket/path shows the hash for each object.
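Comparing the two hash formats trips people up because S3 reports a single-part ETag as hex while GCS reports its MD5 as base64. The following is a minimal sketch of the comparison, assuming you have already fetched the ETag and the GCS MD5 string by whatever means you prefer; the helper names are hypothetical. Multipart ETags (recognizable by a "-N" suffix) are not MD5 digests, and objects encrypted with some server-side encryption modes may not have MD5 ETags either, so the helper returns None for anything it cannot compare.

```python
import base64
import binascii
import hashlib
from typing import Optional


def md5_base64(data: bytes) -> str:
    """MD5 of the payload in the base64 form GCS reports."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def etag_to_base64_md5(etag: str) -> Optional[str]:
    """Convert a single-part S3 ETag (hex MD5) to base64; None if not comparable."""
    etag = etag.strip('"')
    if "-" in etag:  # multipart upload: ETag is not a plain MD5
        return None
    try:
        return base64.b64encode(binascii.unhexlify(etag)).decode("ascii")
    except (binascii.Error, ValueError):
        return None


def objects_match(s3_etag: str, gcs_md5: str) -> Optional[bool]:
    """True/False when comparable, None when the ETag cannot be used."""
    converted = etag_to_base64_md5(s3_etag)
    if converted is None:
        return None
    return converted == gcs_md5


if __name__ == "__main__":
    payload = b"example object body"
    etag = '"' + hashlib.md5(payload).hexdigest() + '"'
    print(objects_match(etag, md5_base64(payload)))
```

For objects where this returns None, fall back to comparing CRC32c or to re-hashing the content on both sides.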
Relational databases: RDS to Cloud SQL
GCP's Database Migration Service supports continuous replication from RDS MySQL and RDS PostgreSQL to Cloud SQL using CDC over binlog (MySQL) and WAL (PostgreSQL). Replication lag is typically under one second for steady-state workloads, enabling a live cutover with minimal downtime.
1. Create Cloud SQL target with matching version, character set, and flags
2. Start DMS migration job — initial full dump, then CDC replication
3. Monitor replication lag for 48h; target under 1 second sustained
4. Update application connection strings (use Secret Manager for credentials)
5. Drain traffic from RDS — confirm zero new writes
6. Promote Cloud SQL instance to primary
7. Decommission RDS after a 1-week observation period

One operational difference: Cloud SQL handles TLS termination differently from RDS. Use the Cloud SQL Proxy or the Cloud SQL connector library rather than direct TCP connections. The connector handles certificate rotation and IAM-based authentication transparently; direct TCP connections require manual certificate management.
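Step 3 of the cutover sequence (lag under one second, sustained for 48 hours) is worth encoding as an explicit gate rather than an eyeball check. A minimal sketch follows, assuming you already export lag samples as (timestamp in seconds, lag in seconds) pairs from your monitoring system; the function name and sample format are hypothetical.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, replication lag in seconds)


def lag_is_sustained(
    samples: List[Sample],
    threshold_s: float = 1.0,
    window_s: float = 48 * 3600,
) -> bool:
    """True only if samples span the full window and every one in it is under threshold."""
    if not samples:
        return False
    ordered = sorted(samples)
    latest = ordered[-1][0]
    if latest - ordered[0][0] < window_s:
        return False  # not enough history to claim "sustained"
    window_start = latest - window_s
    return all(lag < threshold_s for ts, lag in ordered if ts >= window_start)


if __name__ == "__main__":
    hourly = [(hour * 3600.0, 0.4) for hour in range(49)]
    print(lag_is_sustained(hourly))
```

Requiring full window coverage is deliberate: a short run of good samples right after the initial dump finishes should not unlock the cutover.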
NoSQL and caches
Memorystore for Redis is near-identical to ElastiCache for Redis at the API level. For zero-downtime migration: dual-write to both clusters during the migration window, monitor key coverage on the new cluster, then drain the old one when coverage is stable.
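The dual-write pattern can be sketched as a thin wrapper. This is an illustration under stated assumptions, not a production client: the class and function names are hypothetical, the stores only need get and set methods (which redis-py clients expose), and the in-memory store exists purely so the example runs without a Redis server.

```python
from typing import Iterable, Optional


class InMemoryStore:
    """Stand-in for a Redis client; only get/set are modeled."""

    def __init__(self) -> None:
        self._data = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


class DualWriteCache:
    """Writes go to both clusters; reads stay on the old one until cutover."""

    def __init__(self, old, new) -> None:
        self.old = old
        self.new = new
        self.new_write_failures = 0

    def set(self, key: str, value: str) -> None:
        self.old.set(key, value)  # old cluster remains the source of truth
        try:
            self.new.set(key, value)
        except Exception:
            # A failing new cluster must not break production traffic.
            self.new_write_failures += 1

    def get(self, key: str) -> Optional[str]:
        return self.old.get(key)


def key_coverage(sample_keys: Iterable[str], new) -> float:
    """Fraction of sampled keys already present on the new cluster."""
    keys = list(sample_keys)
    if not keys:
        return 0.0
    return sum(1 for k in keys if new.get(k) is not None) / len(keys)


if __name__ == "__main__":
    old, new = InMemoryStore(), InMemoryStore()
    old.set("session:1", "a")  # written before dual-write began
    cache = DualWriteCache(old, new)
    cache.set("session:2", "b")
    print(key_coverage(["session:1", "session:2"], new))
```

Keys written before dual-write started only reach the new cluster when they are rewritten or expire, so coverage climbs with TTL turnover; that is the signal to watch before draining the old cluster.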
DynamoDB has no direct GCP equivalent. Firestore is the candidate for document workloads; Bigtable for high-throughput wide-column workloads. Neither accepts DynamoDB export format natively. A DynamoDB migration is a schema redesign project, not a transfer — scope it accordingly and do not try to parallelize it with the compute migration.
The migration execution playbook
1. Assess and scope. Inventory all AWS services in use. Categorize each as direct-map, requires-refactor, or out-of-scope for this migration. Define rollback triggers and success criteria in writing before any work starts.
2. Build the GCP foundation. Establish the organization hierarchy, folder structure, Shared VPC, IAM roles, Terraform state backend in GCS, and CI/CD pipeline wired to GCP. Do this entirely before migrating any workload.
3. Migrate stateless compute first. Port Kubernetes workloads to GKE. Re-platform any remaining EC2-based services onto GKE as part of this step: containerize here, not later. Update Workload Identity bindings and validate application behavior.
4. Migrate stateful services. Run DMS for databases in CDC mode. Set up Memorystore with dual-write. Transfer object storage using Storage Transfer Service. Validate data integrity and replication lag before scheduling any cutover.
5. Cut over DNS and traffic. Lower DNS TTLs to 60 seconds one week before cutover. Route traffic via weighted DNS policies. Execute the cutover. Monitor for 24 hours before decommissioning AWS resources.
6. Decommission AWS. Systematically terminate AWS resources in reverse dependency order. Audit cost and usage reports to confirm zero unexpected spend. Close DMS replication jobs and VPN or interconnect tunnels.
Source: ClimsTech Engineering practice
The order matters. Building the GCP foundation first — hierarchy, IAM, VPC, and Terraform backend — means every subsequent workload migration is additive and reviewable rather than improvisational. Teams that skip the foundation and "migrate one service at a time" spend months backfilling governance and debugging permission issues that a proper foundation would have prevented.
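The "reverse dependency order" in the decommission step is a topological sort run backwards. A minimal sketch using Python's standard graphlib module follows; the resource names and dependency graph are hypothetical placeholders for whatever your inventory from the assessment step contains.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical inventory: resource -> resources it depends on.
DEPENDS_ON: Dict[str, Set[str]] = {
    "vpc": set(),
    "rds": {"vpc"},
    "eks": {"vpc"},
    "app": {"eks", "rds"},
    "dns_record": {"app"},
}


def teardown_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Dependents first, foundations last: the reverse of the build order."""
    build_order = list(TopologicalSorter(depends_on).static_order())
    return list(reversed(build_order))


if __name__ == "__main__":
    print(teardown_order(DEPENDS_ON))
```

TopologicalSorter raises CycleError if the inventory contains a circular dependency, which is a useful early warning that two resources need to be untangled before either can be removed.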
Five pitfalls that derail migrations
1. Replicating the AWS VPC topology verbatim
AWS's per-region, per-AZ subnet model does not transfer to GCP. Teams that create one GCP VPC per region to mirror their AWS architecture end up with a fragmented, harder-to-manage network that provides no benefit over what they left.
Fix: Design one Shared VPC per environment — production and staging — with regional subnets sized generously. Use network tags and service-account-based firewall rules instead of instance-level security groups. GCP's global backbone handles cross-region communication within the VPC without additional configuration.
2. Copying IAM policies into GCP custom roles
AWS IAM policies accumulate permissions over years. Lifting them into GCP custom roles without pruning produces the same permission sprawl — or worse, because GCP's additive inheritance means an overly broad binding at the project level affects every resource in the project.
Fix: Audit AWS IAM policies before migration. Identify which GCP predefined roles cover each access pattern. Start with least-privilege predefined roles and only create custom roles when no predefined role fits. Document every custom role in Terraform with a clear description of which service it serves and why the predefined roles were insufficient.
3. Creating service account key files for in-cluster workloads
It is technically possible to create a GCP service account JSON key file and mount it into your pods. It is also the most common credential-theft vector in GCP environments. Key files do not rotate automatically; they get committed to version control; they get copied into container images.
Fix: Use Workload Identity exclusively for GKE workloads. For Compute Engine instances, attach a service account to the instance — no key file needed. For external systems accessing GCP APIs, use Workload Identity Federation with your existing identity provider. To audit your current exposure:
# List user-managed service account keys with their creation times;
# review and rotate or delete anything older than 90 days
gcloud iam service-accounts list --format="value(email)" | \
while read -r sa; do
  echo "== $sa"
  gcloud iam service-accounts keys list \
    --iam-account="$sa" \
    --managed-by=user \
    --format="table(name,validAfterTime)" 2>/dev/null
done

4. Ignoring external IPv4 address charges
GCP charges $0.004/hr for every in-use external IPv4 address — static or ephemeral — beginning with the pricing change introduced in 2024. That is approximately $2.92/month per IP. A GKE cluster with 50 nodes carrying external IPs adds roughly $146/month in IP charges alone before any compute cost.
Fix: Audit external IPs before migration with gcloud compute addresses list. In GKE, use private nodes with a Cloud NAT gateway for outbound traffic. Use internal load balancers for services that do not require public internet access. For ingress, design for one external IP per load balancer, not one per service.
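The per-address arithmetic behind this pitfall is simple enough to keep in a planning script. The sketch below uses the hourly rate quoted above and assumes 730 billable hours per month; the rate is the one cited in this post, so confirm it against the current price sheet.

```python
HOURS_PER_MONTH = 730
EXTERNAL_IPV4_HOURLY = 0.004  # USD per in-use address, as quoted above


def monthly_ipv4_cost(address_count: int) -> float:
    """Monthly charge for a number of in-use external IPv4 addresses."""
    return address_count * EXTERNAL_IPV4_HOURLY * HOURS_PER_MONTH


if __name__ == "__main__":
    for count in (1, 50, 500):
        print(f"{count:>3} external IPs: ${monthly_ipv4_cost(count):,.2f}/month")
```

Moving the same 50 nodes behind private addressing with one external IP on the load balancer drops this line to a few dollars, before accounting for Cloud NAT's own charges.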
5. Underestimating DNS cutover lead time
A DNS TTL of 86,400 seconds — common in legacy configurations — means resolvers cache the old record for up to 24 hours after a cutover. If you flip DNS to GCP and your AWS resources are already decommissioned, you have a 24-hour incident window that could have been avoided.
Fix: One week before the target cutover date, lower all relevant DNS TTLs to 60 seconds. Once the old TTL has expired, resolvers hold a record for at most one minute, so a change made on cutover day propagates within about a minute and gives you a clean, fast switchover. After the cutover is confirmed stable, raise TTLs back to 300–3600 seconds for normal operation.
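The timing constraint is that a lowered TTL only takes effect after every resolver's copy of the old record has expired. The sketch below makes that explicit; the function names are illustrative and it assumes resolvers honor the published TTL, which most but not all do.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """First moment at which no resolver can still hold the old long-TTL record."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def worst_case_stale_window(current_ttl_seconds: int) -> timedelta:
    """How long a resolver may keep serving the pre-cutover answer."""
    return timedelta(seconds=current_ttl_seconds)


if __name__ == "__main__":
    lowered = datetime(2025, 1, 1, 12, 0)
    print("Safe to cut over after:", earliest_safe_cutover(lowered, 86_400))
    print("Stale window at 60s TTL:", worst_case_stale_window(60))
```

With an 86,400-second legacy TTL the hard minimum lead time is one day; the one-week margin is there to absorb resolvers that ignore TTLs and to leave time to verify the change.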
Terraform: the actual portability layer
The goal of a migration should not be "we are on GCP." It should be "we have a reproducible, auditable infrastructure definition that targets GCP." This distinction determines what you get out the other side.
If you rebuild on GCP using the console and undocumented manual steps, you have replicated the problem that made your AWS environment hard to manage. If you rebuild using Terraform with a clean module structure, you have infrastructure that can be reviewed in PRs, tested in CI, and rebuilt from scratch in under an hour.
A minimal but correct GKE cluster module:
resource "google_container_cluster" "primary" {
  name     = var.cluster_name
  location = var.region

  # Drop the default pool so node pools are managed as separate resources
  remove_default_node_pool = true
  initial_node_count       = 1

  workload_identity_config {
    workload_pool = "${var.project_id}.svc.id.goog"
  }

  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = var.master_cidr
  }

  network    = var.network
  subnetwork = var.subnetwork
}

resource "google_container_node_pool" "primary" {
  name     = "primary"
  cluster  = google_container_cluster.primary.name
  location = var.region

  node_config {
    machine_type = var.machine_type
    # google_service_account.gke_node is assumed to be defined elsewhere in the module
    service_account = google_service_account.gke_node.email
    oauth_scopes    = ["https://www.googleapis.com/auth/cloud-platform"]

    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }
}

Terraform state for GCP should live in a GCS bucket with versioning and object lock enabled from day one:
terraform {
  backend "gcs" {
    bucket = "my-org-terraform-state"
    prefix = "gcp/prod"
  }
}

Store the GCP Terraform workspace separately from your AWS workspace: different directories, different state files, no shared backends. This avoids accidental cross-provider drift and makes decommissioning the AWS side mechanical rather than surgical.
We have run migrations of exactly this shape — EC2 workloads re-platformed onto GKE, MySQL and Redis moved to managed services, the full environment defined in Terraform end to end. The migration effort was real. The outcome was infrastructure that is actually governable: changes reviewed in PRs, environments reproducible from scratch, drift detected in CI.
Lift-and-shift approach
- Copy EC2 instances to GCE with same OS images
- Replicate VPC topology one-per-region
- Translate AWS IAM policies to broad custom roles
- Fill gaps with manual console configuration
- Same tech debt, same operational burden, different cloud bill
Re-platform approach
- Containerize onto GKE Autopilot where possible
- Single global VPC with Shared VPC for project isolation
- Redesign IAM from scratch with least-privilege and Workload Identity
- Everything in Terraform, CI/CD wired from day one
- Cleaner infrastructure than what you left, and actually governable