GKE Autopilot cost optimization: you pay for requests, not usage

Quick answer

GKE Autopilot bills for pod resource requests (~$0.0445/vCPU-hour + memory), not nodes — so you stop paying for empty node headroom. The catch: you pay for what pods request, not what they use, so oversized requests are the main overspend. Right-size requests, use Spot Pods for fault-tolerant work, and pick the right compute class.

Autopilot changes the GKE cost model from "pay for nodes" to "pay for pod requests," which eliminates the biggest Standard-mode waste — half-empty nodes — but introduces a new one: requests that don't match reality. Optimizing Autopilot is almost entirely about getting pod resource requests right.

You pay for requests, not usage

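Autopilot bills for the CPU, memory, and ephemeral storage each pod requests in its spec, for as long as the pod runs. A pod that requests 2 vCPU and 4 GiB but uses 0.3 vCPU and 500 MiB is still billed for 2 vCPU and 4 GiB. At the approximate rates quoted in the FAQ below ($0.0445/vCPU-hour and $0.0049/GiB-hour), that request costs about $0.109 per hour, or roughly $79 over a 730-hour month, while a 0.5 vCPU / 1 GiB request would cost about $0.027 per hour, or roughly $20, before ephemeral storage. This is the inverse of the Standard-GKE problem: there's no empty-node waste, but inflated requests translate straight to cost.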

resources:
  requests:
    cpu: "250m"      # right-sized to actual usage
    memory: "512Mi"  # not "2Gi just in case"

The optimization order

1. Right-size requests. Use Vertical Pod Autoscaler recommendations or observed usage to set requests close to real consumption with modest headroom. This is the biggest single lever.
2. Use Spot Pods for fault-tolerant workloads. They get a deep discount on the same request-based billing, in exchange for the risk of eviction; they are the Autopilot analog of GCP Spot VMs.
3. Pick the right compute class. Balanced, Scale-Out, and other classes offer different price-performance; match the class to whether pods are CPU- or throughput-bound.
4. Set HPA sensibly so replica count tracks demand rather than sitting at a high fixed number.
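
The compute-class and HPA items above could be sketched roughly like this. It is a minimal example under stated assumptions, not a recommended configuration: the name `web`, the image path, the request values, the replica bounds, and the 70% target are all placeholders. The `cloud.google.com/compute-class` node selector is how a pod asks Autopilot for a specific class.

```yaml
# Hypothetical Deployment: name, image, and request values are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  # No fixed "replicas" here: the HPA below owns the replica count.
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      nodeSelector:
        # Ask for a compute class; omit this selector to stay on the default class.
        cloud.google.com/compute-class: "Scale-Out"
      containers:
        - name: web
          image: us-docker.pkg.dev/example-project/example-repo/web:1.0  # placeholder
          resources:
            requests:
              cpu: "500m"      # illustrative, set from observed usage
              memory: "1Gi"    # illustrative, set from observed usage
---
# HPA so replica count follows CPU demand instead of a fixed high number.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2               # assumed floor
  maxReplicas: 10              # assumed ceiling
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the CPU request, illustrative
```

HPA utilization targets are measured against the pod's request, so right-sized requests also keep the scaling threshold meaningful.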

Autopilot vs Standard, cost-wise

Autopilot wins when workloads are uneven or hard to bin-pack — you stop paying for the gaps. Standard wins when you can pack nodes near 100% and apply committed-use discounts. The full comparison is in GKE Standard vs Autopilot and Cloud Run vs GKE; this page is about squeezing Autopilot once you've chosen it.

FAQ

How is GKE Autopilot priced?

By the CPU, memory, and ephemeral storage your pods request (not the nodes), at roughly $0.0445/vCPU-hour and $0.0049/GiB-hour of memory, plus the cluster management fee. Rates vary by region. You pay for pod resource requests while pods run, so there's no charge for empty node headroom, which is the waste that drives Standard GKE cost.

Why is my Autopilot bill higher than expected?

Almost always oversized pod resource requests. Because Autopilot bills for what pods request (not what they use), a deployment asking for 2 vCPU and 4 GiB when it uses 0.3 vCPU pays for the full request. Right-sizing requests to actual usage is the single biggest Autopilot saving.
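
One way to find out how far requests are from real usage is to run Vertical Pod Autoscaler in recommendation-only mode. The sketch below assumes a Deployment named `web` (a hypothetical name) and sets `updateMode: "Off"`, so VPA publishes recommendations without changing any pods.

```yaml
# Recommendation-only VPA; "web" and "web-vpa" are placeholder names.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommend only, never evict or resize pods
```

After it has observed the workload for a while, `kubectl describe vpa web-vpa` lists lower-bound, target, and upper-bound values per container, which can be compared against the requests in the spec.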

Is Autopilot cheaper than Standard GKE?

For uneven or hard-to-bin-pack workloads, often yes: Autopilot removes the empty-node headroom you pay for on Standard. For dense, steady workloads that pack Standard nodes near 100%, Standard with committed-use discounts can be cheaper. Autopilot charges a somewhat higher unit price in exchange for eliminating that waste.

Can I use Spot Pods on Autopilot?

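Yes. Autopilot supports Spot Pods at a deep discount for fault-tolerant workloads. You opt in per workload, and the pods can be evicted when Google Cloud reclaims the capacity. Autopilot also offers compute classes such as Balanced and Scale-Out for different price-performance profiles. Spot Pods are the biggest lever after right-sizing requests.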
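
As a rough illustration, a workload opts in to Spot Pods with the `cloud.google.com/gke-spot` node selector. The Job below is a sketch only: its name, image, and request values are placeholders, and a batch Job is used simply because it tolerates being retried.

```yaml
# Hypothetical fault-tolerant batch Job running on Spot Pods.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-report          # placeholder name
spec:
  backoffLimit: 6               # retry budget if a pod fails or is evicted
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # schedule onto Spot capacity
      containers:
        - name: report
          image: us-docker.pkg.dev/example-project/example-repo/report:1.0  # placeholder
          resources:
            requests:
              cpu: "500m"       # illustrative
              memory: "1Gi"     # illustrative
```

Billing is still based on the requests shown here, so right-sizing applies to Spot Pods as well.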

Does Autopilot charge for system pods?

Autopilot includes the resources for GKE-managed system components, so you're billed primarily for your workload pods' requests. This is simpler than Standard, where system DaemonSets consume node capacity you've already paid for.

How does C3X estimate GKE cost?

C3X prices a google_container_cluster and its node pools from configuration. For Autopilot, the cost driver to model is pod resource requests; for Standard, it is node count and machine type. That lets you compare the two modes for your workload.

What to do next

Whether Autopilot or Standard is cheaper depends on your workload shape, and you can work out the answer for either mode from your config. C3X prices a google_container_cluster and google_container_node_pool so you can compare modes before committing. The quickstart runs it in minutes.