
`google_cloud_run_service` cost estimation

A serverless container service. Billed by request count and per-request CPU/memory time. Generous free tier.

A google_cloud_run_service (or its newer counterpart google_cloud_run_v2_service) runs a containerized application that scales from zero to many instances based on traffic. Pricing is purely usage-based and has three dimensions.

First, requests. Each HTTP request (or event) counts as one request, billed at roughly $0.40 per million. The first 2 million requests per month are free; the free tier is aggregated per billing account rather than per project.

Second, vCPU-time. Cloud Run bills vCPU usage in vCPU-seconds, rounded up to the nearest 100 ms. The default is request-based billing, where you pay for vCPU only while requests are being handled. Instance-based billing (always-on CPU) charges for the whole lifetime of each instance, regardless of traffic. Request-based suits spiky workloads; instance-based suits steady, high-traffic services whose instances are busy most of the time, or services that need CPU outside of request handling.

Third, memory-time. Same model as vCPU: GB-seconds during request handling (request-based) or always (instance-based).
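The 100 ms rounding on vCPU-time and memory-time can be illustrated with a short Python sketch. It assumes durations are rounded up to the next 100 ms increment and treats each request in isolation; with concurrency greater than 1, overlapping requests on one instance share the same billable time, so this is an upper bound. The function names are illustrative, not part of c3x or any Google API.

```python
import math

BILLING_INCREMENT_MS = 100  # billing granularity described above


def billable_ms(duration_ms: float) -> int:
    """Round a request's duration up to the next 100 ms increment."""
    if duration_ms <= 0:
        return 0
    return math.ceil(duration_ms / BILLING_INCREMENT_MS) * BILLING_INCREMENT_MS


def billable_seconds(duration_ms: float, vcpu: float, memory_gb: float) -> tuple[float, float]:
    """Return (vCPU-seconds, GB-seconds) billed for a single request."""
    seconds = billable_ms(duration_ms) / 1000
    return vcpu * seconds, memory_gb * seconds
```

For example, a 130 ms request on a 1 vCPU, 0.5 GB instance would be billed as 200 ms under these assumptions.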

CPU and memory are configured per instance, not per request: typically 1 to 8 vCPU (fractional values below 1 are also supported) and 128 MB to 32 GB of memory. The per-second charge scales with both. Setting cpu_idle = true on Cloud Run v2 throttles CPU to near zero between requests, so you pay for CPU only during request handling, which saves money on spiky workloads.

Cloud Run Jobs (google_cloud_run_v2_job) bill the same way but for batch executions instead of HTTP requests. Different resource, similar model.

Free tier: 2M requests, 180,000 vCPU-seconds, and 360,000 GB-seconds per month, aggregated per billing account.
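To make the three dimensions and the free tier concrete, here is a minimal Python sketch of a request-based monthly estimate. It is not how c3x computes its numbers; the `Usage` class and `estimate_monthly_cost` function are hypothetical. The rates are the request-based figures quoted on this page, and the free-tier allowances (180,000 vCPU-seconds and 360,000 GB-seconds) are taken from Google's published pricing as best understood here, so verify them against the current pricing page. The sketch ignores concurrency, 100 ms rounding, and egress.

```python
from dataclasses import dataclass

# Request-based rates quoted on this page (USD).
PRICE_PER_MILLION_REQUESTS = 0.40
PRICE_PER_VCPU_SECOND = 0.000024
PRICE_PER_GB_SECOND = 0.0000025

# Monthly free tier (assumed values; check the current pricing page).
FREE_REQUESTS = 2_000_000
FREE_VCPU_SECONDS = 180_000
FREE_GB_SECONDS = 360_000


@dataclass
class Usage:
    monthly_requests: int
    average_request_duration_ms: float
    vcpu: float       # CPU limit of the container
    memory_gb: float  # memory limit of the container


def estimate_monthly_cost(u: Usage) -> float:
    """Estimate the request-based monthly bill after the free tier."""
    busy_seconds = u.monthly_requests * u.average_request_duration_ms / 1000
    vcpu_seconds = busy_seconds * u.vcpu
    gb_seconds = busy_seconds * u.memory_gb

    request_cost = max(0, u.monthly_requests - FREE_REQUESTS) / 1_000_000 * PRICE_PER_MILLION_REQUESTS
    cpu_cost = max(0, vcpu_seconds - FREE_VCPU_SECONDS) * PRICE_PER_VCPU_SECOND
    memory_cost = max(0, gb_seconds - FREE_GB_SECONDS) * PRICE_PER_GB_SECOND
    return request_cost + cpu_cost + memory_cost


# 10M requests/month at 200 ms each on 1 vCPU / 0.5 GB.
print(f"{estimate_monthly_cost(Usage(10_000_000, 200, 1, 0.5)):.2f}")  # 48.48
```

In this example the vCPU-time dominates the bill (about $43.68 of the $48.48), which is why request duration and CPU limits matter more than the raw request count.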

c3x estimates Cloud Run from c3x-usage.yml. Set monthly_requests and average_request_duration_ms on the service.

Terraform example

A minimal but realistic configuration that c3x can estimate.

resource "google_cloud_run_v2_service" "api" {
  name     = "api"
  location = "us-central1"

  template {
    containers {
      image = "us-central1-docker.pkg.dev/my-project/api/server:v1.2.3"

      resources {
        limits = {
          cpu    = "1"
          memory = "512Mi"
        }
        cpu_idle = true
      }
    }

    scaling {
      min_instance_count = 0
      max_instance_count = 100
    }
  }

  traffic {
    type    = "TRAFFIC_TARGET_ALLOCATION_TYPE_LATEST"
    percent = 100
  }
}

Pricing dimensions

What you actually pay for when you provision google_cloud_run_service.

Dimension	Unit	Rate	What's being charged
Requests	per 1M requests	$0.40	Each HTTP request or event. First 2M/month free per billing account.
vCPU time (Request-based)	per vCPU-second	$0.000024	Billed only while requests are being handled. Default mode. First 180,000 vCPU-seconds/month free.
Memory time (Request-based)	per GB-second	$0.0000025	Billed only while requests are being handled. First 360,000 GB-seconds/month free.
vCPU/Memory (Instance-based, always-on)	per vCPU-second + per GB-second	Lower per-second rates than request-based, no per-request fee	Billed for the full instance lifetime regardless of traffic. Right for high-RPS services.
Egress to internet	per GB	Standard GCP egress rates	Outbound traffic to the internet.

Optimization tips

Common ways to reduce google_cloud_run_service cost without changing the workload.

Use CPU idle on v2 services

Up to 70% on idle vCPU

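cpu_idle = true throttles CPU to near zero between requests, so vCPU is charged only during request handling. This cuts vCPU costs substantially for low-traffic services. It is the default in Cloud Run v2, but set it explicitly whenever you declare a resources block, because the default is not preserved once resources is set.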

Right-size memory

Workload-dependent

Cloud Run bills memory per GB-second of allocated, not used, memory. A service requesting 512 MB but actually using 200 MB pays for about 2.5x the memory it uses. Profile memory usage and drop to the smallest allocation that still leaves headroom, for example 256 MB in this case.
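The arithmetic behind this tip can be sketched in a few lines of Python. The rate is the request-based memory rate quoted on this page; the function names are illustrative, the free tier is ignored, and 1024 MB is treated as 1 GB.

```python
PRICE_PER_GB_SECOND = 0.0000025  # request-based memory rate quoted on this page


def memory_cost(memory_mb: int, busy_seconds: float) -> float:
    """Memory charge for a given allocation over the billed (busy) seconds."""
    return (memory_mb / 1024) * busy_seconds * PRICE_PER_GB_SECOND


def overprovision_ratio(allocated_mb: int, used_mb: int) -> float:
    """How many times more memory is paid for than is actually used."""
    return allocated_mb / used_mb


busy = 2_000_000  # billed seconds per month, e.g. 10M requests at 200 ms
print(memory_cost(512, busy), memory_cost(256, busy))  # 2.5 1.25
print(overprovision_ratio(512, 200))  # 2.56
```

Halving the allocation halves the memory line item, but memory is usually a small share of the bill next to vCPU-time, so the absolute saving tends to be modest.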

Use Instance-based billing for high-RPS services

Volume-dependent

If a service handles more than roughly 10 requests/second sustained, instance-based billing can become cheaper than the accumulated request-based charges. The crossover depends on request duration, concurrency, and instance size.
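A rough way to locate the crossover is to compare the two billing models directly. The Python sketch below uses the request-based rates quoted on this page; the instance-based rates ($0.000018 per vCPU-second and $0.000002 per GB-second) are an assumption based on Google's published pricing and should be verified. It also assumes a fixed number of always-on instances can absorb the load, a 730-hour month, no free tier, and no concurrency benefit on the request-based side, so it overstates request-based cost for highly concurrent services.

```python
SECONDS_PER_MONTH = 730 * 3600

# Request-based rates quoted on this page (USD).
REQ_PER_REQUEST = 0.40 / 1_000_000
REQ_VCPU_SECOND = 0.000024
REQ_GB_SECOND = 0.0000025

# Instance-based rates: assumed, check the current pricing page.
INST_VCPU_SECOND = 0.000018
INST_GB_SECOND = 0.000002


def request_based_monthly(rps: float, duration_s: float, vcpu: float, memory_gb: float) -> float:
    """Monthly cost when every request is billed for its own duration."""
    per_request = REQ_PER_REQUEST + duration_s * (vcpu * REQ_VCPU_SECOND + memory_gb * REQ_GB_SECOND)
    return rps * SECONDS_PER_MONTH * per_request


def instance_based_monthly(instances: int, vcpu: float, memory_gb: float) -> float:
    """Monthly cost of instances that are billed around the clock."""
    return instances * SECONDS_PER_MONTH * (vcpu * INST_VCPU_SECOND + memory_gb * INST_GB_SECOND)


def breakeven_rps(duration_s: float, vcpu: float, memory_gb: float, instances: int = 1) -> float:
    """Sustained requests/second at which both models cost the same."""
    per_request = REQ_PER_REQUEST + duration_s * (vcpu * REQ_VCPU_SECOND + memory_gb * REQ_GB_SECOND)
    per_second = instances * (vcpu * INST_VCPU_SECOND + memory_gb * INST_GB_SECOND)
    return per_second / per_request
```

Under these assumptions, a single 1 vCPU / 0.5 GB instance serving 100 ms requests breaks even at about 6.5 requests/second, and at about 11.4 requests/second for 50 ms requests, which is in line with the rule of thumb above.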

Set min_instance_count = 0 for variable traffic

Pay nothing during idle

With scale-to-zero, the service costs nothing when there is no traffic, at the price of cold starts that can add 200-500 ms. For latency-sensitive services, set min_instance_count = 1; the warm instance is then billed even while it is idle.

FAQ

Is Cloud Run cheaper than running on GCE?

For low-traffic or spiky services, almost always yes. Cloud Run scales to zero between requests; a GCE VM bills continuously. For high-traffic services running 24/7, GCE with committed-use discounts can be cheaper.

What's the difference between v1 and v2?

Cloud Run v2 (google_cloud_run_v2_service) adds features like cpu_idle, better scaling controls, and a cleaner API. Same pricing model. New deployments should use v2.

How does c3x estimate request volume?

Add monthly_requests and average_request_duration_ms on the service in c3x-usage.yml. c3x then computes vCPU-seconds and GB-seconds based on the container's resource limits.
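The conversion from usage inputs and resource limits to billable seconds might look like the following Python sketch. This is not c3x's actual implementation; the helper names are hypothetical, and the parsing covers only the common Kubernetes-style quantity forms used in Terraform limits (such as "1", "500m", "512Mi", "2Gi").

```python
_MEMORY_FACTORS = {
    "Ki": 1024,
    "Mi": 1024 ** 2,
    "Gi": 1024 ** 3,
    "M": 1000 ** 2,
    "G": 1000 ** 3,
}


def parse_cpu(value: str) -> float:
    """Convert a CPU limit such as '1' or '500m' (millicores) to vCPUs."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


def parse_memory_gb(value: str) -> float:
    """Convert a memory limit such as '512Mi' to GB (binary, 1024**3 bytes)."""
    for suffix, factor in _MEMORY_FACTORS.items():
        if value.endswith(suffix):
            return float(value[: -len(suffix)]) * factor / 1024 ** 3
    return float(value) / 1024 ** 3  # a bare number is treated as bytes


def usage_seconds(monthly_requests: int, average_request_duration_ms: float,
                  cpu_limit: str, memory_limit: str) -> tuple[float, float]:
    """Return (vCPU-seconds, GB-seconds) per month for the given usage."""
    busy_seconds = monthly_requests * average_request_duration_ms / 1000
    return busy_seconds * parse_cpu(cpu_limit), busy_seconds * parse_memory_gb(memory_limit)


# The Terraform example above (cpu = "1", memory = "512Mi") at 10M requests of 200 ms:
print(usage_seconds(10_000_000, 200, "1", "512Mi"))  # (2000000.0, 1000000.0)
```

Those two totals are what the per-second rates and the free tier are then applied to.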

What about Cloud Run Jobs?

Cloud Run Jobs (google_cloud_run_v2_job) run to completion rather than serving requests. Same pricing model based on vCPU-time and memory-time. c3x estimates Jobs separately with execution count and duration in the usage file.