Google CloudBigQueryDatabase

`google_bigquery_dataset` cost estimation

A serverless data warehouse dataset. Two pricing models: on-demand ($6.25/TB scanned) or capacity-based (slot-hours).

A google_bigquery_dataset is a container for tables in BigQuery, GCP's serverless data warehouse. Pricing has three components: storage, query compute, and (sometimes) streaming inserts.

Storage:

Active storage: $0.02/GB-month for tables modified within the last 90 days.

Long-term storage: $0.01/GB-month for tables unchanged for 90+ days. Automatic transition, no action required.

Physical (compressed) storage: optional, costs less per GB but is billed for the compressed representation. Right for repository-style datasets.

Query compute (two models):

On-demand: $6.25 per TB of data scanned. Each query bills based on the bytes the query reads from disk (after partition/clustering pruning, before result generation). The first 1 TB scanned per month is free.

Capacity (BigQuery Editions): you reserve slots (a slot = roughly one CPU). Three editions: - Standard: $0.04/slot-hour. Right for small workloads. - Enterprise: $0.06/slot-hour with more features. Right for medium workloads. - Enterprise Plus: $0.10/slot-hour with the most features. Right for large workloads.

Slot reservations (with commitments) drop prices substantially: 1-year commitment is ~33% off, 3-year is ~63% off.

Streaming inserts (deprecated in favor of Storage Write API): $0.012/200 MB inserted. Storage Write API is cheaper.

c3x estimates BigQuery primarily from usage data. Define monthly_queries_tb and average_active_storage_gb in c3x-usage.yml on the dataset.

Terraform example

A minimal but realistic configuration that C3X can estimate.

resource "google_bigquery_dataset" "analytics" {
  dataset_id    = "analytics"
  friendly_name = "Production Analytics"
  location      = "US"

  default_table_expiration_ms = 31536000000  # 1 year

  labels = {
    environment = "production"
  }
}

resource "google_bigquery_table" "events" {
  dataset_id = google_bigquery_dataset.analytics.dataset_id
  table_id   = "events"

  time_partitioning {
    type  = "DAY"
    field = "event_timestamp"
  }

  clustering = ["user_id", "event_type"]

  schema = jsonencode([
    {
      name = "event_timestamp"
      type = "TIMESTAMP"
    },
    {
      name = "user_id"
      type = "STRING"
    },
    {
      name = "event_type"
      type = "STRING"
    },
    {
      name = "payload"
      type = "JSON"
    }
  ])
}

Pricing dimensions

What you actually pay for when you provision google_bigquery_dataset.

Dimension	Unit	What's being charged
Active storage	per GB-month	Tables modified within last 90 days. $0.02/GB-month
Long-term storage	per GB-month	Tables unchanged for 90+ days. Automatic transition. $0.01/GB-month
Physical storage	per GB-month	Optional billing model for compressed storage. Often cheaper for compressible data.
On-demand queries	per TB scanned	Bytes scanned after partition pruning. First 1 TB/month free. $6.25/TB
Capacity slots	per slot-hour	Reserved compute capacity. Three editions: Standard, Enterprise, Enterprise Plus. $0.04-$0.10/slot-hour
Streaming inserts (legacy)	per 200 MB	Deprecated. Storage Write API is cheaper and more efficient.

Optimization tips

Common ways to reduce google_bigquery_dataset cost without changing the workload.

Partition and cluster large tables

Major on on-demand queries

Partitioning by date and clustering by frequently-filtered columns can cut bytes scanned by 90% or more. Queries that filter on the partition column only scan matching partitions.

Switch from streaming to Storage Write API

50% on ingest

Streaming inserts are $0.012/200 MB. Storage Write API is 50% cheaper. For high-volume ingest, this saves real money.

Move to capacity-based pricing for predictable workloads

30-60% on high-volume workloads

If your monthly query spend on on-demand exceeds ~$2,000/month, capacity-based with reservations is cheaper. Especially with 1- or 3-year commitments.

Set default table expiration

Storage over time

Setting default_table_expiration_ms on the dataset (e.g., 365 days) automatically deletes tables that don't need to live forever. Prevents storage accumulation.

Use approximate aggregation functions

Query-specific

APPROX_COUNT_DISTINCT and similar functions scan less data than exact counterparts. For analytics queries that tolerate 1-2% error, this can cut scan costs significantly.

FAQ

On-demand or capacity pricing?

On-demand: simple, no commitment, right for unpredictable workloads. Capacity: cheaper at high volume, requires commitment, right when you can predict slot usage. Crossover is typically around $2,000/month in on-demand spend.

How does c3x estimate BigQuery cost?

Storage is computed from declared tables and expected size. Queries are usage-based: add monthly_queries_tb to c3x-usage.yml on the dataset. For capacity model, specify reserved_slots and slot_commitment_term.

What's the difference between active and long-term storage?

Tables not modified for 90 days automatically transition to long-term storage at half the per-GB rate. Any modification (insert, update, delete) resets the 90-day clock. The transition is automatic and bidirectional.

Are streaming inserts still recommended?

No. Legacy streaming inserts are deprecated. The Storage Write API replaces them with better cost (50% cheaper), better throughput, and exactly-once semantics.

Related resources

Google Cloud

google_storage_bucket

Common source data for BigQuery loads

Google Cloud

google_pubsub_topic

Streaming source via Dataflow into BigQuery

Estimate this resource in your own Terraform

Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including google_bigquery_dataset.

Get Started Star on GitHub

← Browse all Google Cloud resources