google_bigquery_dataset cost estimation
A serverless data warehouse dataset. Two pricing models: on-demand ($6.25/TB scanned) or capacity-based (slot-hours).
A google_bigquery_dataset is a container for tables in BigQuery, GCP's serverless data warehouse. Pricing has three components: storage, query compute, and (sometimes) streaming inserts.
Storage:
Active storage: $0.02/GB-month for tables modified within the last 90 days.
Long-term storage: $0.01/GB-month for tables unchanged for 90+ days. Automatic transition, no action required.
Physical (compressed) storage: optional, costs less per GB but is billed for the compressed representation. Right for repository-style datasets.
Query compute (two models):
On-demand: $6.25 per TB of data scanned. Each query bills based on the bytes the query reads from disk (after partition/clustering pruning, before result generation). The first 1 TB scanned per month is free.
Capacity (BigQuery Editions): you reserve slots (a slot = roughly one CPU). Three editions: - Standard: $0.04/slot-hour. Right for small workloads. - Enterprise: $0.06/slot-hour with more features. Right for medium workloads. - Enterprise Plus: $0.10/slot-hour with the most features. Right for large workloads.
Slot reservations (with commitments) drop prices substantially: 1-year commitment is ~33% off, 3-year is ~63% off.
Streaming inserts (deprecated in favor of Storage Write API): $0.012/200 MB inserted. Storage Write API is cheaper.
c3x estimates BigQuery primarily from usage data. Define monthly_queries_tb and average_active_storage_gb in c3x-usage.yml on the dataset.
Terraform example
A minimal but realistic configuration that C3X can estimate.
resource "google_bigquery_dataset" "analytics" {
dataset_id = "analytics"
friendly_name = "Production Analytics"
location = "US"
default_table_expiration_ms = 31536000000 # 1 year
labels = {
environment = "production"
}
}
resource "google_bigquery_table" "events" {
dataset_id = google_bigquery_dataset.analytics.dataset_id
table_id = "events"
time_partitioning {
type = "DAY"
field = "event_timestamp"
}
clustering = ["user_id", "event_type"]
schema = jsonencode([
{
name = "event_timestamp"
type = "TIMESTAMP"
},
{
name = "user_id"
type = "STRING"
},
{
name = "event_type"
type = "STRING"
},
{
name = "payload"
type = "JSON"
}
])
}Pricing dimensions
What you actually pay for when you provision google_bigquery_dataset.
| Dimension | Unit | What's being charged |
|---|---|---|
| Active storage | per GB-month | Tables modified within last 90 days. $0.02/GB-month |
| Long-term storage | per GB-month | Tables unchanged for 90+ days. Automatic transition. $0.01/GB-month |
| Physical storage | per GB-month | Optional billing model for compressed storage. Often cheaper for compressible data. |
| On-demand queries | per TB scanned | Bytes scanned after partition pruning. First 1 TB/month free. $6.25/TB |
| Capacity slots | per slot-hour | Reserved compute capacity. Three editions: Standard, Enterprise, Enterprise Plus. $0.04-$0.10/slot-hour |
| Streaming inserts (legacy) | per 200 MB | Deprecated. Storage Write API is cheaper and more efficient. |
Optimization tips
Common ways to reduce google_bigquery_dataset cost without changing the workload.
Partition and cluster large tables
Major on on-demand queriesPartitioning by date and clustering by frequently-filtered columns can cut bytes scanned by 90% or more. Queries that filter on the partition column only scan matching partitions.
Switch from streaming to Storage Write API
50% on ingestStreaming inserts are $0.012/200 MB. Storage Write API is 50% cheaper. For high-volume ingest, this saves real money.
Move to capacity-based pricing for predictable workloads
30-60% on high-volume workloadsIf your monthly query spend on on-demand exceeds ~$2,000/month, capacity-based with reservations is cheaper. Especially with 1- or 3-year commitments.
Set default table expiration
Storage over timeSetting default_table_expiration_ms on the dataset (e.g., 365 days) automatically deletes tables that don't need to live forever. Prevents storage accumulation.
Use approximate aggregation functions
Query-specificAPPROX_COUNT_DISTINCT and similar functions scan less data than exact counterparts. For analytics queries that tolerate 1-2% error, this can cut scan costs significantly.
FAQ
On-demand or capacity pricing?
On-demand: simple, no commitment, right for unpredictable workloads. Capacity: cheaper at high volume, requires commitment, right when you can predict slot usage. Crossover is typically around $2,000/month in on-demand spend.
How does c3x estimate BigQuery cost?
Storage is computed from declared tables and expected size. Queries are usage-based: add monthly_queries_tb to c3x-usage.yml on the dataset. For capacity model, specify reserved_slots and slot_commitment_term.
What's the difference between active and long-term storage?
Tables not modified for 90 days automatically transition to long-term storage at half the per-GB rate. Any modification (insert, update, delete) resets the 90-day clock. The transition is automatic and bidirectional.
Are streaming inserts still recommended?
No. Legacy streaming inserts are deprecated. The Storage Write API replaces them with better cost (50% cheaper), better throughput, and exactly-once semantics.
Related resources
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including google_bigquery_dataset.