azurerm_databricks_workspace cost estimation

Managed Spark and Delta Lake. Bills DBUs (Databricks Units) at $0.15-$0.55 per DBU-hour depending on workload type, plus underlying VM costs. The Photon engine adds ~30% to the DBU rate in exchange for ~2-3x performance.

Azure Databricks is the managed Spark/Delta Lake platform on Azure. The azurerm_databricks_workspace resource creates the workspace; clusters and jobs are configured via the Databricks API or the Databricks Terraform provider. Pricing has two components: DBUs (Databricks consumption) and the underlying Azure VMs.

DBU pricing by workload type (Standard tier, varies by region):
- Jobs Compute: $0.15/DBU-hour (for scheduled jobs)
- All-Purpose Compute: $0.40/DBU-hour (interactive clusters for notebooks)
- SQL Compute: $0.22/DBU-hour (Databricks SQL)
- Jobs Light Compute: $0.07/DBU-hour (lowest rate, for lightweight automated jobs)

Premium tier (Azure AD integration, audit logs, secrets management) adds ~50% to DBU rates.

DBU consumption rates per VM type:
- Standard_DS3_v2: 0.75 DBU/hour
- Standard_DS4_v2: 1.5 DBU/hour
- Standard_DS5_v2: 3 DBU/hour

A typical interactive cluster with 4 Standard_DS4_v2 worker nodes + 1 driver for 8 hours:
- VM cost: 5 × ~$0.50/hour × 8 = $20
- DBU cost: 5 × 1.5 DBU × $0.40 × 8 = $24
- Total per cluster session: $44
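The arithmetic above can be sketched as a small function. This is a minimal illustration, not how C3X computes its estimates: the name cluster_session_cost and its parameters are hypothetical, and the rates are the approximate figures quoted on this page.

```python
def cluster_session_cost(nodes, vm_rate, dbu_per_node_hour, dbu_price, hours):
    """Return (vm_cost, dbu_cost, total) in USD for one cluster session.

    nodes counts the driver plus all workers. This sketch assumes every
    node uses the same VM type and runs for the whole session.
    """
    vm_cost = nodes * vm_rate * hours
    dbu_cost = nodes * dbu_per_node_hour * dbu_price * hours
    return vm_cost, dbu_cost, vm_cost + dbu_cost


# 4 Standard_DS4_v2 workers + 1 driver, All-Purpose Compute, 8 hours
vm, dbu, total = cluster_session_cost(
    nodes=5, vm_rate=0.50, dbu_per_node_hour=1.5, dbu_price=0.40, hours=8
)
print(f"VM ${vm:.2f} + DBU ${dbu:.2f} = ${total:.2f}")
```

Swapping dbu_price to 0.15 shows what the same session would cost on Jobs Compute.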

Photon Engine adds ~30% to DBU rates but provides 2-3x query performance for SQL workloads. For SQL-heavy workloads, Photon is usually cost-effective.

Running worker nodes on Spot instances (called "Spot pricing" in Databricks) can save 60-80% on VM cost. The DBU rate is unaffected.

C3X estimates Databricks cost from declared cluster configurations (node_type_id, num_workers). To set the DBU rate, specify the workload type in c3x-usage.yml.

Terraform example

A minimal but realistic configuration that C3X can estimate.

resource "azurerm_databricks_workspace" "main" {
  name                = "prod-databricks"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  sku                 = "premium"

  custom_parameters {
    no_public_ip = true
  }

  tags = {
    Environment = "production"
  }
}

Pricing dimensions

What you actually pay for when you provision azurerm_databricks_workspace.

| Dimension | Unit | What's being charged | Rate |
| --- | --- | --- | --- |
| Jobs Compute DBU | per DBU-hour | Low-cost tier for scheduled batch jobs. Standard SKU. | $0.15/DBU-hour |
| All-Purpose Compute DBU | per DBU-hour | Interactive clusters for notebooks and ad-hoc analysis. | $0.40/DBU-hour |
| SQL Compute DBU | per DBU-hour | Databricks SQL warehouses (Pro tier). | $0.22/DBU-hour Standard, ~$0.55/DBU-hour Serverless |
| Underlying Azure VMs | per hour | Azure VMs (Databricks calls these workers) running the actual Spark workload. Billed at standard Azure VM rates. | $0.10-$5/hour per VM |
| Photon engine premium | percentage | Adds to the DBU rate but provides 2-3x performance for SQL workloads. | +30% on DBU rate |

Optimization tips

Common ways to reduce azurerm_databricks_workspace cost without changing the workload.

Use Jobs Compute for scheduled workloads

62% on scheduled jobs

All-Purpose Compute costs about 2.7x as much as Jobs Compute per DBU ($0.40 vs $0.15). Scheduled ETL, batch processing, and model training should use Jobs Compute. Reserve All-Purpose for interactive notebook work.
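As a rough check on these figures, the sketch below derives the rate ratio and the DBU saving from the two Standard-tier rates quoted on this page. The helper name dbu_savings_fraction is hypothetical, and the calculation covers DBU charges only.

```python
ALL_PURPOSE_RATE = 0.40  # $/DBU-hour, Standard tier figure from this page
JOBS_RATE = 0.15         # $/DBU-hour, Standard tier figure from this page


def dbu_savings_fraction(current_rate, new_rate):
    """Fraction of the DBU bill saved by moving from current_rate to new_rate."""
    return (current_rate - new_rate) / current_rate


ratio = ALL_PURPOSE_RATE / JOBS_RATE
savings = dbu_savings_fraction(ALL_PURPOSE_RATE, JOBS_RATE)
print(f"All-Purpose costs {ratio:.2f}x Jobs Compute per DBU")
print(f"Moving a scheduled job saves {savings:.1%} of its DBU charge")
```

VM cost is the same on both workload types, so the saving on the whole bill is smaller than the saving on DBUs.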

Use Spot for worker nodes

60-80% on worker VMs

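Spot pricing for Databricks workers saves 60-80% on VM cost. Spark retries tasks lost to a Spot interruption, so jobs usually survive evictions. Enable fallback to on-demand VMs (on Azure, the SPOT_WITH_FALLBACK_AZURE availability setting in the cluster's azure_attributes) to handle Spot exhaustion. Keep the driver on-demand, because a driver interruption kills the job.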
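Because Spot discounts apply only to worker VMs, the saving on the whole bill is smaller than the headline percentage. The sketch below illustrates this with the example rates from this page; spot_session_cost is a hypothetical helper and the 70% discount is an assumed midpoint of the 60-80% range.

```python
def spot_session_cost(workers, vm_rate, spot_discount, dbu_per_node_hour,
                      dbu_price, hours):
    """Total session cost in USD with Spot workers and an on-demand driver.

    spot_discount is the fraction saved on worker VMs (0.6 to 0.8 on this
    page). DBU charges are unaffected by Spot pricing.
    """
    driver_vm = vm_rate * hours
    worker_vm = workers * vm_rate * (1 - spot_discount) * hours
    dbu = (workers + 1) * dbu_per_node_hour * dbu_price * hours
    return driver_vm + worker_vm + dbu


on_demand = spot_session_cost(4, 0.50, 0.0, 1.5, 0.40, 8)
with_spot = spot_session_cost(4, 0.50, 0.7, 1.5, 0.40, 8)
print(f"on-demand ${on_demand:.2f}, Spot workers ${with_spot:.2f}")
print(f"whole-bill saving: {1 - with_spot / on_demand:.0%}")
```

In this example a 70% discount on worker VMs trims the session from about $44 to about $33, roughly a quarter of the total, because the driver VM and all DBU charges stay at full price.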

Enable autoscaling and auto-termination

30-70% on variable workloads

Autoscaling reduces the worker count during low load. Setting autotermination_minutes = 60 shuts down interactive clusters after an hour of inactivity. This is critical for dev/staging, where abandoned notebooks can rack up hundreds of dollars per day.
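To put a number on idle time, the sketch below prices an idle cluster using the example rates from this page. The helper idle_cost is hypothetical, and it assumes the cluster stays at full size while idle (no autoscaling).

```python
def idle_cost(nodes, vm_rate, dbu_per_node_hour, dbu_price, idle_hours):
    """Cost in USD of a cluster that sits idle but keeps running."""
    hourly = nodes * (vm_rate + dbu_per_node_hour * dbu_price)
    return hourly * idle_hours


# The 5-node Standard_DS4_v2 interactive cluster from the example on this page
left_overnight = idle_cost(5, 0.50, 1.5, 0.40, idle_hours=16)
with_autotermination = idle_cost(5, 0.50, 1.5, 0.40, idle_hours=1)
print(f"idle 16 h: ${left_overnight:.2f}; idle 1 h: ${with_autotermination:.2f}")
```

A single mid-sized cluster left running overnight costs tens of dollars under these assumptions; a handful of forgotten clusters reaches the hundreds-per-day range.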

Use Photon for SQL workloads

30-60% on SQL-heavy workloads

Photon's 2-3x query speedup more than offsets the 30% DBU premium, so it is usually cost-positive for SQL-heavy clusters. For pure Python/Scala Spark code, Photon offers little benefit, so skip it.
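The trade-off can be sketched as a relative-cost calculation. This is an illustration under stated assumptions, not a benchmark: photon_relative_cost is a hypothetical helper, and dbu_share (the fraction of the non-Photon bill that is DBU charges) is assumed to be 0.5, close to the split in the example cluster on this page.

```python
def photon_relative_cost(speedup, dbu_premium=0.30, dbu_share=0.5):
    """Cost of a job with Photon relative to the same job without it.

    speedup:     how many times faster the job finishes (2-3x on this page)
    dbu_premium: extra DBU rate charged for Photon (~30% on this page)
    dbu_share:   assumed fraction of the non-Photon bill that is DBU charges;
                 the remainder is VM cost, which only shrinks with runtime
    """
    dbu_part = dbu_share * (1 + dbu_premium) / speedup
    vm_part = (1 - dbu_share) / speedup
    return dbu_part + vm_part


for speedup in (1.0, 1.3, 2.0, 3.0):
    rel = photon_relative_cost(speedup)
    print(f"{speedup:.1f}x speedup -> {rel:.0%} of the non-Photon cost")
```

Under these assumptions the break-even speedup is 1 + dbu_premium × dbu_share, or 1.15x. A workload that Photon does not accelerate at all simply pays the premium.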

Right-size DBU vs VM

20-40% by family choice

Use Compute Optimized VMs (Standard_F* series) when CPU is the bottleneck. They have a lower DBU rate per vCPU than General Purpose VMs. For memory-bound workloads, use Memory Optimized VMs (Standard_E*). Choosing the wrong VM family raises DBU cost.

Use Standard SKU when premium features aren't needed

33% on DBU vs Premium

The Premium SKU adds Azure AD pass-through, audit logs, and customer-managed keys for a ~50% DBU premium. For dev/test and many production workloads, the Standard SKU's features suffice.

FAQ

Is Databricks cheaper than running Spark on raw VMs?

Sometimes. Databricks adds DBU charges (often a 50-100% premium over raw VM cost) in exchange for managed Spark, an optimized runtime (Photon), Delta Lake, and operational features. For ad-hoc analytics, the DBU premium is usually worth it. For massive sustained Spark workloads, self-managed Spark on AKS may be cheaper.

How do DBUs work?

DBUs are a measure of compute consumption, roughly 'normalized compute units'. Each VM type consumes DBUs at a fixed hourly rate (e.g., DS4_v2 = 1.5 DBU/hour). Multiply that by the workload-type DBU price ($0.15-$0.55) to get the Databricks fee, then add the underlying VM cost. Every cluster therefore has two cost components.

Should I use Serverless SQL warehouses?

For variable query workloads, yes. Serverless SQL warehouses scale to zero between queries and start in seconds. The DBU rate is higher ($0.55 vs $0.22), but you pay only for active query time. For workloads with idle periods, they are much cheaper than provisioned SQL warehouses.
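A rough break-even can be sketched from the two DBU rates. This compares DBU charges only, using the figures on this page; cheaper_sql_option is a hypothetical helper, and it ignores the VM cost of the provisioned warehouse, so it understates what provisioned capacity costs.

```python
PROVISIONED_RATE = 0.22  # $/DBU-hour, figure from this page
SERVERLESS_RATE = 0.55   # $/DBU-hour, figure from this page


def cheaper_sql_option(active_hours, provisioned_hours, dbus_per_hour=1.0):
    """Compare DBU charges only; provisioned VM cost is deliberately ignored."""
    provisioned = provisioned_hours * dbus_per_hour * PROVISIONED_RATE
    serverless = active_hours * dbus_per_hour * SERVERLESS_RATE
    winner = "serverless" if serverless < provisioned else "provisioned"
    return winner, provisioned, serverless


break_even = PROVISIONED_RATE / SERVERLESS_RATE
print(f"break-even utilization: {break_even:.0%}")
print(cheaper_sql_option(active_hours=2, provisioned_hours=8))
print(cheaper_sql_option(active_hours=6, provisioned_hours=8))
```

On DBU charges alone, serverless wins when queries keep the warehouse busy less than about 40% of the time it would otherwise be provisioned.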

Does Spot work well for Spark?

Yes, for worker nodes. Spark handles task failures by re-executing the tasks on the remaining nodes. Spot interruptions cause some tasks to retry and slightly lengthen job duration, but they rarely fail jobs. The driver must be on-demand, because its interruption kills the entire job.

Estimate this resource in your own Terraform

Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_databricks_workspace.