azurerm_hdinsight_spark_cluster cost estimation

A managed Apache Spark cluster billed per node-hour for head and worker VMs. A cluster with 2 head nodes and 3 workers costs about $1,894/month when running continuously.

An azurerm_hdinsight_spark_cluster provisions a managed Apache Spark cluster on Azure HDInsight. Cost accrues per node-hour across all nodes (the two head nodes plus the worker nodes) at the underlying VM rate plus an HDInsight surcharge. A cluster with 2 head nodes and 3 workers, all D12v2-class, runs ~$0.519/node-hour × 5 nodes × 730 hours/month ≈ $1,894/month, billed continuously whether or not jobs run.
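The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the formula as stated in the text, not C3X's implementation; the function name and parameters are made up for illustration, and the rate and 730-hour month are the figures quoted above.

```python
HOURS_PER_MONTH = 730  # month length used in the estimate above


def monthly_cluster_cost(head_nodes, worker_nodes, rate_per_node_hour,
                         hours=HOURS_PER_MONTH):
    """Always-on cost: every head and worker node bills for every hour."""
    total_nodes = head_nodes + worker_nodes
    return total_nodes * rate_per_node_hour * hours


# Figures from the text: 2 head + 3 worker nodes at ~$0.519/node-hour.
always_on = monthly_cluster_cost(head_nodes=2, worker_nodes=3,
                                 rate_per_node_hour=0.519)
print(f"${always_on:,.2f}/month")
```

With the quoted inputs this lands on the same ≈ $1,894/month figure; changing `worker_nodes` or the rate shows how the always-on bill moves.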

That "continuously" is the cost trap. HDInsight clusters don't auto-pause: every node bills 24/7 from creation until the cluster is deleted. A cluster left up between batch jobs keeps accruing its full always-on rate (about $1,894/month for the 5-node example) while doing nothing. The cost-efficient pattern for batch Spark on HDInsight is to create the cluster for a job (or a job window) and delete it afterward, storing data externally in ADLS/Blob so it persists across cluster lifetimes.
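To make the gap between always-on and create-per-job concrete, here is a rough comparison in Python. The schedule (one run per day, cluster alive 4 hours per run) is a hypothetical assumption chosen for illustration; the rate and node count are the ones quoted above.

```python
RATE_PER_NODE_HOUR = 0.519   # D12v2-class rate quoted in the text
NODES = 5                    # 2 head + 3 workers, as in the text
HOURS_PER_MONTH = 730


def cluster_cost(nodes, rate, hours_up):
    """Nodes bill for every hour the cluster exists, busy or idle."""
    return nodes * rate * hours_up


# Hypothetical schedule: one run per day, cluster alive 4 hours per run
# (provisioning + job + teardown), 30 runs per month.
hours_per_run = 4
runs_per_month = 30

always_on = cluster_cost(NODES, RATE_PER_NODE_HOUR, HOURS_PER_MONTH)
per_job = cluster_cost(NODES, RATE_PER_NODE_HOUR, hours_per_run * runs_per_month)

print(f"always-on: ${always_on:,.2f}")
print(f"per-job:   ${per_job:,.2f}")
print(f"saved:     ${always_on - per_job:,.2f}")
```

Under these assumed hours the cluster exists for 120 of the month's 730 hours, so the bill drops in the same proportion. Real savings depend on how long each job window actually is.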

The other levers are worker count (scale to the job's parallelism) and node VM size. For interactive or always-on Spark, modern alternatives like Synapse Spark pools (which auto-pause) or Databricks are often more cost-effective.

C3X prices the cluster from the worker node count and node size, so the always-on cost is visible before deployment.

Terraform example

A minimal but realistic configuration that C3X can estimate.

resource "azurerm_hdinsight_spark_cluster" "analytics" {
  name                = "spark-cluster"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  cluster_version     = "5.1"
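  tier                = "Standard"

  # component_version, gateway, and storage blocks are omitted here for brevity;
  # the provider requires them before this configuration can be applied.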
  roles {
    head_node {
      vm_size = "Standard_D12_v2"
      # ... credentials
    }
    worker_node {
      vm_size               = "Standard_D12_v2"
      target_instance_count = 3
    }
    zookeeper_node {
      vm_size = "Standard_A2_v2"
    }
  }
}

Pricing dimensions

What you actually pay for when you provision azurerm_hdinsight_spark_cluster.

Dimension | Unit | What's being charged
Cluster nodes | per node-hour | All nodes (2 head + workers) bill per node-hour at the VM rate plus HDInsight surcharge, continuously. The cluster does not auto-pause. Example: $0.519/node-hour (D12v2-class) → 5 nodes ≈ $1,894.35/month

Sample C3X output

2 head + 3 worker nodes (D12v2-class), running 24/7:

azurerm_hdinsight_spark_cluster.analytics
└─ Cluster nodes (5 × D12v2-class)   3650 node-hours   $1,894.35
                                     Monthly           $1,894.35

Optimization tips

Common ways to reduce azurerm_hdinsight_spark_cluster cost without changing the workload.

Create per-job, delete after — don't leave it running

Most of the cost for batch workloads

HDInsight clusters bill all nodes 24/7 and don't auto-pause. For batch Spark, create the cluster for the job window and delete it afterward, keeping data in ADLS/Blob so it survives. A cluster sitting idle between jobs still accrues the full node-hour charge.

Right-size worker count and VM size

Proportional to right-sizing

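Cost scales linearly with worker count and with the hourly rate of the chosen VM size. At the example rate, each additional D12v2-class worker adds about $379/month ($0.519 × 730 hours). Match workers to the job's parallelism and the VM size to its memory/CPU profile rather than over-provisioning a standing cluster.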

Consider Synapse Spark pools or Databricks

Large vs an always-on HDInsight cluster

Synapse Spark pools auto-pause when idle and Databricks job clusters spin up per job, so both avoid HDInsight's always-on node billing for interactive or intermittent Spark.

Use autoscale where the cluster must stay up

Worker-hours during low load

If a cluster genuinely needs to persist, HDInsight autoscale adjusts worker count to load so you're not paying for peak workers around the clock.
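As a rough illustration of what autoscale can save, the sketch below compares a cluster fixed at peak size with one whose worker count follows load. The load profile (6 workers for 8 peak hours a day, 2 otherwise) is hypothetical, the helper names are made up, and the rate is the D12v2-class figure quoted earlier. Head nodes are assumed to bill at all times in both cases.

```python
RATE_PER_NODE_HOUR = 0.519  # D12v2-class rate quoted in the text
HOURS_PER_MONTH = 730
HEAD_NODES = 2


def monthly_cost(avg_workers):
    """Head nodes always bill; workers bill for their average count."""
    return (HEAD_NODES + avg_workers) * RATE_PER_NODE_HOUR * HOURS_PER_MONTH


def average_workers(schedule):
    """schedule: list of (hours_per_day, worker_count) pairs covering 24 hours."""
    total_hours = sum(hours for hours, _ in schedule)
    return sum(hours * workers for hours, workers in schedule) / total_hours


# Hypothetical load profile: 6 workers for 8 peak hours, 2 workers otherwise.
autoscaled = monthly_cost(average_workers([(8, 6), (16, 2)]))
fixed_peak = monthly_cost(6)

print(f"fixed at peak: ${fixed_peak:,.2f}")
print(f"autoscaled:    ${autoscaled:,.2f}")
```

The saving comes entirely from worker-hours during low load; the two head nodes cost the same either way, which is why deleting an idle cluster still beats autoscaling it.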

FAQ

Does an HDInsight cluster pause when idle?

No. HDInsight clusters bill all nodes (head + workers) per hour continuously from creation until deletion, with no auto-pause. A cluster left up between jobs is the classic HDInsight overspend: it keeps accruing the full always-on cost while doing nothing.

How do I run Spark on HDInsight cost-effectively?

Create the cluster for a job or job window and delete it afterward, storing data in ADLS/Blob so it persists across cluster lifetimes. For interactive or always-on Spark, Synapse Spark pools (auto-pause) or Databricks are usually cheaper.

How does C3X estimate the cost?

From the worker node count and node size (plus the two head nodes), pricing node-hours at the HDInsight rate. The estimate is the always-on cost; per-job create/delete usage brings the real bill down.

Estimate this resource in your own Terraform

Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_hdinsight_spark_cluster.