azurerm_hdinsight_spark_cluster cost estimation
A managed Apache Spark cluster billed per node-hour for head and worker VMs. A 2-head + 3-worker cluster costs ~$1,894/month when running continuously.
An azurerm_hdinsight_spark_cluster provisions a managed Apache Spark cluster on Azure HDInsight. Cost is per node-hour across all nodes (the two head nodes plus the worker nodes) at the underlying VM rate plus an HDInsight surcharge. A cluster with 2 head nodes and 3 workers, all D12v2-class, costs ~$0.519/node-hour × 5 nodes × 730 hours ≈ $1,894/month, billed continuously whether or not jobs run.
That "continuously" is the cost trap. HDInsight clusters don't auto-pause: every node bills 24/7 from creation until the cluster is deleted. A cluster left up between batch jobs keeps billing at the full rate, which can add up to thousands per month for no work done. The cost-efficient pattern for batch Spark on HDInsight is to create the cluster for a job (or a job window) and delete it afterward, storing data externally in ADLS/Blob so it persists across cluster lifetimes.
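The arithmetic behind these figures can be written as Terraform locals. This is a sketch: the $0.519/node-hour rate is the D12v2-class figure used on this page rather than a live Azure price, and the 4-hour daily job window is a hypothetical schedule chosen for illustration.

```hcl
# Hypothetical locals comparing an always-on cluster with one created per job.
locals {
  node_hour_rate = 0.519 # USD per node-hour, D12v2-class figure from this page
  node_count     = 5     # 2 head + 3 worker nodes

  always_on_hours = 730    # hours in an average month (8,760 / 12)
  job_hours       = 4 * 30 # assumed: one 4-hour job window per day for 30 days

  # 5 nodes x 730 h = 3,650 node-hours, about $1,894.35
  always_on_cost = local.node_hour_rate * local.node_count * local.always_on_hours

  # 5 nodes x 120 h = 600 node-hours, about $311.40
  per_job_cost = local.node_hour_rate * local.node_count * local.job_hours
}

output "monthly_cost_usd" {
  value = {
    always_on = local.always_on_cost
    per_job   = local.per_job_cost
  }
}
```

The per-job figure ignores cluster provisioning time and the ADLS/Blob storage that holds the data between runs, which is billed separately.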
The other levers are worker count (scale to the job's parallelism) and node VM size. For interactive or always-on Spark, modern alternatives like Synapse Spark pools (which auto-pause) or Databricks are often more cost-effective.
c3x prices the cluster from the worker node count and node size, plus the two head nodes, so the always-on cost is visible before deployment.
Terraform example
A minimal but realistic configuration that C3X can estimate.
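```hcl
resource "azurerm_hdinsight_spark_cluster" "analytics" {
  name                = "spark-cluster"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  cluster_version     = "5.1"
  tier                = "Standard"

  # component_version, gateway and storage_account blocks are also
  # required by the provider; omitted here for brevity.

  roles {
    # Two head nodes are always provisioned and billed
    head_node {
      vm_size = "Standard_D12_v2"
      # ... credentials
    }

    # Worker count drives most of the variable cost
    worker_node {
      vm_size               = "Standard_D12_v2"
      target_instance_count = 3
      # ... credentials
    }

    zookeeper_node {
      vm_size = "Standard_A2_v2"
      # ... credentials
    }
  }
}
```
Pricing dimensions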
What you actually pay for when you provision azurerm_hdinsight_spark_cluster.
| Dimension | Unit | What's being charged | Example |
|---|---|---|---|
| Cluster nodes | per node-hour | All nodes (2 head + workers) bill per node-hour at the VM rate plus the HDInsight surcharge, continuously. The cluster does not auto-pause. | $0.519/node-hour (D12v2-class) × 5 nodes × 730 h ≈ $1,894.35/month |
Sample C3X output
2 head + 3 worker nodes (D12v2-class), running 24/7:
```
azurerm_hdinsight_spark_cluster.analytics
└─ Cluster nodes (5 × D12v2-class)   3650 node-hours   $1,894.35

Monthly   $1,894.35
```
Optimization tips
Common ways to reduce azurerm_hdinsight_spark_cluster cost without changing the workload.
Create per-job, delete after — don't leave it running
Savings: most of the cost for batch workloads
HDInsight clusters bill all nodes 24/7 and don't auto-pause. For batch Spark, create the cluster for the job window and delete it afterward, keeping data in ADLS/Blob so it survives. A cluster left idle between jobs can waste thousands per month.
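In Terraform, one way to tie the cluster's lifetime to a job window is a boolean toggle on `count`. A sketch of a variant of the example resource, assuming a hypothetical `cluster_enabled` variable that your pipeline sets; the required blocks are elided as in the example above.

```hcl
# Hypothetical toggle: true while a job window is open, false otherwise.
variable "cluster_enabled" {
  description = "Create the Spark cluster only while a job window is open"
  type        = bool
  default     = false
}

resource "azurerm_hdinsight_spark_cluster" "analytics" {
  # 1 instance during the job window, 0 (cluster deleted) otherwise
  count = var.cluster_enabled ? 1 : 0

  name                = "spark-cluster"
  resource_group_name = azurerm_resource_group.main.name
  location            = azurerm_resource_group.main.location
  cluster_version     = "5.1"
  tier                = "Standard"

  # ... same component_version, gateway, storage_account and roles blocks
  # as the example above
}
```

A pipeline would run `terraform apply -var="cluster_enabled=true"` before the job and `terraform apply -var="cluster_enabled=false"` after it. With `count`, the resource address becomes `azurerm_hdinsight_spark_cluster.analytics[0]`.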
Right-size worker count and VM size
Savings: proportional to right-sizing
Cost scales linearly with worker count and node VM size. Match workers to the job's parallelism and the VM size to its memory/CPU profile rather than over-provisioning a standing cluster.
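One way to keep worker count and VM size matched to each workload is to lift them into input variables instead of hard-coding them on a standing cluster. The variable names and the 1 to 20 validation range below are hypothetical, chosen for illustration rather than taken from HDInsight limits.

```hcl
# Hypothetical inputs: size workers per workload, not per peak.
variable "worker_count" {
  description = "Workers sized to the job's parallelism"
  type        = number
  default     = 3

  validation {
    # Illustrative bounds only, not HDInsight limits
    condition     = var.worker_count >= 1 && var.worker_count <= 20
    error_message = "The worker_count must be between 1 and 20."
  }
}

variable "worker_vm_size" {
  description = "Worker VM size matched to the job's memory/CPU profile"
  type        = string
  default     = "Standard_D12_v2"
}

# Inside roles { ... } of azurerm_hdinsight_spark_cluster:
#   worker_node {
#     vm_size               = var.worker_vm_size
#     target_instance_count = var.worker_count
#     # ... credentials
#   }
```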
Consider Synapse Spark pools or Databricks
Savings: large versus an always-on HDInsight cluster
Synapse Spark pools auto-pause when idle and Databricks job clusters spin up per job; both avoid HDInsight's always-on node billing for interactive or intermittent Spark.
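For comparison, an auto-pausing Synapse Spark pool can be declared with `azurerm_synapse_spark_pool`. This is a sketch, not a migration guide: `azurerm_synapse_workspace.main` is an assumed existing workspace, and the pool name, node size, node counts, idle delay and Spark version are illustrative values.

```hcl
# Sketch: Spark pool that scales with load and pauses when idle.
resource "azurerm_synapse_spark_pool" "analytics" {
  name                 = "sparkpool"
  synapse_workspace_id = azurerm_synapse_workspace.main.id # assumed workspace
  node_size_family     = "MemoryOptimized"
  node_size            = "Small"
  spark_version        = "3.4" # illustrative; check the versions your provider accepts

  auto_scale {
    min_node_count = 3
    max_node_count = 10
  }

  # Pause after 15 idle minutes; a paused pool stops accruing node charges
  auto_pause {
    delay_in_minutes = 15
  }
}
```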
Use autoscale where the cluster must stay up
Savings: worker-hours during low load
If a cluster genuinely needs to persist, HDInsight autoscale adjusts the worker count to load so you're not paying for peak workers around the clock.
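Within the `roles` block of the example above, load-based autoscale can be attached to the worker role. A sketch, assuming the provider's `autoscale { capacity { ... } }` block; the 3 to 6 range is illustrative, and the saving is measured against a cluster kept at 6 workers around the clock rather than against the 3-worker baseline.

```hcl
# Replaces the worker_node block inside roles { ... }
worker_node {
  vm_size               = "Standard_D12_v2"
  target_instance_count = 3
  # ... credentials

  # Scale between 3 and 6 workers based on load (illustrative range)
  autoscale {
    capacity {
      min_instance_count = 3
      max_instance_count = 6
    }
  }
}
```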
FAQ
Does an HDInsight cluster pause when idle?
No — HDInsight clusters bill all nodes (head + workers) per hour continuously from creation until deletion, with no auto-pause. A cluster left up between jobs is the classic HDInsight overspend, burning thousands per month doing nothing.
How do I run Spark on HDInsight cost-effectively?
Create the cluster for a job or job window and delete it afterward, storing data in ADLS/Blob so it persists across cluster lifetimes. For interactive or always-on Spark, Synapse Spark pools (auto-pause) or Databricks are usually cheaper.
How does c3x estimate the cost?
From the worker node count and node size (plus the two head nodes), pricing node-hours at the HDInsight rate. The estimate is the always-on cost; per-job create/delete usage brings the real bill down.
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_hdinsight_spark_cluster.