azurerm_machine_learning_compute_cluster cost estimation
An autoscaling cluster for ML training and batch jobs. Priced on the underlying VM rate per node-hour, scaling between min and max nodes with the workload.
An azurerm_machine_learning_compute_cluster provides managed, autoscaling compute for Azure ML training runs, hyperparameter sweeps, and batch inference. Azure ML adds no markup over the VM: you pay the standard Azure VM rate for the nodes the cluster runs, so its cost is best understood as "VM rate times active node-hours."
The VM size sets the per-node rate: a Standard_DS3_v2 is roughly $0.27/hour, and GPU sizes such as the NC series cost far more. The cluster scales between min_node_count and max_node_count; with min_node_count set to 0, it scales to zero between jobs, so an idle cluster costs nothing. Actual cost is the VM rate times the node-hours your jobs consume, so it depends on how much training you run.
c3x reads the vm_size and the scale settings and prices node-hours. Because real cost depends on utilization (jobs are bursty), c3x estimates against the max node count and a utilization figure you supply in c3x-usage.yml; with min 0 and no usage, the idle cost is zero. Scale-to-zero is the key cost feature: set min_node_count to 0 so the cluster doesn't bill between jobs.
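The "VM rate times active node-hours" arithmetic can be sketched in Terraform itself. This is an illustration only, not how c3x computes its estimate: the local and output names are hypothetical, the $0.27 rate is the approximate Standard_DS3_v2 figure quoted above, and 200 active hours per month is an assumed utilization.

```hcl
# Back-of-the-envelope estimate: VM rate x nodes x active hours.
# All names and figures below are illustrative assumptions.
locals {
  vm_rate_per_hour       = 0.27 # approximate Standard_DS3_v2 rate in USD
  max_node_count         = 4    # upper bound from scale_settings
  active_hours_per_month = 200  # assumed utilization; varies with your jobs

  node_hours   = local.max_node_count * local.active_hours_per_month
  monthly_cost = local.vm_rate_per_hour * local.node_hours
}

output "estimated_monthly_cost_usd" {
  value = local.monthly_cost
}
```

With these inputs the estimate is 800 node-hours at $0.27, or about $216 per month. If active_hours_per_month is 0 and min_node_count is 0, node_hours is 0 and so is the cost.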
Terraform example
A minimal but realistic configuration that C3X can estimate.
resource "azurerm_machine_learning_compute_cluster" "train" {
  name                          = "gpu-train"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.ml.id
  location                      = "eastus"
  vm_size                       = "Standard_DS3_v2"
  vm_priority                   = "Dedicated"

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 4
    scale_down_nodes_after_idle_duration = "PT30M"
  }
}
Pricing dimensions
What you actually pay for when you provision azurerm_machine_learning_compute_cluster.
| Dimension | Unit | What's being charged |
|---|---|---|
| Node hours | per node-hour | Standard Azure VM rate for vm_size, billed per active node (about $0.27/hour for Standard_DS3_v2). Azure ML adds no markup. c3x prices node-hours from scale settings and utilization. |
| Scale-to-zero | min_node_count = 0 | With a minimum of zero, the cluster scales down between jobs and costs nothing while idle. |
Sample C3X output
Example output from c3x estimate (max 4 nodes, ~200 active hours/month):
azurerm_machine_learning_compute_cluster.train
└─ Node hours (DS3_v2, 4 x 200h) 800 node-hours $216.00
OVERALL TOTAL $216.00
(actual cost scales with real job hours)
Optimization tips
Common ways to reduce azurerm_machine_learning_compute_cluster cost without changing the workload.
Set min_node_count to 0
All idle node time
A non-zero minimum keeps nodes running and billing between jobs. With min 0, the cluster scales to zero when idle, so you pay only for actual training time. This is the single biggest lever.
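To put a number on a non-zero minimum, the sketch below prices one Standard_DS3_v2 node held up for a whole month. The local names are hypothetical, the $0.27 rate is the approximate figure used elsewhere on this page, and 730 hours is an assumed average month length.

```hcl
# Illustrative floor cost of min_node_count = 1 (assumed figures).
locals {
  floor_vm_rate_per_hour = 0.27 # approximate Standard_DS3_v2 rate in USD
  floor_min_node_count   = 1    # nodes that never scale down
  hours_per_month        = 730  # assumed average hours in a month

  idle_floor_cost = local.floor_vm_rate_per_hour * local.floor_min_node_count * local.hours_per_month
}
```

Under these assumptions a single always-on node costs roughly $197 per month before any job runs, which is the amount min_node_count = 0 avoids.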
Use low-priority (Spot) nodes for interruptible jobs
60-90%
Setting vm_priority to LowPriority uses Spot capacity at 60-90% off for training that tolerates preemption. Checkpoint your runs so that preempted jobs can resume from the last checkpoint instead of starting over.
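As a sketch, a low-priority variant of the example cluster differs only in vm_priority. The resource label and name below are hypothetical, and the workspace reference is assumed to be the same one used in the Terraform example.

```hcl
# Hypothetical low-priority variant of the example cluster.
resource "azurerm_machine_learning_compute_cluster" "train_spot" {
  name                          = "train-spot"
  machine_learning_workspace_id = azurerm_machine_learning_workspace.ml.id
  location                      = "eastus"
  vm_size                       = "Standard_DS3_v2"
  vm_priority                   = "LowPriority" # preemptible; jobs must tolerate eviction

  scale_settings {
    min_node_count                       = 0
    max_node_count                       = 4
    scale_down_nodes_after_idle_duration = "PT30M"
  }
}
```

Keeping a separate Dedicated cluster for jobs that cannot be interrupted is one way to use both priorities side by side.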
Tune scale-down idle duration
Post-job idle node time
A long scale_down_nodes_after_idle_duration keeps nodes warm (and billing) after a job finishes. Shorten it so idle nodes release quickly once a run completes.
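For illustration, the scale_settings block below shortens the idle window from the 30 minutes used in the example to 5 minutes. The value is an ISO 8601 duration, and 5 minutes is an assumed figure rather than a recommendation.

```hcl
scale_settings {
  min_node_count = 0
  max_node_count = 4

  # ISO 8601 duration: release idle nodes after 5 minutes instead of 30.
  scale_down_nodes_after_idle_duration = "PT5M"
}
```

If jobs tend to arrive back to back, a very short window may mean nodes are released and then provisioned again, so the right value depends on your submission pattern.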
FAQ
How does c3x estimate ML compute cluster cost?
It reads vm_size and scale settings and prices node-hours at the standard Azure VM rate (no ML markup). Because jobs are bursty, c3x estimates against max nodes and a utilization figure from c3x-usage.yml; with min 0 and no usage, idle cost is zero.
Does Azure ML add a markup over the VM price?
No. You pay the standard Azure VM rate for the cluster's nodes. The managed orchestration (scheduling, scaling, job management) is included at no extra per-node charge.
Why can an ML cluster cost nothing when idle?
With min_node_count set to 0, the cluster scales all nodes down between jobs. An idle cluster has no running nodes and therefore no charge; you pay only during actual training runs.
How do GPU clusters change the cost?
GPU VM sizes (NC, ND, NV series) cost many times more per node-hour than CPU sizes. c3x prices whatever vm_size you set, so a GPU cluster's node-hours are far pricier, making scale-to-zero and Spot even more valuable.
What's the difference from a compute instance?
A compute cluster autoscales for batch/training jobs and can scale to zero. An azurerm_machine_learning_compute_instance is a single dev box that does not autoscale and bills for as long as it is running, whether or not anyone is using it. Use clusters for jobs, instances for interactive development.
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including azurerm_machine_learning_compute_cluster.