
Azure Spot VMs: when 90% off is worth the eviction risk

Azure Spot VMs run up to ~90% below pay-as-you-go, evictable with 30 seconds' notice. Here's the eviction model, what workloads fit, AKS Spot pools, and how to design for interruption.

The C3X Team · 6 min read

Quick answer

Azure Spot VMs run up to ~90% below pay-as-you-go for the same size, in exchange for being evictable with 30 seconds' notice. They're right for fault-tolerant, stateless, or checkpointable work — batch, CI runners, dev/test, AKS Spot pools. Choose an eviction policy (Deallocate to resume later), design for interruption, and keep a small on-demand baseline for anything that must stay up.

Spot is Azure's deepest VM discount, and like every cloud's spot offering it is very cheap compute for workloads that can tolerate interruption. The whole decision comes down to matching Spot to work that survives eviction, and setting the eviction behaviour so an interruption costs you minutes, not data.

The discount and the eviction

Spot pricing runs up to ~90% below pay-as-you-go for an identical VM size, varying with region and available capacity. Azure reclaims the VM with a 30-second warning when it needs the capacity or when the floating spot price exceeds the max price you set. You control two settings:

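  • Eviction type: capacity-based (evicted only when Azure needs the capacity back) or max-price-based (also evicted when the spot price rises above your price cap). Setting the max price to -1 means the VM is never evicted on price.
  • Eviction policy: Deallocate (stop the VM and keep its disks, which continue to incur storage charges, so you can restart it once capacity is available) or Delete (remove the VM together with its disks). Deallocate suits resumable work.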
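
The two eviction triggers can be modelled in a few lines. This is a minimal sketch for reasoning about the settings, not Azure's implementation: the function name `eviction_reason` and its parameters are hypothetical, and it assumes the convention that a max price of -1 disables price-based eviction.

```python
from typing import Optional

NO_PRICE_CAP = -1.0  # assumed convention: -1 means "never evict on price"


def eviction_reason(spot_price: float, max_price: float,
                    capacity_needed: bool) -> Optional[str]:
    """Return why a Spot VM would be evicted, or None if it keeps running."""
    # Capacity eviction applies regardless of the price settings.
    if capacity_needed:
        return "capacity"
    # Price eviction applies only when a cap is set and the spot price exceeds it.
    if max_price != NO_PRICE_CAP and spot_price > max_price:
        return "price"
    return None
```

With a cap of 0.04 and a spot price of 0.05 the sketch returns "price"; with the cap set to -1 the same VM is only ever evicted for capacity.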

What belongs on Spot

  • Batch and big-data processing with checkpointing.
  • CI/CD runners — an evicted build retries on a fresh VM.
  • Dev/test environments where interruption is harmless.
  • Rendering, ML training with checkpoints, ETL.
  • AKS Spot node pools for fault-tolerant pods.

Not for: production databases, stateful singletons, and latency-critical services without redundancy.

Design for eviction

  1. Poll the Scheduled Events endpoint for the 30-second eviction (Preempt) notice and drain gracefully.
  2. Checkpoint long jobs so an eviction costs minutes.
  3. Make workers idempotent and retryable.
  4. Spread across zones; use Scale Sets or AKS to recreate evicted instances.
  5. Keep a small on-demand baseline; burst onto Spot.
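
Step 1 might look like the following. This is a sketch, assuming the VM can reach the Azure Instance Metadata Service at its usual link-local address and that Spot evictions appear as events with `EventType` set to `Preempt`. The helper names (`preempt_events`, `fetch_scheduled_events`, `watch`) and the `drain` callback are hypothetical.

```python
import json
import time
import urllib.request

# Scheduled Events endpoint of the Instance Metadata Service.
# It is reachable only from inside an Azure VM.
SCHEDULED_EVENTS_URL = (
    "http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01"
)


def preempt_events(document: dict) -> list:
    """Return the events in a Scheduled Events document that signal eviction."""
    return [
        event for event in document.get("Events", [])
        if event.get("EventType") == "Preempt"
    ]


def fetch_scheduled_events(timeout: float = 2.0) -> dict:
    """Fetch the current Scheduled Events document (works only on an Azure VM)."""
    request = urllib.request.Request(
        SCHEDULED_EVENTS_URL, headers={"Metadata": "true"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def watch(drain, poll_seconds: float = 1.0) -> None:
    """Poll until a Preempt event appears, then call the drain callback once."""
    while True:
        if preempt_events(fetch_scheduled_events()):
            drain()  # stop taking work, flush a checkpoint, deregister
            return
        time.sleep(poll_seconds)
```

Because the notice is only about 30 seconds, the poll interval should be short and the drain callback should do the minimum: stop accepting work and persist state.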

This is the Azure counterpart to GCP Spot VMs and AWS spot instances — same trade, same design discipline.

FAQ

How much do Azure Spot VMs save?

Spot VMs run up to ~90% below pay-as-you-go for the same VM size, with the discount varying by region and capacity. In exchange, Azure can evict them with 30 seconds' notice when it needs the capacity back or when your max price is exceeded.
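
To put numbers on that range, here is a small worked example. The $0.20 hourly rate, the fleet of 10 VMs, and the two discount levels are illustrative assumptions, not Azure prices; `monthly_cost` is a hypothetical helper.

```python
HOURS_PER_MONTH = 730  # common billing approximation for one month


def monthly_cost(hourly_rate: float, discount: float = 0.0,
                 vm_count: int = 1) -> float:
    """Monthly cost for vm_count VMs at hourly_rate after a fractional discount."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return hourly_rate * (1.0 - discount) * HOURS_PER_MONTH * vm_count


# Hypothetical fleet: 10 VMs at an assumed $0.20/hour pay-as-you-go rate.
payg = monthly_cost(0.20, vm_count=10)
spot_best = monthly_cost(0.20, discount=0.90, vm_count=10)    # best case
spot_modest = monthly_cost(0.20, discount=0.60, vm_count=10)  # a weaker discount

for label, cost in [("pay-as-you-go", payg),
                    ("spot at 90% off", spot_best),
                    ("spot at 60% off", spot_modest)]:
    print(f"{label}: ${cost:,.2f}/month")
```

Under these assumptions the fleet costs about $1,460 a month at pay-as-you-go, about $146 at the full 90% discount, and about $584 at 60%, which is why the actual discount in your region matters as much as the headline figure.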

How does Azure Spot eviction work?

You choose an eviction policy — Deallocate (stop the VM, keep the disk, restart later) or Delete — and an eviction type: capacity-based (evicted when Azure needs the capacity) or max-price-based (evicted when the spot price exceeds the price you set). Capacity eviction with Deallocate is the common choice for resumable work.

What workloads suit Azure Spot VMs?

Fault-tolerant, stateless, or checkpointable work: batch jobs, CI/CD runners, dev/test environments, rendering, big-data processing, and AKS Spot node pools. Anything that can be interrupted and resumed or retried is a fit. Avoid Spot for stateful singletons and latency-critical production without redundancy.

Can I use Spot with AKS and Scale Sets?

Yes. AKS supports Spot node pools and Virtual Machine Scale Sets support Spot instances, both at the same deep discount. Run interruptible pods or workers there and keep system/stateful components on regular instances with an on-demand baseline.
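
On AKS, pods reach Spot nodes only if they tolerate the taint that Spot node pools carry. The sketch below builds that pod-spec fragment as a Python dictionary and prints it as JSON. It assumes the taint and label key `kubernetes.azure.com/scalesetpriority` with value `spot`; the function name is hypothetical.

```python
import json

# Taint and label key assumed to be applied to nodes in an AKS Spot node pool.
SPOT_KEY = "kubernetes.azure.com/scalesetpriority"


def spot_scheduling_fragment() -> dict:
    """Pod-spec fields that allow a pod onto Spot nodes and pin it there."""
    return {
        # Tolerate the NoSchedule taint so the scheduler may place the pod.
        "tolerations": [
            {"key": SPOT_KEY, "operator": "Equal",
             "value": "spot", "effect": "NoSchedule"}
        ],
        # Require Spot nodes so the pod does not consume on-demand capacity.
        "affinity": {
            "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                    "nodeSelectorTerms": [
                        {"matchExpressions": [
                            {"key": SPOT_KEY, "operator": "In",
                             "values": ["spot"]}
                        ]}
                    ]
                }
            }
        },
    }


print(json.dumps(spot_scheduling_fragment(), indent=2))
```

Merge the fragment into the pod template of interruptible workloads only. System and stateful pods should omit it so they stay on the regular node pool.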

How do I handle evictions gracefully?

Watch for the 30-second scheduled-events eviction notice, checkpoint long jobs, make workers idempotent and retryable, spread across zones, and use Scale Sets or AKS to recreate evicted instances. Pair a small on-demand baseline with a Spot burst pool for reliability plus savings.
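
Checkpointing and idempotency work together, as this minimal sketch shows. It assumes the job is an ordered list of items and that the checkpoint file lives on storage that survives eviction (for example a disk kept by the Deallocate policy). All function names are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process, or 0 if no checkpoint exists."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(json.load(f)["next_index"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint atomically so an eviction mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"next_index": next_index}, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def run_job(items, process, checkpoint_path: str, every: int = 100) -> None:
    """Process items in order, resuming from the last saved checkpoint."""
    start = load_checkpoint(checkpoint_path)
    for index in range(start, len(items)):
        process(items[index])
        if (index + 1) % every == 0:
            save_checkpoint(checkpoint_path, index + 1)
    save_checkpoint(checkpoint_path, len(items))
```

After an eviction, the restarted worker repeats whatever it processed since the last checkpoint, up to `every - 1` completed items plus the one in flight. That repetition is why `process` must be idempotent, and a smaller `every` trades more checkpoint writes for less repeated work.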

How does C3X estimate Spot savings?

Spot prices float, so C3X prices an azurerm_linux_virtual_machine at the standard pay-as-you-go rate as a conservative ceiling and flags Spot usage — your real Spot cost will be much lower, but the estimate never understates based on a discount that can change.

What to do next

Because Spot prices float, plan against the on-demand ceiling and treat the discount as upside. C3X prices an azurerm_linux_virtual_machine at pay-as-you-go and flags Spot, so estimates stay conservative on a discount that can change. The quickstart runs it in minutes.

Try C3X on your own Terraform

Free and open source. No API key required. One command to install, one command to estimate.