aws_glue_job cost estimation
A managed ETL job using Spark, Python shell, or Ray. Priced per DPU-hour with a 1-minute billing minimum, plus development endpoint and Data Catalog costs.
An aws_glue_job is a serverless ETL job that runs Spark, Python shell, or Ray workloads. The job itself has no fixed cost; you pay only when it runs.
Pricing depends on the worker type:
Spark jobs (G.1X, G.2X, G.4X, G.8X, G.025X for Streaming): billed per DPU-hour. A DPU (Data Processing Unit) represents 4 vCPU + 16 GB memory. G.1X = 1 DPU; G.2X = 2 DPUs; G.4X = 4 DPUs; G.8X = 8 DPUs. Rate is $0.44/DPU-hour with a 1-minute minimum.
Python shell jobs: billed at the same $0.44/DPU-hour rate, but each job uses only 0.0625 or 1 DPU, so runs are much cheaper. A 1-minute Python shell job at 0.0625 DPU costs about $0.0005, which is negligible.
Ray jobs (Z.2X): billed per M-DPU-hour, where an M-DPU is a memory-optimized DPU and each Z.2X worker is 2 M-DPUs. Suited to distributed Python workloads.
A typical Spark job with 5 G.1X workers running for 30 minutes costs 5 DPUs × 0.5 hours × $0.44/DPU-hour = $1.10 per run. Frequent schedules add up: running that job every hour around the clock is about 720 runs in a 30-day month, or $792/month. The same job on a daily schedule costs about $33/month.
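The arithmetic above can be sketched in a few lines of Python. This is an illustrative helper, not part of C3X: the names `spark_run_cost` and `DPU_PER_WORKER` are assumptions made for the example, the $0.44 rate is the one quoted above (actual rates vary by region), and the 1-minute minimum is applied per run.

```python
# Illustrative sketch only; the rate and worker sizes come from the text above.
DPU_HOUR_RATE = 0.44  # USD per DPU-hour (varies by region)

# DPUs per worker for the Spark worker types listed above.
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2, "G.4X": 4, "G.8X": 8}


def spark_run_cost(worker_type, number_of_workers, run_minutes, rate=DPU_HOUR_RATE):
    """Estimate the cost in USD of a single Spark job run."""
    billed_minutes = max(run_minutes, 1)  # 1-minute billing minimum
    dpus = DPU_PER_WORKER[worker_type] * number_of_workers
    return dpus * (billed_minutes / 60) * rate


per_run = spark_run_cost("G.1X", 5, 30)
hourly_schedule = per_run * 24 * 30  # one run every hour for a 30-day month

print(f"per run: ${per_run:.2f}")
print(f"hourly schedule: ${hourly_schedule:.2f}/month")
```

Swapping in a different worker type or duration shows how quickly the per-run figure scales, since cost is linear in both DPU count and runtime.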
Other Glue components have their own costs:
Glue Data Catalog: the first 1M objects stored are free, then $1 per 100K objects per month. The first 1M requests per month are also free, then $1 per 1M requests.
Glue Crawlers: same DPU-hour rate as jobs ($0.44/DPU-hour), billed per second with a 10-minute minimum per crawler run.
Glue Studio: free as an editor; the jobs it generates bill normally.
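As a rough illustration of the Data Catalog storage tier described above, the sketch below computes the monthly storage charge for a given object count. The function name is hypothetical, and it assumes partial blocks of 100K objects are prorated linearly, which the text above does not state.

```python
FREE_OBJECTS = 1_000_000  # free tier quoted above
RATE_PER_100K = 1.00      # USD per 100K objects per month beyond the free tier


def catalog_storage_cost(object_count):
    """Estimate the monthly Data Catalog storage charge in USD.

    Assumption: partial blocks of 100K objects are prorated linearly.
    """
    billable = max(object_count - FREE_OBJECTS, 0)
    return billable / 100_000 * RATE_PER_100K


for count in (250_000, 1_000_000, 1_500_000, 4_000_000):
    print(f"{count:>9,} objects -> ${catalog_storage_cost(count):.2f}/month")
```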
c3x estimates Glue jobs only with usage data: monthly_run_count and average_run_duration_minutes in c3x-usage.yml.
Terraform example
A minimal but realistic configuration that C3X can estimate.
    resource "aws_glue_job" "etl" {
      name              = "daily-etl"
      role_arn          = aws_iam_role.glue.arn
      number_of_workers = 5
      worker_type       = "G.1X"
      glue_version      = "4.0"

      command {
        script_location = "s3://my-bucket/glue/daily-etl.py"
        python_version  = "3"
      }

      default_arguments = {
        "--job-bookmark-option" = "job-bookmark-enable"
        "--enable-metrics"      = "true"
      }

      timeout     = 120 # minutes
      max_retries = 1
    }

Pricing dimensions
What you actually pay for when you run an aws_glue_job.
| Dimension | Unit | Rate | What's being charged |
|---|---|---|---|
| Spark job DPU-hours (G.1X, G.2X, G.4X, G.8X) | per DPU-hour | $0.44/DPU-hour | Each DPU is 4 vCPU + 16 GB memory. Billed per second with a 1-minute minimum per run. |
| Python shell jobs | per DPU-hour | $0.44/DPU-hour | Same rate as Spark but uses smaller DPU sizes (0.0625 or 1 DPU). Cheaper for lightweight ETL. |
| Ray jobs (Z.2X) | per M-DPU-hour | $0.44/M-DPU-hour | Memory-optimized workers for Ray-based distributed Python. Each Z.2X worker is 2 M-DPUs. |
| Glue Crawlers | per DPU-hour | $0.44/DPU-hour | Crawler runs use the same DPU rate as jobs. |
| Data Catalog | per 100K objects per month | $1.00/100K objects beyond the free tier | First 1M objects free at the account level. The first 1M requests per month are also free. |
Optimization tips
Common ways to reduce aws_glue_job cost without changing the workload.
Right-size DPU count and worker type
Savings: linear with worker count.
Many Glue jobs run with 10 G.1X workers when 3 would suffice. Use the Glue job metrics tab to find the actual DPU utilization and shrink the worker count to match.
Use job bookmarks for incremental processing
Savings: workload-dependent.
Job bookmarks track processed data so subsequent runs only process new data. On large datasets that grow incrementally, this can cut run duration by 90% or more.
Use Python shell jobs for small ETL
Savings: significant on small jobs.
Spark jobs require a minimum of 2 workers. For lightweight transformations on small data, Python shell at 0.0625 DPU is far cheaper than Spark.
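To put a number on the gap, here is a small comparison of the cheapest Spark configuration (2 G.1X workers, so 2 DPUs) against the smallest Python shell size. It is a sketch under simplifying assumptions: the 5-minute duration is hypothetical, and both job types are assumed to take the same time, which will not hold exactly in practice.

```python
DPU_HOUR_RATE = 0.44  # USD per DPU-hour, as quoted in this document


def run_cost(dpus, run_minutes, rate=DPU_HOUR_RATE):
    """Cost in USD of one run, applying the 1-minute billing minimum."""
    return dpus * (max(run_minutes, 1) / 60) * rate


minutes = 5  # hypothetical lightweight job
spark_min = run_cost(dpus=2, run_minutes=minutes)          # 2 x G.1X, the Spark minimum
python_shell = run_cost(dpus=0.0625, run_minutes=minutes)  # smallest Python shell size

print(f"Spark (2 x G.1X):          ${spark_min:.4f} per run")
print(f"Python shell (0.0625 DPU): ${python_shell:.4f} per run")
print(f"ratio: {spark_min / python_shell:.0f}x")
```

Because both jobs bill at the same rate, the ratio depends only on the DPU counts when the durations match.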
Schedule jobs at off-peak times if possible
Savings: indirect, on downstream resources.
Glue itself doesn't have spot pricing or time-of-day discounts, but downstream resources (S3, EMR) might. Coordinating job schedules with off-peak times can reduce the total bill.
Use Glue 4.0+ for performance improvements
Savings: 10-30% on job runtime.
Newer Glue versions run jobs faster than older versions on the same DPU count. Migrating jobs from Glue 3.0 to 4.0 typically cuts duration by 10-30%.
FAQ
Why are my Glue jobs so expensive?
Three common causes: over-provisioning DPU count (using 20 workers when 5 would do), running too frequently (hourly when daily would suffice), and processing the full dataset each run (no bookmarks). Address each separately.
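The effect of each cause can be looked at in isolation with a small what-if calculation. The baseline below (20 G.1X workers, hourly runs, 30 minutes each) and the "30 to 3 minutes" bookmark saving are hypothetical figures chosen to match the examples in this document. The sketch also assumes that cutting the worker count leaves run duration unchanged, which is only true when the job was over-provisioned to begin with.

```python
DPU_HOUR_RATE = 0.44  # USD per DPU-hour, as quoted in this document


def monthly_cost(workers, runs_per_month, run_minutes, dpu_per_worker=1):
    """Monthly cost in USD for a Spark job (G.1X workers by default)."""
    hours_per_run = max(run_minutes, 1) / 60  # 1-minute billing minimum
    dpu_hours = workers * dpu_per_worker * hours_per_run * runs_per_month
    return dpu_hours * DPU_HOUR_RATE


# Hypothetical baseline: 20 G.1X workers, hourly runs, 30 minutes each.
baseline = {"workers": 20, "runs_per_month": 720, "run_minutes": 30}

# Each scenario changes exactly one input relative to the baseline.
scenarios = {
    "baseline": {},
    "right-size to 5 workers": {"workers": 5},
    "run daily, not hourly": {"runs_per_month": 30},
    "bookmarks (30 -> 3 min)": {"run_minutes": 3},
}

costs = {name: monthly_cost(**{**baseline, **change}) for name, change in scenarios.items()}
for name, cost in costs.items():
    print(f"{name:<26} ${cost:>8.2f}/month")
```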
How does c3x estimate Glue job cost?
From the job's worker_type and number_of_workers (DPU count) plus expected monthly_runs and average_run_duration_minutes from c3x-usage.yml. Without usage data, the job shows $0.
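A rough sketch of the calculation this answer describes follows. It is not C3X's implementation: the function name, the dictionary shapes, and the `DPU_PER_WORKER` table are hypothetical, and the usage key names simply follow the ones mentioned in this answer, so the actual keys in c3x-usage.yml may differ.

```python
DPU_HOUR_RATE = 0.44  # USD per DPU-hour, as quoted in this document
DPU_PER_WORKER = {"G.1X": 1, "G.2X": 2, "G.4X": 4, "G.8X": 8}


def estimate_monthly_cost(job, usage=None):
    """Return an estimated monthly cost in USD, or 0.0 when usage data is missing."""
    if not usage:
        return 0.0  # mirrors the "$0 without usage data" behavior described above
    dpus = DPU_PER_WORKER[job["worker_type"]] * job["number_of_workers"]
    hours_per_run = max(usage["average_run_duration_minutes"], 1) / 60
    return dpus * hours_per_run * usage["monthly_runs"] * DPU_HOUR_RATE


job = {"worker_type": "G.1X", "number_of_workers": 5}             # as in the Terraform example
usage = {"monthly_runs": 30, "average_run_duration_minutes": 30}  # hypothetical values

print(f"with usage data:    ${estimate_monthly_cost(job, usage):.2f}/month")
print(f"without usage data: ${estimate_monthly_cost(job):.2f}/month")
```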
What's the difference between Glue and EMR?
Glue is fully managed and serverless, with lower operational overhead but a higher cost per unit of compute than an equivalent EMR cluster. EMR gives more control over Spark configuration and is cheaper for the same compute, but requires cluster management. Pick Glue for ETL where simplicity matters; pick EMR for analytics where performance tuning matters.
Is Glue Data Catalog free?
Mostly. The first 1M objects stored per account are free, and so are the first 1M requests (GetTable, GetPartitions, etc.) per month. Beyond that, storage costs $1 per 100K objects per month and requests cost $1 per 1M. Most accounts stay within the free tier.
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including aws_glue_job.