aws_glue_crawler cost estimation
A crawler that catalogs data sources into the Glue Data Catalog, billed per DPU-hour only while it runs. 10 DPU-hours/month is ~$4.40.
An aws_glue_crawler scans data sources (S3, JDBC, DynamoDB) to infer schemas and populate the AWS Glue Data Catalog. Cost is per Data Processing Unit-hour (~$0.44/DPU-hour, billed per second with a 10-minute minimum per run) — and only while the crawler is actually running. 10 DPU-hours a month is ~$4.40.
That run-only billing makes crawlers cheap: a crawler that runs for a few minutes on a schedule costs cents per run. The cost grows only if you crawl too much data or crawl more often than the data changes. Crawling a large, slowly changing dataset every hour wastes DPU-hours that a daily schedule would avoid.
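As a rough worked example of the schedule effect, assume each run uses 2 DPUs and finishes within the 10-minute minimum, so every run is billed as 10 minutes. The DPU count, the run duration, and the 30-day month are illustrative assumptions; only the $0.44/DPU-hour rate and the 10-minute minimum come from the text above.

```latex
\begin{align*}
\text{cost per run} &= 2\ \text{DPU} \times \tfrac{10}{60}\ \text{h} \times \$0.44/\text{DPU-hour} \approx \$0.147 \\
\text{daily schedule (30 runs)} &= 30 \times 2 \times \tfrac{10}{60} = 10\ \text{DPU-hours} \approx \$4.40/\text{month} \\
\text{hourly schedule (720 runs)} &= 720 \times 2 \times \tfrac{10}{60} = 240\ \text{DPU-hours} \approx \$105.60/\text{month}
\end{align*}
```

Under these assumptions the hourly schedule costs 24 times as much as the daily one for the same catalog contents, which is why frequency is the first lever to check.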
The levers: match crawl frequency to how often the schema actually changes (most data sources don't need hourly crawling), and scope crawlers to the paths/tables that change rather than re-crawling everything. The Data Catalog storage itself is separate (first million objects free).
C3X prices the crawler from monthly_dpu_hours, since it's billed only during runs; set it to your expected monthly crawl runtime in DPU-hours.
Terraform example
A minimal but realistic configuration that C3X can estimate.
resource "aws_glue_crawler" "data_lake" {
  name          = "data-lake-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    path = "s3://my-data-lake/raw/"
  }

  # Run once a day at 06:00 UTC.
  schedule = "cron(0 6 * * ? *)"
}
Pricing dimensions
What you actually pay for when you provision aws_glue_crawler.
| Dimension | Unit | What's being charged |
|---|---|---|
| Crawler DPU | per DPU-hour | Per Data Processing Unit-hour while the crawler runs (per-second, 10-minute minimum per run). Bills only during crawls. $0.44/DPU-hour → 10 DPU-hours ≈ $4.40/month |
Sample C3X output
10 DPU-hours of crawler runtime in a month:
aws_glue_crawler.data_lake
└─ Crawler DPU 10 DPU-hours $4.40
Monthly $4.40
Optimization tips
Common ways to reduce aws_glue_crawler cost without changing the workload.
Match crawl frequency to schema change rate
Proportional to runs avoided
Crawlers bill per DPU-hour while running. Most data sources don't need hourly crawling, so schedule crawls to match how often the schema or partitions actually change (often daily or on data arrival).
Scope crawlers to what changes
Per DPU-hour of re-scan avoided
Point crawlers at the specific prefixes/tables that change rather than re-crawling an entire data lake. Incremental crawling (new partitions only) avoids re-scanning unchanged data each run.
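A minimal sketch of a scoped, incremental crawler, assuming the same IAM role and catalog database as the example above. The resource name, prefix, and exclusion pattern are hypothetical. To my understanding, the CRAWL_NEW_FOLDERS_ONLY recrawl behavior requires both schema change behaviors to be LOG, so check the provider documentation before relying on it.

```hcl
# Sketch only: names, bucket, and prefixes are hypothetical.
resource "aws_glue_crawler" "events_incremental" {
  name          = "events-incremental-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    # Target the one prefix that receives new data, not the whole lake.
    path = "s3://my-data-lake/raw/events/"

    # Glob patterns to skip, relative to the path above (hypothetical).
    exclusions = ["_tmp/**"]
  }

  # Crawl only folders added since the last run.
  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  # Assumed requirement for incremental crawls: log schema changes
  # instead of applying them.
  schema_change_policy {
    delete_behavior = "LOG"
    update_behavior = "LOG"
  }

  # Run once a day at 06:00 UTC.
  schedule = "cron(0 6 * * ? *)"
}
```

The trade-off is that an incremental crawler will not pick up schema changes in existing folders, so an occasional full crawl may still be needed.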
Trigger on data arrival instead of fixed schedules
Empty-run DPU-hours
Event-driven crawling (run when new data lands) avoids empty scheduled runs against datasets that haven't changed, while keeping the catalog fresh.
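One way to approximate this in Terraform is the crawler's S3 event mode, where the crawler reads S3 event notifications from an SQS queue and crawls only the objects those events point to. This is a sketch under assumptions: aws_sqs_queue.data_lake_events is a hypothetical queue, and the S3 bucket notification that feeds it and the extra SQS permissions on the Glue role are not shown.

```hcl
# Sketch only: the queue, bucket, and prefix are hypothetical.
resource "aws_glue_crawler" "events_on_arrival" {
  name          = "events-on-arrival-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    path = "s3://my-data-lake/raw/events/"

    # Queue that receives S3 object-created notifications (wiring not shown).
    event_queue_arn = aws_sqs_queue.data_lake_events.arn
  }

  # Crawl only what the queued S3 events report as changed.
  recrawl_policy {
    recrawl_behavior = "CRAWL_EVENT_MODE"
  }
}
```

Event mode narrows what each run scans, but the crawler still has to be started on a schedule or on demand. Starting it directly from an arrival event (for example through a small function that calls the Glue StartCrawler API) is a separate piece of wiring not covered here.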
Manage partitions without crawlers where possible
The crawler cost where avoidable
For predictable partition layouts, adding partitions via the Glue API or partition projection (in Athena) can avoid crawler runs entirely.
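A hedged sketch of the partition projection route: a hand-declared Glue table whose date partitions Athena computes from table properties, so no crawler is needed to register them. The table name, column, date range, and S3 layout (dt=YYYY-MM-DD folders of Parquet files) are all assumptions for illustration.

```hcl
# Sketch only: table name, column, range, and S3 layout are hypothetical.
resource "aws_glue_catalog_table" "events" {
  name          = "events"
  database_name = aws_glue_catalog_database.main.name
  table_type    = "EXTERNAL_TABLE"

  parameters = {
    "projection.enabled"   = "true"
    "projection.dt.type"   = "date"
    "projection.dt.format" = "yyyy-MM-dd"
    "projection.dt.range"  = "2024-01-01,NOW"

    # "$${dt}" escapes Terraform interpolation so Athena receives "${dt}".
    "storage.location.template" = "s3://my-data-lake/raw/events/dt=$${dt}/"
  }

  partition_keys {
    name = "dt"
    type = "string"
  }

  storage_descriptor {
    location      = "s3://my-data-lake/raw/events/"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    # The schema is declared by hand because no crawler infers it.
    columns {
      name = "event_id"
      type = "string"
    }
  }
}
```

Partition projection is evaluated by Athena at query time, so other engines reading the same catalog table will not see the projected partitions. It also leaves schema maintenance to you, which suits stable, predictable layouts.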
FAQ
How is an AWS Glue crawler billed?
Per Data Processing Unit-hour (~$0.44/DPU-hour, per-second with a 10-minute minimum per run) — and only while the crawler runs. A crawler running a few minutes on a schedule costs cents per run; 10 DPU-hours/month is ~$4.40.
Why would a crawler cost more than expected?
Running too frequently or over too much data. Crawling a large dataset every hour when it changes daily wastes DPU-hours. Match crawl frequency to the actual schema/partition change rate and scope crawlers to what changes.
How does C3X estimate the cost?
From monthly_dpu_hours, since the crawler bills only during runs. Set it to your expected crawl runtime; the Data Catalog storage is separate (first million objects free).
Estimate this resource in your own Terraform
Free, open source, no API key. C3X parses your Terraform and shows line-item cost for every resource, including aws_glue_crawler.