aws_glue_crawler cost estimation

An aws_glue_crawler scans data sources (S3, JDBC, DynamoDB) to infer schemas and populate the AWS Glue Data Catalog. It is billed per Data Processing Unit-hour (~$0.44/DPU-hour, metered per second with a 10-minute minimum per run), and only while the crawler is actually running. At that rate, 10 DPU-hours a month costs ~$4.40.

That run-only billing makes crawlers cheap: a crawler that runs for a few minutes on a schedule costs cents per run (each run is billed for at least 10 minutes, however short it is). The cost only grows if you crawl too much data or crawl far more often than the data actually changes: crawling a large, slowly changing dataset every hour wastes DPU-hours that a daily schedule would avoid.
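To make the hourly-versus-daily gap concrete, the arithmetic can be sketched as Terraform locals. The DPU count and per-run runtime below are illustrative assumptions, not values from AWS or C3X; only the ~$0.44 rate and the 10-minute minimum come from the text above.

```hcl
# Illustrative cost arithmetic. crawler_dpus and run_minutes are assumptions;
# check the crawler's run history for real values.
locals {
  dpu_hour_rate = 0.44 # USD per DPU-hour (rate quoted above)
  crawler_dpus  = 2    # assumed DPUs consumed per run
  run_minutes   = 4    # assumed actual runtime per crawl

  # Each run is billed for at least 10 minutes.
  billed_minutes    = max(local.run_minutes, 10)
  dpu_hours_per_run = local.crawler_dpus * local.billed_minutes / 60

  daily_runs_per_month  = 30
  hourly_runs_per_month = 24 * 30

  daily_dpu_hours  = local.dpu_hours_per_run * local.daily_runs_per_month
  hourly_dpu_hours = local.dpu_hours_per_run * local.hourly_runs_per_month
}

output "monthly_cost_usd" {
  value = {
    daily  = local.daily_dpu_hours * local.dpu_hour_rate
    hourly = local.hourly_dpu_hours * local.dpu_hour_rate
  }
}
```

Under these assumptions the daily schedule works out to about 10 DPU-hours (~$4.40) a month and the hourly one to about 240 DPU-hours (~$105.60), for the same data.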

The levers: match crawl frequency to how often the schema actually changes (most data sources don't need hourly crawling), and scope crawlers to the paths/tables that change rather than re-crawling everything. The Data Catalog storage itself is separate (first million objects free).

c3x prices the crawler from monthly_dpu_hours, since it's billed only during runs; set it to the DPU-hours you expect the crawler to consume in a month (DPUs × billed run time in hours × runs per month).

Terraform example

A minimal but realistic configuration that C3X can estimate.

resource "aws_glue_crawler" "data_lake" {
  name          = "data-lake-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    path = "s3://my-data-lake/raw/"
  }

  schedule = "cron(0 6 * * ? *)"
}

Pricing dimensions

Dimension	Unit	What's being charged
Crawler DPU	per DPU-hour	Billed per second while the crawler runs, with a 10-minute minimum per run; nothing is billed between crawls. At ~$0.44/DPU-hour, 10 DPU-hours ≈ $4.40/month.

Optimization tips

Common ways to reduce aws_glue_crawler cost without changing the workload.

Match crawl frequency to schema change rate

Savings: proportional to runs avoided

Crawlers bill per DPU-hour while running. Most data sources don't need hourly crawling. Schedule crawls to match how often the schema or partitions actually change (often daily or on data arrival), not more often.

Scope crawlers to what changes

Savings: the DPU-hours of each re-scan avoided

Point crawlers at the specific prefixes/tables that change rather than re-crawling an entire data lake. Incremental crawling (new partitions only) avoids re-scanning unchanged data each run.
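A scoped, incremental crawler might look like the sketch below. The `events/` prefix, the exclusion pattern, and the resource names are hypothetical; the IAM role and catalog database are assumed to be defined elsewhere, as in the example above. When `recrawl_behavior` is `CRAWL_NEW_FOLDERS_ONLY`, Glue expects the schema change policy to log changes rather than apply them, so both behaviors are set to `LOG`.

```hcl
# Sketch: crawl only one prefix, and only folders added since the last run.
resource "aws_glue_crawler" "events_incremental" {
  name          = "events-incremental-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    # Hypothetical prefix: the part of the lake that actually changes.
    path = "s3://my-data-lake/raw/events/"
    # Hypothetical glob: skip scratch output that should never be cataloged.
    exclusions = ["**/_temporary/**"]
  }

  # Incremental crawl: skip folders that were already cataloged.
  recrawl_policy {
    recrawl_behavior = "CRAWL_NEW_FOLDERS_ONLY"
  }

  # Required companion setting for incremental crawls.
  schema_change_policy {
    update_behavior = "LOG"
    delete_behavior = "LOG"
  }

  schedule = "cron(0 6 * * ? *)"
}
```

The trade-off is that an incremental crawl will not pick up schema changes in existing folders; an occasional full crawl may still be needed if the schema evolves.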

Trigger on data arrival instead of fixed schedules

Savings: the DPU-hours of empty runs

Event-driven crawling (run when new data lands) avoids empty scheduled runs against datasets that haven't changed, while keeping the catalog fresh.
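One way to approximate this with the crawler itself is Glue's S3 event mode, where the crawler reads S3 event notifications from an SQS queue and visits only the folders that changed. This is a sketch under assumptions: the queue and crawler names are hypothetical, and the S3 bucket notification, the queue policy, and the crawler role's SQS permissions are omitted and would have to be added.

```hcl
# Hypothetical queue that receives S3 object-created notifications.
resource "aws_sqs_queue" "crawler_events" {
  name = "data-lake-crawler-events"
}

resource "aws_glue_crawler" "events_on_arrival" {
  name          = "events-on-arrival-crawler"
  role          = aws_iam_role.glue.arn
  database_name = aws_glue_catalog_database.main.name

  s3_target {
    path            = "s3://my-data-lake/raw/events/"
    event_queue_arn = aws_sqs_queue.crawler_events.arn
  }

  # Crawl only the locations named in queued S3 events.
  recrawl_policy {
    recrawl_behavior = "CRAWL_EVENT_MODE"
  }

  # Something still has to start the crawler; a schedule is one option.
  schedule = "cron(0 6 * * ? *)"
}
```

An alternative design is to start the crawler on demand from an event handler rather than on a schedule; which fits better depends on how the data arrives.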

Manage partitions without crawlers where possible

Savings: the entire crawler cost, where runs can be avoided

For predictable partition layouts, adding partitions via the Glue API or partition projection (in Athena) can avoid crawler runs entirely.
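As a sketch of the partition projection route, the table below declares a date-partitioned layout through table parameters so that Athena computes partitions at query time instead of reading them from the catalog. The table name, column, `dt` partition key, date range, and Parquet format are all hypothetical choices for illustration.

```hcl
# Sketch: a catalog table whose partitions Athena projects from a date range,
# so no crawler is needed to register new dt=... folders.
resource "aws_glue_catalog_table" "events" {
  name          = "events"
  database_name = aws_glue_catalog_database.main.name
  table_type    = "EXTERNAL_TABLE"

  parameters = {
    "projection.enabled"          = "true"
    "projection.dt.type"          = "date"
    "projection.dt.format"        = "yyyy-MM-dd"
    "projection.dt.range"         = "2024-01-01,NOW" # hypothetical start date
    "projection.dt.interval"      = "1"
    "projection.dt.interval.unit" = "DAYS"
    # "$${dt}" escapes Terraform interpolation; Athena receives "${dt}".
    "storage.location.template" = "s3://my-data-lake/raw/events/dt=$${dt}/"
  }

  partition_keys {
    name = "dt"
    type = "string"
  }

  storage_descriptor {
    location      = "s3://my-data-lake/raw/events/"
    input_format  = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"
    output_format = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat"

    ser_de_info {
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    # Hypothetical column; the schema is declared by hand instead of inferred.
    columns {
      name = "event_id"
      type = "string"
    }
  }
}
```

Partition projection applies to Athena queries only; other engines reading the same table still rely on partitions registered in the catalog.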

FAQ

How is an AWS Glue crawler billed?

Per Data Processing Unit-hour (~$0.44/DPU-hour, per-second with a 10-minute minimum per run) — and only while the crawler runs. A crawler running a few minutes on a schedule costs cents per run; 10 DPU-hours/month is ~$4.40.

Why would a crawler cost more than expected?

Running too frequently or over too much data. Crawling a large dataset every hour when it changes daily wastes DPU-hours. Match crawl frequency to the actual schema/partition change rate and scope crawlers to what changes.

How does c3x estimate the cost?

From monthly_dpu_hours, since the crawler bills only during runs. Set it to the DPU-hours you expect the crawler to consume per month; Data Catalog storage is billed separately (first million objects free).
