AWS Lambda cold starts: cost vs latency trade-off

Quick answer

Lambda cold starts add roughly 50ms-10s of latency to the first request on a new instance. For Go and Rust: typically 50-200ms. For Node.js and Python: 150-500ms. For Java: 2-10s without SnapStart, 200-600ms with. The cost impact is usually small: since August 1, 2025, AWS bills the init phase for every runtime and packaging type, but init is a small fraction of total billed time unless the function cold-starts often and initializes slowly (Java without SnapStart). Cold starts are mostly a latency problem, not a cost problem. Mitigate via faster runtimes, Graviton, smaller packages, and Provisioned Concurrency for hot paths.

Lambda cold starts come up in almost every serverless cost discussion. The confusing part: cold starts are mostly a latency problem, not a cost problem, but the mitigations (Provisioned Concurrency) are cost decisions. This post separates the two so you can make the right trade-off for your workload.

What is a Lambda cold start?

When Lambda receives a request and has no warm instance available, it provisions a new one. The cold start has three phases:

INIT phase: AWS provisions the execution environment, downloads your code, initializes the runtime. Time varies by runtime (Node.js ~150ms, Java ~1-5s).
Runtime startup: The runtime loads your code and executes any module-level initialization.
Handler invocation: Your handler function runs to process the request.

After the cold start completes, the instance is "warm" and handles subsequent requests with just phase 3, with no init overhead. Instances typically stay warm for several minutes of idle time before AWS retires them, though AWS does not guarantee a specific duration.

Is the cold start billed?

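Yes, though how much it matters depends on the runtime. Before August 1, 2025, on-demand functions on managed runtimes packaged as ZIP archives were not billed for the init phase; custom runtimes, container images, and Provisioned Concurrency were. Since August 1, 2025, AWS bills init for all configurations at the normal duration rate.

Node.js, Python, Go, Rust

Init time is billed, but it's short (tens to a few hundred milliseconds), so the extra cost per cold start is negligible. The cold start is a user-experience impact far more than a billing impact.

Java without SnapStart, .NET

Init time is billed and it's long. It shows up as "Init Duration" in the REPORT log line and counts toward billed duration. Java's slow startup means substantial billed time on every cold start. A function with 2-second init that runs for 200ms on each warm invocation gets billed 2200ms on cold starts and 200ms on warm invocations.

This is why a Java Lambda that cold-starts often costs more than the equivalent Node.js or Python function. SnapStart changes the math.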
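To put numbers on that example, here is a rough sketch of billed duration and the monthly cost of billed init. It assumes init is billed at the standard on-demand duration rate, taken here as $0.0000166667 per GB-second (x86, us-east-1 list price; treat it as an assumption and check the pricing page). The function names and the 1% cold-start rate are illustrative, not measured.

```python
ON_DEMAND_PRICE_PER_GB_S = 0.0000166667  # assumed x86 us-east-1 list price


def billed_ms(init_ms, handler_ms, cold):
    """Billed duration for one invocation when init time is billed."""
    return float(handler_ms + (init_ms if cold else 0))


def monthly_init_cost(invocations, cold_rate, init_ms, memory_gb,
                      price=ON_DEMAND_PRICE_PER_GB_S):
    """Extra monthly duration cost attributable to billed init time."""
    cold_starts = invocations * cold_rate
    return cold_starts * (init_ms / 1000.0) * memory_gb * price


print(billed_ms(2000, 200, cold=True))   # 2200.0
print(billed_ms(2000, 200, cold=False))  # 200.0

# 1M invocations/month, 1% of them cold, 2 s init, 512 MB of memory
print(round(monthly_init_cost(1_000_000, 0.01, 2000, 0.5), 2))  # 0.17
```

Under these assumptions the billed init adds about $0.17 a month, which is why the latency is usually the bigger concern.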

SnapStart

SnapStart caches the post-init JVM state as a snapshot. On cold start, Lambda restores from the snapshot in ~200ms instead of running full init. For the Java managed runtimes, AWS does not charge extra for SnapStart; restore time is billed as ordinary duration. (SnapStart for Python and .NET carries separate snapshot caching and restoration charges.)

SnapStart effectively converts Java's slow cold starts into fast restorations, at little or no additional cost. For Java workloads, SnapStart should be on by default in 2026.

Cold start latency by runtime (typical)

Approximate cold start durations for a typical function (no VPC):

Go: 50-200ms
Rust: 50-200ms
Node.js 20+: 150-400ms
Python 3.11+: 200-500ms
Java with SnapStart: 200-600ms
.NET: 500-1500ms
Java without SnapStart: 2-10s
Container images: add 500ms-2s on top of runtime
VPC-attached: add 200-400ms on top of any runtime

These vary by function size, dependencies, and AWS region. The ranking is consistent: compiled languages (Go, Rust) are fastest; JVM/CLR are slowest without snapshot acceleration.
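As a rough way to combine these figures, the sketch below adds the container-image and VPC adders to a runtime's base range. The ranges are copied from the list above; the runtime keys and the function name are made up for illustration, and simply summing ranges gives only a coarse envelope, not a prediction.

```python
# (low_ms, high_ms) ranges copied from the list above
BASE_MS = {
    "go": (50, 200),
    "rust": (50, 200),
    "nodejs": (150, 400),
    "python": (200, 500),
    "java-snapstart": (200, 600),
    "dotnet": (500, 1500),
    "java": (2000, 10000),
}
CONTAINER_MS = (500, 2000)  # added for container images
VPC_MS = (200, 400)         # added for VPC-attached functions


def estimate_cold_start_ms(runtime, container_image=False, vpc=False):
    """Return a (low, high) cold-start envelope in milliseconds."""
    low, high = BASE_MS[runtime]
    for enabled, (add_low, add_high) in ((container_image, CONTAINER_MS),
                                         (vpc, VPC_MS)):
        if enabled:
            low += add_low
            high += add_high
    return low, high


print(estimate_cold_start_ms("python", vpc=True))                      # (400, 900)
print(estimate_cold_start_ms("java", container_image=True, vpc=True))  # (2700, 12400)
```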

When cold starts matter

User-facing synchronous APIs

On an HTTP endpoint, a 500ms cold start is passed straight to the user as added latency. For login, checkout, or other critical user paths, cold starts directly degrade experience.

Mitigations: smaller runtime + smaller package, Provisioned Concurrency for predictable traffic, switch to Cloud Run / ECS / EC2 for steady high-traffic services where Lambda's economics don't justify the latency.

Streaming/real-time

Functions invoked by Kinesis or DynamoDB Streams with low sustained traffic can cold-start frequently. A 500ms cold start on a stream that expects sub-100ms latency adds noticeable lag.

Internal async services

Background jobs, queue consumers, scheduled tasks. Cold starts don't affect users directly. Mitigation usually isn't necessary.

Provisioned Concurrency: the explicit trade-off

Provisioned Concurrency reserves Lambda instances in a "warm" state ready to serve requests instantly. Pricing:

$0.0000041666 per GB-second of provisioned capacity
Plus $0.0000097222 per GB-second of actual usage (cheaper than the on-demand rate of $0.0000166667)

For a 512 MB function with 10 provisioned instances running 24/7: 10 × 0.5 GB × 3600s × 24h × 30d × $0.0000041666 = $54/month for provisioned capacity, plus usage charges on top.

Crossover with on-demand: roughly when sustained traffic exceeds 5M-10M requests/month per function. Below that, on-demand is cheaper despite cold starts. Above that, Provisioned Concurrency actually saves money AND eliminates cold starts.
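Another way to look at the crossover is utilization: what fraction of the provisioned capacity has to be busy before Provisioned Concurrency beats on-demand on duration charges. The sketch below reproduces the $54 figure and computes that break-even point. The execution and on-demand rates are assumed x86 us-east-1 list prices, so substitute the rates for your region and architecture; request charges are left out because they are the same in both modes.

```python
SECONDS_PER_MONTH = 3600 * 24 * 30  # 30-day month, matching the example above

PC_CAPACITY_PRICE = 0.0000041666  # per GB-second provisioned (figure quoted above)
PC_USAGE_PRICE = 0.0000097222     # assumed: per GB-second while executing on PC
ON_DEMAND_PRICE = 0.0000166667    # assumed: per GB-second on-demand


def provisioned_capacity_cost(instances, memory_gb,
                              seconds=SECONDS_PER_MONTH,
                              price=PC_CAPACITY_PRICE):
    """Monthly charge for keeping the instances provisioned, before usage."""
    return instances * memory_gb * seconds * price


def break_even_utilization(capacity_price=PC_CAPACITY_PRICE,
                           usage_price=PC_USAGE_PRICE,
                           on_demand_price=ON_DEMAND_PRICE):
    """Busy fraction u at which both modes cost the same per GB-second.

    on-demand:   u * on_demand_price
    provisioned: capacity_price + u * usage_price
    """
    return capacity_price / (on_demand_price - usage_price)


print(round(provisioned_capacity_cost(10, 0.5), 2))  # 54.0
print(round(break_even_utilization(), 2))            # 0.6
```

With these rates, provisioned instances that are busy more than about 60% of the time cost less than serving the same load on-demand.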

When to use Provisioned Concurrency

Critical user-facing endpoints with sustained traffic
Functions where cold-start latency violates SLA
Predictable workloads where you can right-size provisioned capacity

When NOT to use it

Low-traffic functions (less than ~1M req/month)
Functions invoked irregularly
Dev/staging or internal-only services

Set up auto-scaling on Provisioned Concurrency to match traffic patterns. Static provisioning often over-provisions and wastes money.
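A minimal sketch of what that auto-scaling setup can look like through Application Auto Scaling, using target tracking on provisioned-concurrency utilization. The function name "api", the alias "live", the capacity bounds, and the 0.7 target are placeholders. The code only builds the request parameters, so it runs without AWS credentials.

```python
def provisioned_concurrency_scaling_requests(function_name, alias,
                                             min_capacity, max_capacity,
                                             target_utilization=0.7):
    """Build parameters for register_scalable_target and put_scaling_policy."""
    common = {
        "ServiceNamespace": "lambda",
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrency",
    }
    target = {**common, "MinCapacity": min_capacity, "MaxCapacity": max_capacity}
    policy = {
        **common,
        "PolicyName": f"{function_name}-{alias}-pc-utilization",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": target_utilization,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization",
            },
        },
    }
    return target, policy


target, policy = provisioned_concurrency_scaling_requests("api", "live", 2, 10)
print(target["ResourceId"])  # function:api:live

# With boto3 installed and credentials configured, these would be sent as:
#   client = boto3.client("application-autoscaling")
#   client.register_scalable_target(**target)
#   client.put_scaling_policy(**policy)
```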

The cheap mitigations (no Provisioned Concurrency)

Switch to arm64 (Graviton)

arm64 Lambda has slightly faster cold starts than x86_64 and is 20% cheaper per GB-second. Free latency win:

resource "aws_lambda_function" "api" {
  architectures = ["arm64"]
  # ...
}

Use newer runtime versions

Node.js 20+ cold-starts faster than Node.js 16. Python 3.12 is slightly faster than 3.9. Always pin to the latest supported runtime version for performance and security.

Reduce deployment package size

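Smaller packages load faster. Remove unused dependencies. Lambda Layers help share code across functions, but layer contents still count toward the unzipped package size and are loaded at init, so they don't shrink the cold start on their own. For Node.js, consider bundling with esbuild to tree-shake. A package going from 50 MB to 5 MB can shave 100-300ms off cold starts.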

Avoid VPC attachment when possible

VPC-attached Lambda can add 200-400ms of network setup on cold start. That's much smaller than the multi-second penalty before 2019, when Lambda created an ENI during the cold start itself; ENIs are now created when the function is configured and shared across instances. If your function doesn't need VPC-private resources, skip VPC attachment.

If you do need VPC access, use VPC endpoints (or a NAT gateway) to reach AWS services. See our NAT Gateway alternatives post.

Cache outside the handler

Connections, credentials, configuration loaded inside the handler are re-fetched on every invocation. Move them to module scope so they happen once per cold start, not once per request:

# Bad
def handler(event, context):
    db = connect_to_db()
    secret = get_secret()
    return process(event, db, secret)

# Good
db = connect_to_db()
secret = get_secret()

def handler(event, context):
    return process(event, db, secret)

This doesn't shorten the cold start itself (the setup work moves into init), but warm invocations become dramatically faster because they reuse the connection and secret, which lowers your average latency.
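A variant of the same idea is lazy initialization: the setup runs on first use and is cached for the life of the instance, so code paths that never touch the resource don't pay for it. A minimal runnable sketch, where get_db is a hypothetical stand-in for connect_to_db() and the counter exists only to show how often setup runs:

```python
from functools import lru_cache

SETUP_CALLS = 0  # counts how often the expensive setup actually runs


@lru_cache(maxsize=None)
def get_db():
    """Hypothetical stand-in for connect_to_db(); runs once per instance."""
    global SETUP_CALLS
    SETUP_CALLS += 1
    return {"connection": "open"}


def handler(event, context):
    db = get_db()  # first call pays the setup cost, later calls hit the cache
    return {"db_state": db["connection"], "item": event["item"]}


# Simulate three invocations on the same warm instance.
results = [handler({"item": i}, None) for i in range(3)]

print(SETUP_CALLS)  # 1
```

The trade-off is that the first request on a new instance pays the setup cost inside the handler rather than during init.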

The decision framework

Measure: enable Lambda Insights or use CloudWatch logs to find your cold-start frequency and duration.
If cold-start latency is acceptable: do nothing. On-demand Lambda is fine.
If unacceptable: try cheap mitigations first (arm64, smaller package, runtime upgrade, skip VPC).
For Java specifically: enable SnapStart. Almost always worth it.
If still unacceptable and traffic is steady: enable Provisioned Concurrency for the hot path. Use auto-scaling.
If traffic is unpredictable and latency-sensitive: consider moving the function to Cloud Run, ECS Fargate, or EC2.
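For the measurement step, one low-tech option is to scan the function's CloudWatch log lines: Lambda writes a REPORT line per invocation, and cold starts carry an extra "Init Duration" field. The sketch below computes cold-start rate and median init time from such lines. The sample log lines are made up for illustration, and fetching the lines from CloudWatch is left out.

```python
import re
from statistics import median

INIT_RE = re.compile(r"Init Duration: ([\d.]+) ms")


def cold_start_stats(log_lines):
    """Summarize cold starts from Lambda REPORT log lines."""
    reports = [line for line in log_lines if line.startswith("REPORT RequestId:")]
    inits = [float(m.group(1)) for line in reports if (m := INIT_RE.search(line))]
    total = len(reports)
    return {
        "invocations": total,
        "cold_starts": len(inits),
        "cold_start_rate": len(inits) / total if total else 0.0,
        "median_init_ms": median(inits) if inits else None,
    }


# Made-up sample: one cold start followed by three warm invocations.
sample = [
    "START RequestId: a1 Version: $LATEST",
    "REPORT RequestId: a1\tDuration: 212.40 ms\tBilled Duration: 213 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 80 MB\tInit Duration: 412.50 ms",
    "REPORT RequestId: a2\tDuration: 18.10 ms\tBilled Duration: 19 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 80 MB",
    "REPORT RequestId: a3\tDuration: 17.65 ms\tBilled Duration: 18 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 81 MB",
    "REPORT RequestId: a4\tDuration: 19.02 ms\tBilled Duration: 20 ms\t"
    "Memory Size: 512 MB\tMax Memory Used: 81 MB",
]

stats = cold_start_stats(sample)
print(stats["cold_start_rate"])  # 0.25
print(stats["median_init_ms"])   # 412.5
```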

Estimating costs in C3X

For Lambda with Provisioned Concurrency, c3x reads aws_lambda_provisioned_concurrency_config and adds the continuous billing line:

resource "aws_lambda_provisioned_concurrency_config" "api" {
  function_name                     = aws_lambda_function.api.function_name
  qualifier                         = aws_lambda_function.api.version
  provisioned_concurrent_executions = 10
}

For SnapStart, add snap_start to the Lambda function resource. c3x picks it up and applies the restoration billing rate.

For full Lambda pricing including all variants, see the aws_lambda_function catalog page.

FAQ

Are Lambda cold starts billed?

Yes. Since August 1, 2025, AWS bills the init phase for all runtimes and packaging types; before that, on-demand functions on managed runtimes packaged as ZIP were exempt. For Node.js, Python, Go, and Rust the billed init is short and the cost is negligible. For Java without SnapStart and .NET it is longer and adds up if the function cold-starts often. The bigger concern with cold starts is still the latency impact, not the bill.

How long do Lambda cold starts take?

Varies by runtime and configuration. Go and Rust are usually 50-200ms. Node.js and Python cold starts are typically 150-500ms. Java without SnapStart can be 2-10 seconds. Container images add 500ms-2s on top of the runtime's own cold start. VPC-attached Lambdas add 200-400ms of network init.

Does Provisioned Concurrency eliminate cold starts?

Mostly, and at a cost. Provisioned Concurrency pre-warms a configured number of instances that stay ready to handle requests instantly. Requests beyond the provisioned count spill over to on-demand instances and can still cold-start. You pay $0.0000041666 per GB-second of provisioned capacity, continuously, regardless of usage. It's the right choice for predictable high-traffic functions where cold starts would impact users.

Is SnapStart for Java actually free?

For Java, yes: AWS does not charge extra for SnapStart on the Java managed runtimes, and restore time is billed as ordinary duration. SnapStart for Python and .NET does carry separate snapshot caching and restoration charges. It is not the same as Provisioned Concurrency. SnapStart caches the JVM init state and restores it quickly on cold start, typically reducing cold-start time by 70-90%. For Java functions with cold start sensitivity, SnapStart is much cheaper than Provisioned Concurrency.

How do I reduce cold start latency without paying more?

Five techniques. Switch to a faster runtime (Go, Rust, Node.js cold-start faster than Java/.NET). Use arm64/Graviton (slightly faster cold starts plus 20% cost savings). Minimize package size (smaller deployments load faster). Avoid VPC attachment when possible. Move connection setup to global scope so it happens once per instance, not per request.

When is Provisioned Concurrency worth the cost?

Crossover with on-demand depends on traffic. Roughly, Provisioned Concurrency pays back if the function would otherwise serve >5M requests/month at sustained traffic. Below that, on-demand with cold starts is cheaper. Use Lambda Power Tuning + traffic projections to compute the actual crossover for your function.

Summary

For most Lambda workloads, cold starts are a manageable latency issue, not a cost issue. The cheap mitigations (arm64, smaller package, latest runtime) solve most problems. Provisioned Concurrency is right for steady high-traffic functions where cold-start latency would violate SLA, and it can actually save money at scale.

For the broader Lambda cost picture, see estimating AWS Lambda costs from Terraform. For the rest of the cost-optimization series, see the blog index.