AWS just shipped something that reframes what “serverless” can mean. Lambda MicroVMs are not Lambda Functions with a bigger timeout. They are a fundamentally different primitive: stateful, VM-level isolated environments with an explicit lifecycle you control.

The key shift: instead of an execution environment capped at 15 minutes per invocation, you get a dedicated Firecracker microVM that lives up to 8 hours, and you decide when it starts, suspends, resumes, and terminates.


What Lambda MicroVMs actually are

Regular Lambda Functions are stateless by design. Execution environments may be reused between invocations, but nothing you keep in memory or on local disk is guaranteed to survive from one invocation to the next, and 15 minutes is a hard ceiling per invocation. That model is perfect for APIs and event-driven workloads. It is a poor fit for anything that needs to maintain state across interactions.

Lambda MicroVMs flip this. Each MicroVM is:

  • A dedicated Firecracker microVM with its own kernel, memory space and disk state
  • Addressable via a unique HTTPS endpoint (per VM, not shared)
  • Capable of running any Dockerfile-based application, not just a handler function
  • Suspendable and resumable — the VM’s memory and disk are snapshotted and restored

The lifecycle looks like this:

LAUNCH (from snapshot) → RUN → SUSPEND → RESUME → TERMINATE

Billing follows the lifecycle: while the VM is running you pay for compute; while it is suspended you pay only for snapshot storage; once it is terminated, it costs nothing.
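The lifecycle and its billing rules can be modeled as a small state machine. It is useful on the client side for tracking what a VM is currently costing you.

This is a minimal sketch, not an AWS SDK type. `MicroVMLifecycle`, the action names, and the assumption that a suspended VM can be terminated directly are all illustrative.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    TERMINATED = "terminated"


# What each state is billed as, following the billing rules described above.
BILLED_AS = {
    State.RUNNING: "compute",
    State.SUSPENDED: "snapshot storage",
    State.TERMINATED: None,
}

# Allowed (state, action) -> next state.
# Assumption: a SUSPENDED VM can be terminated without resuming it first.
TRANSITIONS = {
    (State.RUNNING, "suspend"): State.SUSPENDED,
    (State.SUSPENDED, "resume"): State.RUNNING,
    (State.RUNNING, "terminate"): State.TERMINATED,
    (State.SUSPENDED, "terminate"): State.TERMINATED,
}


class MicroVMLifecycle:
    """Hypothetical client-side model of one MicroVM's lifecycle."""

    def __init__(self) -> None:
        # LAUNCH restores the configuration's snapshot, so the VM starts out RUNNING.
        self.state = State.RUNNING
        self.history = [self.state]

    def apply(self, action: str) -> State:
        next_state = TRANSITIONS.get((self.state, action))
        if next_state is None:
            raise ValueError(f"cannot {action!r} a VM that is {self.state.value}")
        self.state = next_state
        self.history.append(next_state)
        return next_state

    def billed_as(self) -> Optional[str]:
        return BILLED_AS[self.state]
```

Rejecting invalid transitions (for example, resuming a terminated VM) catches orchestration bugs before they turn into failed API calls.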


How to build with it

The setup is more explicit than standard Lambda, which makes sense given its stateful nature.

Step 1 — build your image

Package your application with a Dockerfile. Zip the Dockerfile together with any dependencies and upload the archive to S3. This archive is the source of your base image.
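The packaging and upload in Step 1 are easy to script. This is a minimal sketch under two assumptions: the service expects the Dockerfile at the root of the archive, and you choose the bucket and key yourself. The exact layout requirements are not covered here.

Only the standard `zipfile` module and boto3's S3 `upload_file` are used. `build_bundle` and `upload_bundle` are hypothetical helper names.

```python
import zipfile
from pathlib import Path
from typing import List


def build_bundle(src_dir: str, out_zip: str) -> List[str]:
    """Zip a Dockerfile-based app directory; returns the archive member names."""
    src = Path(src_dir)
    if not (src / "Dockerfile").is_file():
        raise FileNotFoundError(f"no Dockerfile at the root of {src}")
    out = Path(out_zip).resolve()
    # Collect files before opening the archive so it never tries to include itself.
    files = [p for p in sorted(src.rglob("*")) if p.is_file() and p.resolve() != out]
    names = []
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            arcname = path.relative_to(src).as_posix()  # Dockerfile ends up at the root
            zf.write(path, arcname)
            names.append(arcname)
    return names


def upload_bundle(out_zip: str, bucket: str, key: str) -> str:
    """Upload the archive to S3; bucket and key are placeholders you choose."""
    import boto3  # imported lazily so build_bundle works without the AWS SDK installed

    boto3.client("s3").upload_file(out_zip, bucket, key)
    return f"s3://{bucket}/{key}"
```

The returned S3 URI is what you would then reference when creating the MicroVM configuration in Step 2.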

Step 2 — create a MicroVM configuration

The service reads the Dockerfile from S3, runs the application, and takes a snapshot of its memory and disk state once it has initialized. This snapshot becomes the golden image from which all future instances start.

Step 3 — launch instances from the snapshot

Each run-microvm call starts an independent MicroVM from that snapshot. Launches are fast because there is no cold initialization: each VM resumes from the pre-baked state.

Step 4 — route traffic to the VM’s endpoint

Every MicroVM gets its own HTTPS URL. You route your user or session to it. Authentication is mandatory: the endpoint requires JWE tokens.
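Calling a VM's endpoint means attaching a JWE token to each request. How tokens are issued, and exactly which header and scheme the endpoint expects, are not described here.

This sketch therefore assumes a `Bearer` token in the `Authorization` header and takes an already-issued token as input. The one solid check it performs is structural: a JWE in compact serialization (RFC 7516) has five base64url segments, and its protected header carries `alg` and `enc`. `looks_like_jwe` and `build_request` are hypothetical helper names.

```python
import base64
import json
import re
import urllib.request

_B64URL = re.compile(r"^[A-Za-z0-9_-]*$")


def _b64url_decode(segment: str) -> bytes:
    # JOSE uses unpadded base64url; restore padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def looks_like_jwe(token: str) -> bool:
    """Structural check only: five segments, decodable header with alg and enc."""
    parts = token.split(".")
    if len(parts) != 5 or not all(_B64URL.match(p) for p in parts):
        return False
    header_b64, _enc_key, _iv, ciphertext, _tag = parts
    if not header_b64 or not ciphertext:
        return False
    try:
        header = json.loads(_b64url_decode(header_b64))
    except ValueError:  # covers bad base64, bad UTF-8 and bad JSON
        return False
    return isinstance(header, dict) and "alg" in header and "enc" in header


def build_request(endpoint: str, jwe_token: str, path: str = "/") -> urllib.request.Request:
    if not endpoint.startswith("https://"):
        raise ValueError("MicroVM endpoints are HTTPS-only")
    if not looks_like_jwe(jwe_token):
        raise ValueError("token is not in JWE compact serialization")
    url = endpoint.rstrip("/") + "/" + path.lstrip("/")
    # ASSUMPTION: bearer scheme in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {jwe_token}"})
```

Rejecting a plain three-segment JWT up front gives a clearer error than a 401 from the VM.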


Resources and constraints

  • RAM: 0.5–8 GB baseline, up to 32 GB peak
  • vCPU: 0.25–4 baseline, up to 16 peak
  • Disk: up to 32 GB
  • Duration: up to 8 hours

Architecture is ARM64 only at launch. Regions at GA: us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1.
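Before wiring up automation, it can help to check a planned configuration against these launch constraints. This is a minimal sketch that hard-codes the limits quoted above; verify them against current AWS documentation before relying on them.

`PlannedMicroVM` is a hypothetical planning record, not an AWS schema. The rule that peak values may not fall below baseline is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

# Launch limits as quoted in this article.
BASELINE_VCPU = (0.25, 4.0)
BASELINE_RAM_GB = (0.5, 8.0)
PEAK_VCPU_MAX = 16.0
PEAK_RAM_GB_MAX = 32.0
DISK_GB_MAX = 32.0
LAUNCH_REGIONS = frozenset(
    {"us-east-1", "us-east-2", "us-west-2", "eu-west-1", "ap-northeast-1"}
)


@dataclass(frozen=True)
class PlannedMicroVM:
    """Hypothetical planning record; field names are illustrative."""
    region: str
    architecture: str
    baseline_vcpu: float
    baseline_ram_gb: float
    peak_vcpu: float
    peak_ram_gb: float
    disk_gb: float


def check_plan(plan: PlannedMicroVM) -> list[str]:
    """Return problems found; an empty list means the plan fits the stated limits."""
    problems: list[str] = []
    if plan.region not in LAUNCH_REGIONS:
        problems.append(f"region {plan.region} is not a launch region")
    if plan.architecture.lower() != "arm64":
        problems.append("only arm64 is supported at launch")
    if not BASELINE_VCPU[0] <= plan.baseline_vcpu <= BASELINE_VCPU[1]:
        problems.append("baseline vCPU must be 0.25 to 4")
    if not BASELINE_RAM_GB[0] <= plan.baseline_ram_gb <= BASELINE_RAM_GB[1]:
        problems.append("baseline RAM must be 0.5 to 8 GB")
    # Assumption: peak can never be configured below baseline.
    if not plan.baseline_vcpu <= plan.peak_vcpu <= PEAK_VCPU_MAX:
        problems.append("peak vCPU must be between baseline and 16")
    if not plan.baseline_ram_gb <= plan.peak_ram_gb <= PEAK_RAM_GB_MAX:
        problems.append("peak RAM must be between baseline and 32 GB")
    if not 0 < plan.disk_gb <= DISK_GB_MAX:
        problems.append("disk must be greater than 0 and at most 32 GB")
    return problems
```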

Networking supports HTTP/1.1, HTTP/2, gRPC, WebSockets, and SSE. Internet egress is available by default; VPC egress is possible through a network connector.


What you pay for

  • Baseline compute while the VM is running
  • Peak usage above baseline (billed separately)
  • Snapshot read/write operations
  • Snapshot storage
  • Data transfer

The suspend model is the key cost lever. A suspended VM costs only the snapshot storage. This makes the pattern viable for long-lived interactive sessions that have significant idle time between interactions — exactly the pattern of a user working in an IDE or chatting with an AI agent.
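A back-of-the-envelope comparison shows why suspending during idle gaps matters. The sketch below uses placeholder rates; no real prices are implied, so substitute the figures from the AWS pricing page. It covers only baseline compute, snapshot storage, and an optional per-cycle snapshot read/write cost, leaving out peak usage and data transfer. It also treats each idle gap as exactly one suspend and one resume.

`Rates` and `session_cost` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rates:
    """Placeholder unit prices (hypothetical)."""
    compute_per_hour: float      # baseline compute while RUNNING
    storage_per_gb_hour: float   # snapshot storage while SUSPENDED
    ops_per_cycle: float = 0.0   # snapshot write + read for one suspend/resume cycle


def session_cost(
    active_hours: float,
    idle_gaps_hours: Sequence[float],
    snapshot_gb: float,
    rates: Rates,
    suspend_when_idle: bool = True,
) -> float:
    idle = sum(idle_gaps_hours)
    if not suspend_when_idle:
        # The VM stays RUNNING through every idle gap.
        return (active_hours + idle) * rates.compute_per_hour
    # Each idle gap is treated as one suspend followed by one resume.
    return (
        active_hours * rates.compute_per_hour
        + idle * snapshot_gb * rates.storage_per_gb_hour
        + len(idle_gaps_hours) * rates.ops_per_cycle
    )
```

Take a session with 2 active hours and two 3-hour idle gaps, priced at placeholder rates of 0.10 per compute-hour and 0.0001 per GB-hour. Always-on costs 0.80. Suspending a 4 GB snapshot during the gaps costs 0.2044 once a 0.001 per-cycle snapshot charge is included. Under these assumptions, the gap grows with the idle-to-active ratio.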


The real use case: one VM per user or session

The design intent is clear from the list of stated use cases:

  • AI / code execution sandboxes
  • Claude / agent sessions
  • Browser IDE
  • Jupyter / data analytics environments
  • CI/CD workers
  • Vulnerability scanning

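The common thread: one VM per user, per session, or per agent. Not one VM shared across users. The isolation guarantee of a separate kernel, memory, and disk per session is the point. A container shares its host's kernel, and a regular Lambda execution environment is reused across invocations rather than dedicated to one user, so neither gives you that guarantee.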

For AI agent workloads specifically, this solves a real problem. An agent that executes code on behalf of a user needs to run in a context where: the filesystem state persists across turns, the environment cannot bleed into another user’s session, and the agent can be paused between tasks without losing its working state. MicroVMs check all three boxes.
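With one VM per session, allocation is your job. This is a minimal sketch of a session router that launches a VM on a session's first request, resumes it if it was suspended, and suspends sessions that have gone idle.

The `launch`, `suspend`, and `resume` callables are hypothetical stand-ins for whatever control-plane calls you actually use. The article names `run-microvm`, but this sketch does not reproduce its signature. It also assumes a VM keeps the same endpoint across suspend and resume.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Session:
    vm_id: str
    endpoint: str
    suspended: bool
    last_active: float


class SessionRouter:
    """Maps each session to its own MicroVM (hypothetical helper)."""

    def __init__(
        self,
        launch: Callable[[], Tuple[str, str]],  # returns (vm_id, endpoint)
        suspend: Callable[[str], None],
        resume: Callable[[str], None],
    ) -> None:
        self._launch, self._suspend, self._resume = launch, suspend, resume
        self.sessions: Dict[str, Session] = {}

    def endpoint_for(self, session_id: str, now: float) -> str:
        s = self.sessions.get(session_id)
        if s is None:
            # First request for this session: start a fresh VM from the snapshot.
            vm_id, endpoint = self._launch()
            s = Session(vm_id, endpoint, False, now)
            self.sessions[session_id] = s
        elif s.suspended:
            # Returning user: bring their VM back with its state intact.
            self._resume(s.vm_id)
            s.suspended = False
        s.last_active = now
        return s.endpoint

    def suspend_idle(self, now: float, idle_after: float) -> List[str]:
        """Suspend every running session idle for at least idle_after seconds."""
        suspended = []
        for sid, s in self.sessions.items():
            if not s.suspended and now - s.last_active >= idle_after:
                self._suspend(s.vm_id)
                s.suspended = True
                suspended.append(sid)
        return suspended
```

Running `suspend_idle` on a timer is what turns idle time into storage-only billing. A production version would also need persistence for the session map and a policy for terminating abandoned sessions.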


Lambda MicroVMs vs Lambda Functions

Execution model
  • Lambda Functions: Stateless · event-driven · request-response
  • Lambda MicroVMs: Stateful · long-running process per session

Max duration
  • Lambda Functions: 15 min per invocation
  • Lambda MicroVMs: 8 hours total (extendable via suspend)

Isolation
  • Lambda Functions: Execution environment reused across invocations · not dedicated to one user or session
  • Lambda MicroVMs: Dedicated Firecracker VM · own kernel, memory & disk

State & lifecycle
  • Lambda Functions: In-memory state not guaranteed between invocations
  • Lambda MicroVMs: Memory + disk preserved · suspend / resume via API · launch (from snapshot) → run (own HTTPS URL) → suspend (storage only) → resume (near-instant) → terminate (no charge)

Resources
  • Lambda Functions: Up to 6 vCPU · 10 GB RAM · 10 GB disk
  • Lambda MicroVMs: Baseline 0.25–4 vCPU · 0.5–8 GB RAM; peak up to 16 vCPU · 32 GB RAM · 32 GB disk

Architecture
  • Lambda Functions: x86_64 or ARM64
  • Lambda MicroVMs: ARM64 only (at launch)

Networking
  • Lambda Functions: Shared endpoint · HTTP + streaming
  • Lambda MicroVMs: Unique HTTPS URL per VM · HTTP/1.1, HTTP/2, gRPC, WebSockets, SSE · JWE auth required

Scaling
  • Lambda Functions: Automatic · thousands of concurrent environments
  • Lambda MicroVMs: One VM per user / job · explicit launch

Pricing model
  • Lambda Functions: Requests + GB-seconds · free tier included
  • Lambda MicroVMs: Baseline compute while running + peak usage + snapshot ops + storage · suspended VMs billed for storage only

Available regions
  • Lambda Functions: All AWS regions
  • Lambda MicroVMs: us-east-1 · us-east-2 · us-west-2 · eu-west-1 · ap-northeast-1

Best for
  • Lambda Functions: APIs · event-driven workloads · microservices · short jobs · webhooks
  • Lambda MicroVMs: AI code sandboxes · agent sessions · browser IDEs · Jupyter · CI/CD workers · multi-tenant SaaS

What this is not

Lambda MicroVMs are not a drop-in replacement for Lambda Functions. They require more operational involvement: you manage the lifecycle explicitly, you provision baseline capacity, and you design around the suspend/resume model.

They also do not scale automatically. You launch one VM per user or job. That is not a limitation — it is the model. Thousands of concurrent MicroVMs are supported, but you orchestrate the allocation, not AWS.

If your workload is stateless, event-driven, or bursty, Lambda Functions are still the right choice. MicroVMs are for the workloads that Lambda Functions were never designed to handle.


The launch is new, and what it costs for real workloads is still being worked out in practice, but the primitive itself is solid. Firecracker-backed isolation with full lifecycle control, a unique endpoint per VM, and suspend/resume at the API level: that is a meaningful new building block for anyone building multi-tenant interactive applications.