Managing Terraform state securely is one of those things that seems simple until you’re dealing with hundreds of accounts and thousands of resources. Cloudflare, being their own Customer Zero, had to solve this problem at enterprise scale.

The interesting part? They built a custom solution called tfstate-butler - a Go program that acts as an HTTP backend for Terraform state storage.

The Security Problem

When you’re managing infrastructure at Cloudflare’s scale, a single compromised state file could be catastrophic, since Terraform state can hold sensitive values such as credentials alongside resource metadata. Many state storage setups use a shared encryption key or keep every state file in one place, which means anyone who gets hold of that key or location gets access to everything.

The Solution: Unique Encryption Per State File

Cloudflare’s tfstate-butler solves this by ensuring each state file gets its own unique encryption key. This limits the blast radius - if one state file is compromised, it doesn’t affect the others. It’s a simple concept, but it’s the kind of security thinking that makes sense when you’re managing critical infrastructure.
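
To make the per-file key idea concrete, here is a minimal Go sketch. It is not tfstate-butler’s actual code: the function names (newStateKey, encryptState, decryptState) and the choice of AES-256-GCM are assumptions made for illustration, and it leaves out how keys would be stored or handed back to callers.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
)

// newStateKey returns a fresh random 256-bit key.
// Hypothetical helper: the idea is to call it once per state file.
func newStateKey() ([]byte, error) {
	key := make([]byte, 32)
	if _, err := rand.Read(key); err != nil {
		return nil, err
	}
	return key, nil
}

// newGCM builds an AES-GCM cipher from a 16-, 24-, or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptState seals one state file with its own key.
// The random nonce is prepended to the returned ciphertext.
func encryptState(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decryptState reverses encryptState. It returns an error when the key
// does not match or the ciphertext has been tampered with.
func decryptState(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

Generate one key with newStateKey for each state file and pass it to encryptState. Calling decryptState on that ciphertext with any other file’s key returns an error, which is the blast-radius property in miniature.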

The tool operates as an HTTP backend for Terraform, so it plugs into their existing workflows: Terraform reads and writes state over HTTP, just as it would with any other remote backend. Teams don’t have to change how they work - they just get better security by default.
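
For a feel of what “HTTP backend” means in practice: Terraform’s http backend fetches state with GET, writes it with POST by default, and purges it with DELETE. The sketch below is a toy in-memory handler for those three verbs, not Cloudflare’s implementation. The stateStore type and its methods are hypothetical, and locking, authentication, and encryption are all left out.

```go
package main

import (
	"io"
	"net/http"
	"sync"
)

// stateStore is a hypothetical in-memory store keyed by request path,
// so each path (e.g. /teams/dns/prod) holds one state file.
type stateStore struct {
	mu     sync.RWMutex
	states map[string][]byte
}

func newStateStore() *stateStore {
	return &stateStore{states: make(map[string][]byte)}
}

func (s *stateStore) get(key string) ([]byte, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	data, ok := s.states[key]
	return data, ok
}

func (s *stateStore) put(key string, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.states[key] = data
}

func (s *stateStore) remove(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.states, key)
}

// ServeHTTP maps the HTTP verbs Terraform sends onto store operations.
func (s *stateStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	key := r.URL.Path
	switch r.Method {
	case http.MethodGet:
		data, ok := s.get(key)
		if !ok {
			// Terraform treats 404 as "no state yet".
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		_, _ = w.Write(data)
	case http.MethodPost:
		data, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		s.put(key, data)
		w.WriteHeader(http.StatusOK)
	case http.MethodDelete:
		s.remove(key)
		w.WriteHeader(http.StatusOK)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}
```

Serving it is a single call such as http.ListenAndServe("127.0.0.1:8080", newStateStore()), after which a Terraform http backend block can point its address at a path on that server. A real service would sit behind TLS and authentication, and this is the layer where per-file encryption would happen before anything hits disk.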

Why This Matters

At scale, you can’t manually manage state file security. You need automation that enforces good practices. By baking unique encryption keys into the state storage layer, Cloudflare ensures that even if someone makes a mistake or there’s a security incident, the damage is contained.

It’s a good reminder that infrastructure security isn’t just about the resources you’re managing - it’s also about how you manage the management tools themselves.

If you want to read more about how Cloudflare manages infrastructure at scale, check out their full post on shifting left at enterprise scale. They cover their entire IaC stack, policy as code approach, and some hard-earned lessons from managing hundreds of production accounts.
Reference: Shifting left at enterprise scale: how we manage Cloudflare with Infrastructure as Code