When Two People Change the Same Infrastructure State at the Same Time

Imagine this: Developer A and Developer B both need to update some infrastructure. They pull the current state from the same S3 bucket, each makes their own changes, and then both upload their new state back to S3. The result? One person's changes silently overwrite the other's. The infrastructure running in the cloud no longer matches what the state file says. Nobody knows which resources are actually managed, and the team starts losing control.

This isn't about two people editing the same resource directly. It's about two people editing the record of those resources. And the fix is a mechanism called state locking.

What State Locking Actually Does

State locking is straightforward in concept. Before anyone starts modifying the state, the system tries to lock it. If the lock succeeds, that person can read the state, make changes, and save the new version. While that happens, anyone else trying to modify the same state is blocked until the lock is released. Once the changes are saved, the lock releases automatically.
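The protocol can be sketched in a few lines of Python. This is an illustrative simulation, not Terraform's implementation: a dictionary stands in for the DynamoDB lock table, and the "create only if absent" check mimics DynamoDB's conditional writes. All names here (LockTable, modify_state, the lock key) are invented for the sketch.

```python
import threading

class LockTable:
    """Stands in for a DynamoDB lock table: one row per state, keyed by lock ID."""
    def __init__(self):
        self._rows = {}
        self._mutex = threading.Lock()  # DynamoDB provides this atomicity natively

    def acquire(self, lock_id, owner):
        # Mimics a conditional PutItem: succeeds only if no row exists,
        # i.e. only if nobody currently holds the lock.
        with self._mutex:
            if lock_id in self._rows:
                return False  # lock denied; caller must wait or fail
            self._rows[lock_id] = owner
            return True

    def release(self, lock_id, owner):
        # Only the holder may delete its own lock row.
        with self._mutex:
            if self._rows.get(lock_id) == owner:
                del self._rows[lock_id]

table = LockTable()
state = {"version": 1}

def modify_state(owner):
    if not table.acquire("prod/network", owner):
        return f"{owner}: lock denied"
    state["version"] += 1        # the read-modify-write happens under the lock
    table.release("prod/network", owner)
    return f"{owner}: state written"

print(modify_state("dev-a"))  # dev-a: state written
# If dev-b called modify_state while dev-a held the lock,
# acquire() would return False and no write would happen.
```

The key design point is that acquiring the lock and checking whether it is free happen as one atomic step; two separate "check, then write" operations would reintroduce the exact race the lock exists to prevent.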

The following sequence diagram illustrates how two developers interact with the state and lock backends:

sequenceDiagram
    participant A as Developer A
    participant B as Developer B
    participant S3 as State Backend (S3)
    participant DB as Lock Backend (DynamoDB)
    A->>DB: Acquire lock
    DB-->>A: Lock acquired
    A->>S3: Read current state
    S3-->>A: State data
    B->>DB: Acquire lock
    DB-->>B: Lock denied (wait)
    A->>S3: Write new state
    S3-->>A: OK
    A->>DB: Release lock
    DB-->>A: Lock released
    B->>DB: Acquire lock
    DB-->>B: Lock acquired
    B->>S3: Read current state
    S3-->>B: State data

Without this mechanism, concurrent changes corrupt your state. And corrupted state means you no longer have a reliable source of truth for your infrastructure.

How Different Backends Handle Locks

Not all state backends handle locking the same way. If you store state in S3, you need an additional backend like DynamoDB to manage the lock. DynamoDB records who holds the lock, when it started, and which state is locked. Every time someone tries to modify the state, the system checks DynamoDB first. If a lock is active, the operation stops and returns an error.

Here is a minimal Terraform backend configuration that enables state locking with S3 and DynamoDB:

terraform {
  backend "s3" {
    bucket         = "my-company-terraform-state"
    key            = "prod/network/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

The dynamodb_table attribute tells Terraform to use a DynamoDB table named terraform-locks for locking. This table must already exist before Terraform can lock the state, and it needs a partition key named LockID of type String. Terraform automatically creates and deletes lock entries in this table during state operations.
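The lock table itself can also be managed with Terraform, kept in a separate configuration since it must exist before this backend can use it. A minimal sketch, reusing the table name from the backend block above:

```hcl
resource "aws_dynamodb_table" "terraform_locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST" # no capacity planning needed for a tiny lock table
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

The attribute name and type are not a free choice: Terraform's S3 backend expects exactly a String partition key called LockID.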

Some backends have built-in locking. Consul and etcd, for example, are designed for distributed systems, so locking is part of their normal operation. You don't need to set up extra components like DynamoDB. But that convenience comes with a trade-off: you now have to run and maintain Consul or etcd yourself.
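For comparison, a Consul backend needs no companion lock table; the backend configuration alone enables locking. A sketch, assuming a Consul agent reachable at the address shown and a KV path of your choosing:

```hcl
terraform {
  backend "consul" {
    address = "consul.example.com:8500"
    path    = "terraform/prod/network"
  }
}
```

Locking is on by default here, using Consul's session-based locks on the same KV store that holds the state.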

When Locks Fail

Locks can fail in ways that leave your state stuck. Here's a common scenario: someone starts a change, then their internet connection drops mid-operation. The lock was acquired but never released because the process died. Now the state is locked and nobody can modify it.

In this situation, you need to force unlock. This is not something you do casually. If you force-release a lock while the original process is still running somewhere, you risk corrupting the state. The team needs to verify that the hanging process is truly dead before doing a force unlock.

Terraform provides a command for this: terraform force-unlock, which takes the lock ID printed in the locking error message. But force unlock should never be a routine operation. If your team frequently deals with stuck locks, look at the root causes: network stability, timeout configurations, or how pipelines are executed.

Why This Matters Beyond Just Locking

State locking is a bridge to more structured environment management. Once your state is safe from concurrent changes, you can think about how to use one configuration across multiple environments without duplicating code. That's where workspaces come in.

But before moving to workspaces, get the locking right. A team that ignores state locking will eventually face a situation where infrastructure changes silently disappear, resources get orphaned, and nobody trusts the state file anymore.

Practical Checklist for State Locking

  • Choose a backend with locking support. S3 needs DynamoDB. Consul and etcd have it built in. Know what your backend requires.
  • Test lock failure scenarios. Simulate a network drop during a state change. Does the lock get stuck? Can your team recover without panic?
  • Document the force unlock procedure. Write down exactly how to verify a hanging process is dead and how to release the lock. Don't leave this to tribal knowledge.
  • Monitor lock contention. If multiple people frequently hit locked states, your team might need better coordination or smaller, more frequent changes.
  • Set appropriate lock timeouts. Terraform's -lock-timeout flag controls how long an operation waits for a held lock before failing. Note that it does not auto-release a stuck lock; a lock held by a dead process still requires manual intervention.

The Concrete Takeaway

State locking is not optional. Without it, concurrent changes will silently corrupt your infrastructure state, and you won't know until something breaks in production. Set up locking before your team grows beyond one person, and treat force unlock like a fire extinguisher: know where it is, know how to use it, but don't plan on using it every day.