Who Owns Production? Why Privilege Boundaries Matter Between Environments
A developer on your team makes a small change to a configuration file. They meant to update the staging environment, but they accidentally ran the command against production. Five minutes later, users start reporting errors. The team scrambles to figure out what happened, who made the change, and how to revert it. Meanwhile, the developer who caused the problem isn't sure if they even had permission to touch production in the first place.
This scenario plays out more often than most teams want to admit. It's not about bad intentions or incompetent engineers. It's about a missing boundary between environments. When everyone has the same access to development, staging, and production, the system has no way to distinguish between intentional changes and mistakes.
The Problem With Universal Access
Many teams start with a simple setup: everyone gets access to everything. Developers can modify staging, production, or even another team's development environment. It feels practical at first. No one has to wait for permissions. No one needs to ask for help to deploy a fix.
But this approach creates two problems. First, accountability becomes fuzzy. When something breaks in production and everyone had access, finding the root cause takes longer. You have to check who ran what command, from which machine, at what time. Second, the blast radius of a mistake grows. A developer working on a feature branch can accidentally trigger a production deployment. A typo in a configuration file can take down a service that hundreds of users depend on.
The solution is not to lock everything down so tightly that no one can work. The solution is to create clear privilege boundaries between environments.
What Is a Privilege Boundary?
A privilege boundary is a clear separation of who can do what in each environment. It's not just about permissions in a tool. It's about how you structure your state backend, your repository, and your pipeline configuration.
For example, the state file for production should live in a different location than the state file for development. If you use Terraform, the production state backend could be a separate S3 bucket with stricter IAM policies. Only a few people in the team have access to that bucket. The development state backend might be in a shared bucket that everyone on the team can read and write.
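A minimal sketch of the production side of that split, assuming the S3 backend. The file path, bucket name, key, and region are hypothetical placeholders. The development configuration would carry its own backend block pointing at the shared bucket.

```hcl
# environments/prod/backend.tf (hypothetical path)
terraform {
  backend "s3" {
    bucket  = "example-prod-tf-state"      # hypothetical: bucket dedicated to production state
    key     = "platform/terraform.tfstate" # hypothetical state object key
    region  = "us-east-1"                  # assumption: replace with your region
    encrypt = true                         # encrypt the state object at rest
  }
}
```

Backend blocks cannot reference variables, so each environment typically gets its own backend file, or the values are supplied at init time with `terraform init -backend-config=...`.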
The principle here is least privilege. Every person or system gets only the access they need to do their job. A developer might need full access to the development environment. They might need read-only access to the production state to understand what's running. But if they need to make a change in production, that change should go through a more formal process: a pull request reviewed by another team member, or a pipeline that requires approval.
Ownership Makes Boundaries Concrete
Privilege boundaries work best when each environment has a clear owner. The owner is the person or team responsible when something goes wrong in that environment. Ownership is not about control. It's about accountability.
In practice, ownership often looks like this:
- The platform engineering team owns production. They are responsible for its stability, performance, and security.
- The application development team owns staging. They use it to validate their changes before requesting a production deployment.
- Individual developers own their personal development environments. They can break them, rebuild them, and experiment freely.
This division does not mean developers cannot touch production. It means that when they do, they follow a process that involves the production owner. The owner reviews the change, understands the risk, and approves or rejects the deployment.
How to Implement Privilege Boundaries in Practice
Privilege boundaries show up in three places: your repository structure, your pipeline configuration, and your state backend.
Repository Structure
The simplest way to enforce boundaries is through directory structure and branch protection. Production configurations can live in a separate directory or even a separate repository. If they are in the same repository, the production directory should be protected. Only specific team members can push changes to it. Pull requests that modify production configurations require approval from the production owner.
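If the repository happens to be hosted on GitHub and managed with the `integrations/github` Terraform provider (both are assumptions, not something this article requires), the review rule could be sketched roughly like this. The repository name and branch are hypothetical.

```hcl
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

# Hypothetical repository that holds the infrastructure code.
variable "repository_name" {
  type    = string
  default = "infrastructure"
}

resource "github_branch_protection" "main" {
  repository_id  = var.repository_name # accepts the repository name or node ID
  pattern        = "main"              # assumption: production deploys from main
  enforce_admins = true                # admins follow the same rules

  required_pull_request_reviews {
    required_approving_review_count = 1
    require_code_owner_reviews      = true # owners listed in CODEOWNERS must approve
  }
}
```

For the code-owner requirement to cover only production, a CODEOWNERS file in the repository would map the production directory to the owning team, for example a line such as `/environments/prod/ @example-org/platform-team` (hypothetical path and team).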
[Flowchart: how a change request flows through privilege boundaries, based on who is making the change and which environment it targets.]
Pipeline Configuration
Your CI/CD pipeline is where boundaries become operational. The pipeline for production should only run when certain conditions are met. For example, it might only trigger from a protected branch. It might require manual approval from a designated person. It might run additional validation steps that the development pipeline skips.
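As one possible illustration, assuming GitHub Actions and the `integrations/github` Terraform provider, a deployment environment can carry both conditions: a required reviewer and a protected-branch restriction. The repository name and team slug below are hypothetical.

```hcl
terraform {
  required_providers {
    github = {
      source = "integrations/github"
    }
  }
}

# Hypothetical team that owns production.
data "github_team" "platform" {
  slug = "platform-engineering"
}

resource "github_repository_environment" "production" {
  repository  = "infrastructure" # hypothetical repository name
  environment = "production"

  # Jobs that target this environment pause until a listed reviewer approves.
  reviewers {
    teams = [data.github_team.platform.id]
  }

  # Only branches with protection rules may deploy to this environment.
  deployment_branch_policy {
    protected_branches     = true
    custom_branch_policies = false
  }
}
```

Other CI/CD systems expose similar approval gates under different names, so treat this as a shape to adapt rather than a required setup.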
Some teams take this further. They configure the production pipeline to only run from a specific CI/CD runner that has access to the production state backend. Developers cannot run the production pipeline from their laptops because their local machines do not have the required credentials.
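A dedicated runner is one way to do this. A related approach, sketched here under the assumption that the pipeline runs on GitHub Actions and authenticates to AWS through OIDC, is a role whose trust policy only the production pipeline can satisfy. The organization, repository, and role names are hypothetical.

```hcl
# Assumption: the GitHub Actions OIDC provider is already registered in the AWS account.
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "pipeline_trust" {
  statement {
    sid     = "AllowProductionPipelineOnly"
    effect  = "Allow"
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Hypothetical org/repo. Only jobs bound to the "production" environment match.
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:example-org/infrastructure:environment:production"]
    }
  }
}

resource "aws_iam_role" "prod_pipeline" {
  name               = "prod-terraform-pipeline" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.pipeline_trust.json
}
```

Because no long-lived access key exists for this role, there is nothing for a developer to copy onto a laptop. The role still needs its own permissions policy granting access to the production state backend.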
State Backend
The state backend is where infrastructure-as-code tools store the current state of your environments. If you use Terraform, the state file for production should be in a separate backend with strict access controls. The IAM policy for that backend should only allow operations from the production pipeline, not from individual developer accounts.
For example, you can configure the production state backend with a policy that says: "Only the CI/CD service account can write to this state file. Everyone else can only read it." This way, even if a developer accidentally runs a Terraform command against production, the command fails because their credentials cannot acquire the state lock or write the state file. This protects the state itself; the credentials that can modify production resources need the same restriction.
Here is a Terraform IAM policy that enforces this boundary by allowing write access only to dev state files while denying write access to prod state files:
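```hcl
# Assumed to be attached to developer roles: read/write dev state, never write prod state.
# The paths assume the S3 backend's default workspace prefix ("env:").
data "aws_iam_policy_document" "state_access" {
  statement {
    sid    = "AllowDevWrite"
    effect = "Allow"
    actions = [
      "s3:PutObject",
      "s3:GetObject",
      "s3:DeleteObject",
    ]
    resources = [
      "arn:aws:s3:::my-tf-state-bucket/env:/dev/*",
    ]
  }

  # An explicit Deny overrides any Allow granted elsewhere.
  statement {
    sid    = "DenyProdWrite"
    effect = "Deny"
    actions = [
      "s3:PutObject",
      "s3:DeleteObject",
    ]
    resources = [
      "arn:aws:s3:::my-tf-state-bucket/env:/prod/*",
    ]
  }
}
```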
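The identity policy above restricts what a given role can do. A complementary option, sketched here as an assumption rather than a required setup, is a bucket policy on a dedicated production state bucket that denies writes from every principal except the pipeline role. The bucket name and the `pipeline_role_arn` variable are hypothetical.

```hcl
variable "pipeline_role_arn" {
  type        = string
  description = "ARN of the CI/CD role allowed to write production state (hypothetical)."
}

data "aws_iam_policy_document" "prod_state_bucket" {
  statement {
    sid       = "DenyStateWritesExceptPipeline"
    effect    = "Deny"
    actions   = ["s3:PutObject", "s3:DeleteObject"]
    resources = ["arn:aws:s3:::example-prod-tf-state/*"]

    # Applies to every principal...
    principals {
      type        = "*"
      identifiers = ["*"]
    }

    # ...unless the caller is the pipeline role.
    condition {
      test     = "ArnNotEquals"
      variable = "aws:PrincipalArn"
      values   = [var.pipeline_role_arn]
    }
  }
}

resource "aws_s3_bucket_policy" "prod_state" {
  bucket = "example-prod-tf-state"
  policy = data.aws_iam_policy_document.prod_state_bucket.json
}
```

A Deny statement grants nothing by itself, so read access for developers still has to be allowed in their identity policies. If state locking uses a separate DynamoDB table, that table would need a matching restriction.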
Boundaries Are Not About Distrust
Some teams resist privilege boundaries because they feel like a sign of distrust. "We hired smart people. Why can't we trust them with production access?"
This framing misses the point. Privilege boundaries are not about trust. They are about clarity and accountability. When everyone has access to everything, no one knows who to call when something breaks. When boundaries are clear, the team knows exactly who owns each environment and what process to follow.
Boundaries also protect the people who make mistakes. A developer who accidentally breaks production because they had unrestricted access feels terrible, and the whole team loses time investigating the incident. With clear boundaries, that same developer would have been stopped before the change reached production. The system catches the mistake, not the person.
A Quick Checklist for Setting Up Privilege Boundaries
If you are reviewing your current setup, here are a few things to check:
- Is the production state backend separate from development and staging?
- Can developers write to the production state from their laptops?
- Does the production pipeline require manual approval?
- Is there a clear owner for each environment?
- Are changes to production configurations reviewed by the owner before deployment?
These checks are not exhaustive, but they cover the most common gaps that lead to production incidents.
What Comes Next
Once you have privilege boundaries and ownership in place, the next challenge is keeping your infrastructure state accurate. Sometimes changes happen outside of your infrastructure-as-code tooling. Someone modifies a server configuration directly from the cloud console. A team member applies a hotfix manually during an incident. These changes create drift between your state and reality. Drift undermines the reliability of your infrastructure and makes future changes unpredictable. That is a topic worth exploring separately, but for now, the important step is to establish clear boundaries first. Without them, drift detection and remediation become much harder to manage.