25-6 · Chapter 25 · 6 min read

Policy as Code: Keeping Infrastructure Changes Under Control

You have three environments: development, staging, and production. Each one is managed through infrastructure as code. The pipeline runs, plans look good, and changes get applied. But one day, someone creates a resource in the wrong region. Another time, a cloud resource goes up without the mandatory cost-center tag. Nobody noticed until the bill arrived.

Separate environments solve the problem of testing changes before they reach production. But they don't solve the problem of people making changes that violate organizational rules. You need something that enforces those rules automatically, not just a document that nobody reads.

Why Manual Rules Don't Work

Most teams have policies written somewhere. A wiki page says all resources must have an environment tag. A Slack message says production resources only go in ap-southeast-1. A meeting decision says any change to firewall rules needs security team approval.

These rules exist, but they rely on human memory and good intentions. People forget. New team members don't know the wiki exists. Urgent changes skip the review process. And when something goes wrong, you have no way to prove whether the rule was followed or not.

Manual governance creates friction without providing safety. The rules exist, but violations only come to light after the damage is done.

Policy as Code: Rules That Run in the Pipeline

A policy is a rule that every infrastructure change must follow. Governance is how you make sure those rules are actually followed. When you write both as code and run them in your pipeline, you get something called policy as code.

The idea is simple: instead of having rules in a document, you write them as executable checks that run before any change is applied. The pipeline evaluates the proposed change against your policies. If a policy is violated, the pipeline stops and reports exactly what went wrong.

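Here is a simple Sentinel policy (Sentinel is HashiCorp's policy language for Terraform Cloud and Terraform Enterprise) that requires an environment tag on every managed resource the plan creates or updates:

import "tfplan/v2" as tfplan

# Managed resources this plan creates or updates
# (deleted resources have no planned "after" state to check)
changed_resources = filter tfplan.resource_changes as _, rc {
  rc.mode is "managed" and
  (rc.change.actions contains "create" or rc.change.actions contains "update")
}

# Require every such resource to have an "environment" tag
mandatory_tag = rule {
  all changed_resources as _, rc {
    (rc.change.after.tags else null) is not null and
    rc.change.after.tags contains "environment"
  }
}

main = rule {
  mandatory_tag
}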

This is the same principle as infrastructure as code. You treat your rules the same way you treat your infrastructure definitions. They live in a repository, they get reviewed, they get tested, and they get applied consistently.

Common Policies You Can Enforce

Tagging requirements are the most common starting point. Many organizations require every cloud resource to have tags like environment, owner, or cost-center. Without enforcement, you get inconsistent tags, missing tags, or tags with typos. With policy as code, the pipeline checks the plan output and rejects any resource that doesn't meet the tagging rules.
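The same tagging check can be written as a plain script. The sketch below is a minimal Python version, assuming the plan has been exported with `terraform show -json <planfile>` and that tags are set directly in each resource's `tags` attribute (on AWS, tags inherited from provider `default_tags` appear in `tags_all` instead, which this sketch ignores). The required tag names come from the text; resources that do not support tags at all would also be flagged, so a real policy would likely limit the check to taggable resource types.

```python
from typing import Any

# Tags every created or updated resource must carry (example names from the text)
REQUIRED_TAGS = {"environment", "owner", "cost-center"}


def missing_tags(
    plan: dict[str, Any], required: set[str] = REQUIRED_TAGS
) -> dict[str, set[str]]:
    """Map each offending resource address to the required tags it lacks.

    `plan` is the parsed JSON output of `terraform show -json <planfile>`.
    """
    problems: dict[str, set[str]] = {}
    for rc in plan.get("resource_changes", []):
        actions = set(rc["change"]["actions"])
        # Skip data sources, deletes, and no-ops: only resources being
        # created or updated have a planned "after" state worth checking.
        if rc.get("mode") != "managed" or not ({"create", "update"} & actions):
            continue
        after = rc["change"].get("after") or {}
        tags = after.get("tags") or {}
        missing = required - set(tags)  # set(dict) gives the tag keys
        if missing:
            problems[rc["address"]] = missing
    return problems
```

An empty result means every changed resource passes; anything else names the resource and the exact tags to add.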

Region restrictions are another frequent policy. If your organization has decided that production workloads only run in specific regions, the pipeline should enforce that. A developer might accidentally target a different region during a late-night deployment. The policy catches it before any resource is created.
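A region rule can read the provider configuration recorded in the same JSON plan. This is a minimal sketch, assuming AWS, a region set in the `provider "aws"` block, and the plan JSON's `configuration.provider_config` section; it resolves a literal region or a direct `var.` reference, and otherwise fails closed so a human has to look. The allowed region is the example from the earlier wiki rule, and the pipeline would pass a different set per environment.

```python
from typing import Any

# Regions approved for production (example value taken from the text)
ALLOWED_REGIONS = {"ap-southeast-1"}


def region_violations(
    plan: dict[str, Any], allowed: set[str] = ALLOWED_REGIONS
) -> list[str]:
    """Return one message per AWS provider block configured outside `allowed`."""
    violations: list[str] = []
    providers = plan.get("configuration", {}).get("provider_config", {})
    for key, provider in providers.items():  # keys look like "aws" or "aws.west"
        if provider.get("name") != "aws":
            continue
        region_expr = provider.get("expressions", {}).get("region", {})
        region = region_expr.get("constant_value")
        refs = region_expr.get("references", [])
        if region is None and len(refs) == 1 and refs[0].startswith("var."):
            # Resolve a direct variable reference from the plan's input variables
            region = plan.get("variables", {}).get(refs[0][4:], {}).get("value")
        if region is None:
            # Region set some other way (env var, expression): fail closed
            violations.append(f"{key}: region could not be resolved; cannot verify")
        elif region not in allowed:
            violations.append(f"{key}: region {region!r} is not in {sorted(allowed)}")
    return violations
```

Failing closed on an unresolvable region is a deliberate choice here: a late-night deployment that sets the region through an environment variable is exactly the case the policy exists to catch.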

Approval workflows can also be embedded in policy. Not all changes carry the same risk. Changing a firewall rule in staging might need one reviewer. Changing the same rule in production might need approval from the security team. The pipeline can check the environment, the resource type, and the scope of the change to determine which approval path is required.
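Choosing an approval path is a small decision function over the environment and the plan. The sketch below is illustrative only: the reviewer group names, the set of "firewall" resource types (real AWS Terraform types), and the 20-resource scope threshold are all hypothetical, and the environment is assumed to be supplied by the pipeline.

```python
from typing import Any

# Hypothetical resource types treated as firewall changes (AWS examples)
FIREWALL_TYPES = {"aws_security_group", "aws_security_group_rule", "aws_network_acl"}
LARGE_CHANGE = 20  # hypothetical threshold for "wide scope" changes


def required_approvers(environment: str, plan: dict[str, Any]) -> set[str]:
    """Decide which (hypothetical) reviewer groups must approve this plan."""
    changed = [
        rc for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] not in (["no-op"], ["read"])
    ]
    approvers: set[str] = set()
    if not changed:
        return approvers  # nothing changes, nothing to approve
    approvers.add("team-reviewer")  # every real change gets one peer review
    touches_firewall = any(rc["type"] in FIREWALL_TYPES for rc in changed)
    if environment == "production" and touches_firewall:
        approvers.add("security-team")
    if environment == "production" and len(changed) > LARGE_CHANGE:
        approvers.add("platform-lead")
    return approvers
```

The same firewall change yields `{"team-reviewer"}` in staging and adds `"security-team"` in production, matching the two cases described above.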

How It Works in Practice

The typical flow looks like this. After the infrastructure plan is generated and before the apply step runs, the pipeline executes policy checks. These checks read the plan output and evaluate it against your rules.

Here is a visual representation of that flow:

flowchart TD
    A[Code Commit] --> B[Generate Plan]
    B --> C{Policy Check}
    C -- Pass --> D[Apply Change]
    C -- Fail --> E[Block Pipeline]
    E --> F[Show Error: Rule, Resource, Expected Value]
    D --> G[Deployment Complete]

You can use dedicated tools like Open Policy Agent (OPA) or Sentinel for complex policies. Or you can write simple scripts that parse the plan and check specific conditions. The tool matters less than the principle: the check must be automatic, consistent, and blocking.
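For the script route, the gate between plan and apply can be quite small. This sketch assumes the plan was saved with `terraform plan -out=<file>` and converts it with `terraform show -json`; the `no_deletes` policy is a hypothetical example, and any function with the same signature can be added to the list.

```python
import json
import subprocess
from typing import Any, Callable

# A policy takes the parsed plan and returns a list of violation messages
Policy = Callable[[dict[str, Any]], list[str]]


def load_plan(plan_file: str) -> dict[str, Any]:
    """Render a saved plan (from `terraform plan -out=<file>`) as JSON and parse it."""
    result = subprocess.run(
        ["terraform", "show", "-json", plan_file],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def run_policies(plan: dict[str, Any], policies: list[Policy]) -> list[str]:
    """Evaluate every policy and collect all violations, not just the first."""
    violations: list[str] = []
    for policy in policies:
        violations.extend(policy(plan))
    return violations


def no_deletes(plan: dict[str, Any]) -> list[str]:
    """Hypothetical example policy: refuse any plan that destroys a resource."""
    return [
        f"{rc['address']}: plan deletes this resource"
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def main(plan_file: str) -> int:
    """Exit code for the pipeline step: non-zero blocks the apply."""
    violations = run_policies(load_plan(plan_file), [no_deletes])
    for message in violations:
        print(f"POLICY VIOLATION: {message}")
    return 1 if violations else 0
```

A pipeline step between plan and apply would call `sys.exit(main("tfplan"))`; because the check relies only on the exit code, it works the same in any CI system.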

If a policy violation is found, the pipeline stops. The output shows exactly which rule was violated, which resource triggered it, and what the expected value should be. The developer gets immediate feedback, not a ticket that arrives three days later.
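Actionable output depends on each violation carrying the rule, the resource, and the expected value. A minimal sketch of such a record and a log formatter follows; the field names and the rule name in the usage are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule: str      # which policy failed
    resource: str  # Terraform address of the offending resource
    expected: str  # what the policy wanted
    actual: str    # what the plan contains

    def __str__(self) -> str:
        return (f"[{self.rule}] {self.resource}: "
                f"expected {self.expected}, got {self.actual}")


def report(violations: list[Violation]) -> str:
    """Format violations for the pipeline log, one line each, in a stable order."""
    if not violations:
        return "All policies passed."
    lines = [f"{len(violations)} policy violation(s):"]
    lines += [f"  {v}" for v in sorted(violations, key=lambda v: (v.rule, v.resource))]
    return "\n".join(lines)
```

For example, `Violation("allowed-regions", "aws_instance.web", "region in ['ap-southeast-1']", "'us-east-1'")` prints as `[allowed-regions] aws_instance.web: expected region in ['ap-southeast-1'], got 'us-east-1'`, which tells the developer exactly what to fix.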

Exceptions Are Part of the System

Policies should not be absolute barriers. There are legitimate reasons to make exceptions. A new business requirement might need resources in a region that wasn't previously approved. An emergency fix might need to bypass normal tagging rules.

The key is to make exceptions visible and documented. Instead of someone silently bypassing the policy, they submit an exception request. The request goes through an approval process, and the approved exception is recorded in the audit trail. The pipeline can even check for approved exceptions and allow the change to proceed.
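One way for the pipeline to honour approved exceptions is to keep them in a reviewed file in the policy repository and match them against violations. The sketch below assumes exceptions are keyed by rule and resource address; the expiry date is an added design assumption (not from the text) so that waivers cannot silently become permanent.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PolicyException:
    rule: str         # policy being waived
    resource: str     # Terraform address the waiver applies to
    approved_by: str  # who signed off
    reason: str       # why: ticket, incident, or business case
    expires: date     # assumption: every waiver has an end date


def apply_exceptions(
    violations: list[tuple[str, str]],
    exceptions: list[PolicyException],
    today: date,
) -> tuple[list[tuple[str, str]], list[PolicyException]]:
    """Split (rule, resource) violations into still-blocking ones and waivers used.

    The returned waivers are what the pipeline should write to its audit log.
    """
    active = {(e.rule, e.resource): e for e in exceptions if e.expires >= today}
    blocking: list[tuple[str, str]] = []
    used: list[PolicyException] = []
    for violation in violations:
        waiver = active.get(violation)
        if waiver is None:
            blocking.append(violation)
        else:
            used.append(waiver)
    return blocking, used
```

Because the exceptions live in the repository, adding one goes through the same review as any other change, and the `used` list gives the audit trail its who, what, and why.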

This turns exceptions from shadow processes into documented decisions. You know who approved what, when, and why. That information is valuable for audits, post-mortems, and future policy reviews.

Why This Makes Teams Faster

This sounds counterintuitive: adding automated checks to your pipeline seems like it would slow things down. In practice, the opposite is true.

Without policy as code, every change that might touch a sensitive area requires manual review. The reviewer has to check the plan, remember the rules, and make a judgment call. That takes time, and the reviewer might miss something.

With policy as code, routine checks happen automatically. The pipeline rejects clearly invalid changes in seconds. The reviewer only needs to look at changes that genuinely need human judgment. The team moves faster because they're not waiting for manual checks on things that could be automated.

And when something goes wrong, you have a clear audit trail. You know what changed, who approved it, and which policies were involved. No more guessing, no more blaming, no more "I didn't know that rule existed."

A Quick Checklist for Getting Started

If you're considering policy as code for your infrastructure pipeline, here are the steps to begin:

Pick one policy that causes the most pain today. Tagging is usually a good start.
Write the policy as a script or use a tool like OPA. Test it against a known violation.
Add the policy check to your pipeline, between the plan and apply steps.
Run it in non-blocking mode first. Log violations but let the pipeline continue.
Review the violations for a week. Fix false positives and adjust the rules.
Switch to blocking mode. Make the pipeline stop on violations.
Document the exception process. Make it clear how to request an override.
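The switch from non-blocking to blocking mode can be a single flag on the policy step. Sentinel expresses the same idea through enforcement levels (advisory, soft-mandatory, hard-mandatory); the Python sketch below is a simpler two-mode version, and the `POLICY_MODE` environment variable is a hypothetical name.

```python
import os
import sys


def blocking_enabled() -> bool:
    """Read the rollout switch from a hypothetical POLICY_MODE variable."""
    return os.environ.get("POLICY_MODE", "advisory").lower() == "blocking"


def gate(violations: list[str], blocking: bool) -> int:
    """Return the exit code for the policy step.

    In advisory mode violations are only logged, so the pipeline continues
    while you review false positives and tune the rules.
    """
    mode = "BLOCKING" if blocking else "ADVISORY"
    for message in violations:
        print(f"[{mode}] policy violation: {message}", file=sys.stderr)
    return 1 if (violations and blocking) else 0
```

During the trial week the step runs with the default advisory mode; switching to blocking is then a one-line configuration change rather than a code change.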

The Takeaway

Policy as code turns your infrastructure rules from forgotten documents into automated guards. It catches violations before they reach production, provides clear audit trails, and lets your team move faster by automating routine checks. The rules live in your repository, get reviewed like code, and run in every pipeline execution. That's governance that actually works, not governance that lives on a wiki page nobody reads.