Why Your Infrastructure Rules Should Be Written as Code

Your team has a policy: no security group should ever open SSH (port 22) to the entire internet. Everyone agrees. It's in the documentation. Someone even printed it out and stuck it on the wall.

Then one Friday afternoon, a developer creates a new security group for a quick test. They set the CIDR block to 0.0.0.0/0 on port 22 because they just want to test something fast. The change goes through. Nobody notices until Monday morning when the security team sends an alert about an exposed SSH port.

This scenario repeats across teams every week. Not because people are careless, but because manual policy enforcement doesn't scale. When infrastructure changes happen multiple times a day, relying on human memory or document-based rules is a recipe for gaps.

The Problem with Document-Based Policies

Most teams start with policies stored in documents. A Google Doc, a Confluence page, or a PDF that lives in a shared drive. These documents describe what should and shouldn't happen: "All S3 buckets must be encrypted," "Only approved AMIs can be used," "Every resource must have a cost center tag."

The problem is that documents don't enforce anything. They describe intent, but they don't execute. When someone makes a change that violates a policy, the document doesn't stop them. It just sits there, silently waiting for someone to remember to check it.

Documents also drift. Someone updates the policy, but the old version stays in someone's bookmark. Or the team grows, and new members don't know the document exists. Over time, the gap between what's documented and what's actually deployed widens.

What It Means to Write Policy as Code

Policy as code means taking those rules and writing them in a format that machines can read and execute. Instead of a document that says "don't open SSH to the world," you write a rule that checks every infrastructure change against that constraint automatically.

The policy lives in a file, just like your application code. It goes through version control. It gets reviewed in pull requests. It can be tested, updated, and rolled back. When someone proposes a change that violates the policy, the pipeline catches it before the resource is created.

This shift from document to code changes how teams interact with rules. Policies become part of the engineering workflow, not an afterthought that someone checks during an audit.

A Concrete Example with Open Policy Agent

Let's make this tangible. Open Policy Agent (OPA) is a popular tool for writing policy as code. It uses a language called Rego. Here's a simple policy that blocks SSH access from anywhere:

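package terraform.security  # illustrative package name

import rego.v1  # enables `if` and `in` on OPA versions before 1.0

# Deny any security group rule that opens SSH (port 22) to every IPv4 address.
deny if {
    input.resource.type == "aws_security_group_rule"
    "0.0.0.0/0" in input.resource.cidr_blocks
    input.resource.port == 22
}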

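This rule says: if someone tries to create a security group rule that opens port 22 to all IP addresses, mark it as denied. The policy file sits in your repository. When a pull request comes in that adds a security group rule, the CI pipeline runs OPA against the proposed changes, typically the JSON form of a Terraform plan. If the rule matches, the pipeline fails, and the developer gets immediate feedback. The input shape shown here is simplified for readability: a real plan nests each resource under resource_changes, so the paths in a production policy will be longer.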

You can also write the inverse: a rule that only allows specific CIDR blocks for SSH access:

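# Allow SSH only from approved ranges (`every` needs `import rego.v1` or OPA 1.0+).
allowed_cidrs := {"10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"}

allow if {
    input.resource.type == "aws_security_group_rule"
    input.resource.port == 22
    every cidr in input.resource.cidr_blocks { cidr in allowed_cidrs }
}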

The exact syntax depends on your tool and policy language, but the pattern is the same: rules are code, and code runs automatically.
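To see the pattern outside any particular policy language, here is a minimal Python sketch of the same check applied to the JSON form of a Terraform plan. It is an illustration under stated assumptions, not a replacement for OPA: the function name ssh_open_to_world and the sample plan are hypothetical, the plan is trimmed to the few fields the check reads, and only IPv4 ingress rules with explicit port ranges are considered.

```python
from typing import Any

SSH_PORT = 22
OPEN_CIDR = "0.0.0.0/0"


def ssh_open_to_world(plan: dict[str, Any]) -> list[str]:
    """Return addresses of security group rules that expose SSH to every IPv4 address."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group_rule":
            continue
        # "after" is None when the resource is being deleted.
        after = (rc.get("change") or {}).get("after") or {}
        if after.get("type") != "ingress":
            continue  # outbound rules do not expose the SSH port
        from_port, to_port = after.get("from_port"), after.get("to_port")
        if from_port is None or to_port is None:
            continue
        covers_ssh = from_port <= SSH_PORT <= to_port
        if covers_ssh and OPEN_CIDR in (after.get("cidr_blocks") or []):
            violations.append(rc.get("address", "<unknown>"))
    return violations


# Hypothetical plan fragment: one violating rule, one compliant rule.
example_plan = {
    "resource_changes": [
        {
            "address": "aws_security_group_rule.quick_test",
            "type": "aws_security_group_rule",
            "change": {"after": {"type": "ingress", "from_port": 22,
                                 "to_port": 22, "cidr_blocks": ["0.0.0.0/0"]}},
        },
        {
            "address": "aws_security_group_rule.office_ssh",
            "type": "aws_security_group_rule",
            "change": {"after": {"type": "ingress", "from_port": 22,
                                 "to_port": 22, "cidr_blocks": ["10.0.0.0/8"]}},
        },
    ]
}

print(ssh_open_to_world(example_plan))  # ['aws_security_group_rule.quick_test']
```

Checking a port range rather than a single port matters in practice: a rule that opens ports 0 through 1024 exposes SSH just as much as one that names port 22.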

Another Approach: Sentinel for Terraform Users

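If your team uses Terraform extensively, HashiCorp's Sentinel offers tighter integration. Sentinel policies are written specifically for Terraform's execution context. Here's a stricter variant of the restriction in Sentinel, one that limits every security group rule, not just SSH, to an approved list of CIDR blocks: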

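import "tfplan/v2" as tfplan

# Exact CIDR strings that security group rules may use.
allowed_cidrs = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]

# Pass when every changed resource is either not a security group rule
# or lists only approved CIDR blocks.
main = rule {
    all tfplan.resource_changes as _, change {
        change.type is not "aws_security_group_rule" or
        all change.change.after.cidr_blocks as cidr { cidr in allowed_cidrs }
    }
}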

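This policy checks that every CIDR block on a security group rule appears in the approved list of private ranges. If someone uses any other block, such as 0.0.0.0/0, the policy blocks the change. The comparison is an exact string match, so a narrower private subnet like 10.1.0.0/16 would also be rejected unless you list it explicitly or check for containment instead.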
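A list comparison like the one above matches CIDR strings exactly. If the intent is "any subnet inside a private range," the check has to test containment. The sketch below shows that idea in Python using the standard ipaddress module. The helper name is_within_approved is hypothetical, and the approved ranges are simply the ones from the example policy.

```python
import ipaddress

# Approved ranges, taken from the example policy above.
APPROVED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_within_approved(cidr: str) -> bool:
    """Return True when the CIDR block lies entirely inside one approved range."""
    try:
        network = ipaddress.ip_network(cidr, strict=False)
    except ValueError:
        return False  # malformed input is treated as not approved
    return any(
        # subnet_of raises TypeError across IP versions, so compare versions first.
        network.version == approved.version and network.subnet_of(approved)
        for approved in APPROVED_RANGES
    )


for cidr in ("10.1.0.0/16", "192.168.1.0/24", "0.0.0.0/0", "8.8.8.8/32"):
    print(cidr, is_within_approved(cidr))
# 10.1.0.0/16 True
# 192.168.1.0/24 True
# 0.0.0.0/0 False
# 8.8.8.8/32 False
```

Whichever policy tool you use, it is worth confirming whether its CIDR comparison is a string match or a containment test before relying on it.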

The difference between OPA and Sentinel is mostly about ecosystem. OPA is general-purpose and works with many tools beyond Terraform. Sentinel is built into HashiCorp's commercial products, such as HCP Terraform and Terraform Enterprise, which makes setup simpler if you're already in that ecosystem. But the core idea is identical: policies are code that runs in your pipeline.

Where to Store Your Policy Files

You have two main options for where to keep policy files:

In the same repository as your infrastructure code. This keeps policies close to the resources they govern. When someone modifies infrastructure, they see the relevant policies in the same pull request. This works well for team-specific policies.

In a dedicated policy repository. This centralizes all policies across the organization. A platform team maintains the repository, and multiple infrastructure repos pull policies from it. This works better for organization-wide compliance rules that shouldn't vary between teams.

Both approaches are valid. Start with the same-repository approach if you're new to policy as code. Move to a dedicated repository when you need to enforce policies consistently across multiple teams.

Practical Checklist for Getting Started

Before you dive into writing policies, run through this quick checklist:

Identify your top three most violated policies. Don't try to codify everything at once. Pick the rules that cause the most pain or risk.
Choose one tool. Start with OPA if you want flexibility across tools. Start with Sentinel if you already run Terraform through HCP Terraform or Terraform Enterprise.
Write one policy and test it manually. Run it against a known violation to confirm it catches the problem.
Add the policy check to your CI pipeline. Make it block the build, not just warn. Warnings get ignored.
Review and iterate. After a week, check whether the policy caught anything unexpected, and adjust the rule to remove false positives.
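The "block, don't warn" step comes down to the exit code your check returns to the CI system. Here is a minimal sketch in Python, assuming the policy check has already produced a list of violation messages. The gate function and the sample message are hypothetical, made up for illustration.

```python
import sys


def gate(violations: list[str], enforce: bool = True) -> int:
    """Return a process exit code; a non-zero code fails the CI job."""
    label = "DENY" if enforce else "WARN"
    for message in violations:
        print(f"{label}: {message}", file=sys.stderr)
    # Warn-only mode reports violations but still lets the build pass.
    return 1 if (violations and enforce) else 0


# A known violation, used to confirm that the gate really blocks.
known_violation = ["aws_security_group_rule.quick_test opens port 22 to 0.0.0.0/0"]

print(gate(known_violation))                 # 1: the build fails
print(gate(known_violation, enforce=False))  # 0: the build passes despite the warning
print(gate([]))                              # 0: nothing to report

# In a real pipeline script, the last line would be: sys.exit(gate(violations))
```

Running the gate against a known violation first, as in the checklist, confirms that a failing policy actually stops the pipeline rather than just printing a message.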

The Real Value Is in the Workflow

The tool you choose matters less than the workflow you adopt. When policies are code, they get the same treatment as application code. They are reviewed, tested, versioned, and improved over time. When a policy causes a false positive, someone opens a pull request to fix it. When a new compliance requirement comes in, someone writes a new rule and ships it through the same pipeline.

This workflow eliminates the gap between policy intent and actual enforcement. The rule that says "no SSH to the world" is no longer a document that someone might forget to check. It's a line of code that runs every single time infrastructure changes. That's the difference between hoping your team follows the rules and knowing that they do.