28-2 · Chapter 28 · 6 min read

Five Infrastructure Policies That Keep Your Cloud From Burning Money and Security

A developer needs SSH access to a production server for a quick debugging session. They open port 22 to 0.0.0.0/0 so they can connect from their home IP. The debugging finishes, the ticket closes, and the security group rule stays open for three months without anyone noticing. Meanwhile, the cloud bill arrives with a different surprise: someone spun up an m5.24xlarge instance in the dev account, running 24/7, with no tags, named test123.

Neither scenario is far-fetched. Variations of both play out on any team where many developers have cloud access. The fix is not more manual review or stricter access control. The fix is policy written as code, checked automatically before any resource gets created.

Infrastructure policies fall into five categories. Each solves a specific class of problems. Understanding them helps you decide which ones to prioritize in your pipeline.

Security: The Non-Negotiable Baseline

Security policies are the most critical because a single violation can put production systems on the public internet, and, as the opening story shows, nobody may notice for months. The classic case is a security group rule that opens SSH or database ports to 0.0.0.0/0. Developers often open these for temporary access and forget to close them. A policy that rejects such rules at pipeline time keeps production resources from being accidentally exposed to the internet.
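A minimal Python sketch of such a check follows. It assumes each rule arrives as a dictionary keyed by resource address, with fields modeled on the attributes of Terraform's aws_security_group_rule resource (type, protocol, from_port, to_port, cidr_blocks, ipv6_cidr_blocks). The sensitive-port list is illustrative, and reading a real plan file is left out.

```python
# Sketch of a pipeline-time check for internet-exposed ingress rules.
# Assumption: rules arrive as a dict keyed by resource address, with values shaped
# like the attributes of Terraform's aws_security_group_rule resource.
SENSITIVE_PORTS = {22, 3306, 5432}  # SSH, MySQL, PostgreSQL (illustrative; extend as needed)
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}  # "anywhere" for IPv4 and IPv6


def opens_sensitive_port(rule: dict) -> bool:
    """Return True if an ingress rule exposes a sensitive port to any address."""
    if rule.get("type") != "ingress":
        return False
    # Missing or null CIDR lists are treated as empty.
    cidrs = set(rule.get("cidr_blocks") or []) | set(rule.get("ipv6_cidr_blocks") or [])
    if not cidrs & OPEN_CIDRS:
        return False
    if str(rule.get("protocol")) == "-1":
        return True  # protocol -1 means all protocols, and therefore all ports
    low, high = rule["from_port"], rule["to_port"]
    # A wide range such as 0-65535 covers every sensitive port inside it.
    return any(low <= port <= high for port in SENSITIVE_PORTS)


def check_security_rules(rules: dict) -> list:
    """Return one violation message per rule that should stop the pipeline."""
    return [
        f"{address}: opens ports {rule.get('from_port')}-{rule.get('to_port')} to the internet"
        for address, rule in rules.items()
        if opens_sensitive_port(rule)
    ]


if __name__ == "__main__":
    rules = {
        "aws_security_group_rule.debug_ssh": {
            "type": "ingress", "protocol": "tcp",
            "from_port": 22, "to_port": 22, "cidr_blocks": ["0.0.0.0/0"],
        },
        "aws_security_group_rule.office_ssh": {
            "type": "ingress", "protocol": "tcp",
            "from_port": 22, "to_port": 22, "cidr_blocks": ["203.0.113.0/24"],
        },
    }
    print(check_security_rules(rules))
    # ['aws_security_group_rule.debug_ssh: opens ports 22-22 to the internet']
```

AWS uses protocol -1 to mean "all traffic", so the sketch treats such a rule as covering every port regardless of its from_port and to_port values.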

Other common security policies include:

Requiring encryption at rest for S3 buckets and EBS volumes
Blocking outdated TLS versions on load balancers
Enforcing HTTPS-only traffic on all public endpoints
Requiring VPC flow logs for production environments
Mandating IAM roles instead of long-lived access keys

Security policies typically have a hard fail behavior: if the check fails, the pipeline stops and the resource is not created. There is no warning mode for a port open to the entire internet.

Cost: Keeping Accidents From Breaking the Bank

Cloud resources are expensive when left unchecked. A single developer can accidentally provision an instance that, running around the clock, costs thousands of dollars a month. Cost policies put guardrails around spending without requiring manual approval for every resource.

Typical cost policies include:

Blocking expensive instance types (like m5.24xlarge or r5.metal) in non-production environments
Limiting the number of EBS volumes or GPUs per account
Requiring spot instances for fault-tolerant workloads
Setting maximum storage sizes for databases
Enforcing auto-stop schedules for development environments
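The first rule in the list can be sketched as a size ceiling plus a bare-metal block for non-production environments. This assumes instance types follow the usual "family.size" form (m5.24xlarge, r5.metal); the 4xlarge ceiling and the environment names are hypothetical choices, not recommendations.

```python
import re
from typing import Optional

# Sketch: block large and bare-metal instance types outside production.
# Assumptions: the ceiling below is a hypothetical limit; "production" is the only
# environment allowed to run anything; types follow the "<family>.<size>" form.
MAX_NON_PROD_MULTIPLIER = 4  # allow up to 4xlarge outside production
SIZE_RE = re.compile(r"[a-z0-9-]+\.(\d*)xlarge")


def check_instance_type(instance_type: str, environment: str) -> Optional[str]:
    """Return a violation message, or None if the type is allowed here."""
    if environment == "production":
        return None
    if instance_type.endswith(".metal"):
        return f"{instance_type} is not allowed in {environment}"
    match = SIZE_RE.fullmatch(instance_type)
    if match:
        multiplier = int(match.group(1) or 1)  # plain "xlarge" counts as 1
        if multiplier > MAX_NON_PROD_MULTIPLIER:
            return (f"{instance_type} exceeds the {MAX_NON_PROD_MULTIPLIER}xlarge "
                    f"ceiling for {environment}")
    return None  # small sizes (t3.medium, m5.xlarge, ...) pass
```

Size is only a rough proxy for cost. GPU families such as p3 are expensive even at 2xlarge, so a real policy would likely pair this ceiling with an explicit family denylist.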

Cost policies help teams stay budget-aware, especially when many developers have cloud access. Without them, one person's convenience can become the team's surprise bill.

Tagging: The Metadata That Keeps Operations Running

Tagging sounds boring until you need to figure out who owns a resource that has been running for six months. Tags like owner, environment, cost-center, and project are essential for tracking costs, automating cleanup, and debugging incidents.

Tagging policies enforce that every resource has the required tags at creation time. For example:

Every resource must have an owner tag with a valid email address
Every resource must have an environment tag: dev, staging, or production
Every resource must have a cost-center tag matching the team's budget code
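The three tag rules above translate almost directly into a validation function. This sketch assumes the tag keys are exactly owner, environment, and cost-center; the email pattern is a loose format check rather than full address validation, and the budget codes are hypothetical placeholders for your finance team's real list.

```python
import re

# Sketch: validate the three required tags before a resource is created.
# Assumptions: tag keys are exactly "owner", "environment", and "cost-center";
# the email pattern is a loose format check, not full address validation.
ALLOWED_ENVIRONMENTS = ("dev", "staging", "production")
KNOWN_COST_CENTERS = {"cc-1001", "cc-2040"}  # hypothetical budget codes
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def check_tags(tags: dict) -> list:
    """Return a list of tagging violations; an empty list means the tags pass."""
    problems = []

    owner = tags.get("owner")
    if not owner:
        problems.append("missing owner tag")
    elif not EMAIL_RE.fullmatch(owner):
        problems.append(f"owner tag is not an email address: {owner!r}")

    environment = tags.get("environment")
    if environment not in ALLOWED_ENVIRONMENTS:
        problems.append(
            f"environment tag must be one of {', '.join(ALLOWED_ENVIRONMENTS)}, got {environment!r}"
        )

    cost_center = tags.get("cost-center")
    if cost_center not in KNOWN_COST_CENTERS:
        problems.append(f"cost-center tag {cost_center!r} does not match a known budget code")

    return problems
```

The function only reports problems. Whether a non-empty result rejects the resource or merely warns is a separate pipeline decision.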

When a resource fails tagging policy, the pipeline can either reject it or create it with a warning and a scheduled cleanup. The important thing is that untagged resources do not silently accumulate. Tagging policies prevent the "orphan resource" problem where billing teams find mystery resources running for months with no clear owner.

Naming: Consistency for Humans and Automation

Resource names matter more than most teams realize. A bucket named test123 and another named data-barang are hard to search, hard to automate against, and hard to troubleshoot. Naming policies enforce consistent patterns so that operations teams and automation tools can find resources quickly.

Common naming policies include:

All S3 buckets must start with the project name
All security groups must have a prefix indicating environment
All RDS instances must follow the pattern {project}-{env}-{function}
All IAM roles must include the service name and permission level
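Three of these naming rules can be sketched as prefix and pattern checks. The sketch assumes the project name is passed in, that environment segments reuse the dev/staging/production values from the tagging policy, and that the function segment is lowercase letters and digits. The IAM role rule is omitted because its format depends on how a team spells permission levels.

```python
import re

# Sketch: naming checks for S3 buckets, security groups, and RDS instances.
# Assumptions: environment segments reuse the tagging policy's values; the
# function segment of an RDS name is lowercase letters and digits.
ENVIRONMENTS = ("dev", "staging", "production")


def bucket_name_ok(name: str, project: str) -> bool:
    # Require a hyphen after the project so "shop" does not match "shopping-data".
    return name.startswith(f"{project}-")


def security_group_name_ok(name: str) -> bool:
    # Name must begin with an environment prefix such as "staging-".
    return name.startswith(tuple(f"{env}-" for env in ENVIRONMENTS))


def rds_name_ok(name: str, project: str) -> bool:
    # {project}-{env}-{function}, for example shop-production-orders
    pattern = rf"{re.escape(project)}-({'|'.join(ENVIRONMENTS)})-[a-z0-9]+"
    return re.fullmatch(pattern, name) is not None
```

Under these rules, a bucket named data-barang fails for a project called shop, while shop-invoices passes.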

Naming policies are often combined with tagging policies. Together, they ensure that every resource is identifiable, searchable, and manageable at scale. Without them, you end up with a cloud account that looks like a junk drawer.

Compliance: Translating External Rules Into Code

Compliance policies handle requirements from external regulations like PCI DSS, HIPAA, SOC 2, or GDPR. These are not optional. They translate legal and regulatory requirements into automated checks that run before any resource is deployed.

Examples of compliance policies:

All production databases must use encryption at rest
All access to production resources must be logged in a central audit trail
All data must be stored in approved geographic regions
All backups must be encrypted and stored in a separate account
All API access must use multi-factor authentication
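The first and third rules lend themselves to per-resource checks. In this sketch, storage_encrypted mirrors the attribute of the same name on Terraform's aws_db_instance resource, while region is a hypothetical field the pipeline would fill in from the provider configuration. The approved-region set is a placeholder, not advice about any particular regulation.

```python
# Sketch: encryption-at-rest and approved-region rules as checks.
# Assumptions: "storage_encrypted" mirrors Terraform's aws_db_instance attribute;
# "region" is a hypothetical field supplied by the pipeline; APPROVED_REGIONS is a
# placeholder for whatever your compliance requirements actually allow.
APPROVED_REGIONS = {"eu-central-1", "eu-west-1"}  # e.g. an EU-only residency rule


def check_database_compliance(address: str, db: dict, environment: str) -> list:
    """Return compliance violations for one database resource."""
    violations = []
    if environment == "production" and not db.get("storage_encrypted", False):
        violations.append(f"{address}: production databases must be encrypted at rest")
    region = db.get("region")
    if region not in APPROVED_REGIONS:
        violations.append(f"{address}: region {region!r} is not on the approved list")
    return violations
```

Rules such as central audit logging, cross-account backups, and MFA tend to concern account-wide configuration rather than one resource's attributes, so they may need checks that look beyond a single resource.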

Compliance policies are often the hardest to negotiate because they come from outside the engineering team. But encoding them as code makes them consistent, auditable, and much easier to enforce than manual checklists.

How These Policies Interact

These five categories do not operate in isolation. A single EC2 instance gets checked against multiple policies at once: security group rules, instance type, tags, naming pattern, and compliance requirements. A good pipeline runs all these checks before the resource is created, not after.

The following diagram shows how the five policy categories relate to each other and to the deployment pipeline:

flowchart TD
    A[Pipeline Trigger] --> B{Security & Compliance}
    B -->|Pass| C{Cost}
    B -->|Fail| D[Reject Resource]
    C -->|Pass| E{Tagging}
    C -->|Fail| D
    E -->|Pass| F{Naming}
    E -->|Fail| G[Warn or Reject]
    F -->|Pass| H[Create Resource]
    F -->|Fail| G
    D --> I[Alert Team]
    G --> I

The order matters too. Security and compliance checks should run first because violations in those categories are non-negotiable. Cost and tagging checks can follow. Naming checks are usually the least critical but still worth enforcing for operational sanity.
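The ordering in the diagram can be sketched as a list of stages run in sequence. In this sketch, a hard-fail stage stops the run (the Reject Resource branch), while a warning stage records its problems and lets later stages continue, which is one reading of the diagram's Warn or Reject box. The toy lambda checks and the "shop-" prefix are hypothetical stand-ins for real checks, and sending the alert is left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable

# Sketch of ordered policy stages. Assumption: every check takes a resource dict
# and returns a list of problem strings; alerting is left to the caller.


@dataclass
class Stage:
    name: str
    check: Callable[[dict], list]
    hard_fail: bool  # True: reject on failure; False: warn and keep going


@dataclass
class Outcome:
    created: bool = False
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    @property
    def exit_code(self) -> int:
        """A non-zero exit code is what makes a CI job stop the pipeline."""
        return 0 if self.created else 1


def run_policies(resource: dict, stages: list) -> Outcome:
    outcome = Outcome()
    for stage in stages:
        problems = [f"[{stage.name}] {p}" for p in stage.check(resource)]
        if not problems:
            continue
        if stage.hard_fail:
            outcome.errors.extend(problems)
            return outcome  # reject: later stages never run
        outcome.warnings.extend(problems)
    outcome.created = True
    return outcome


# Toy checks standing in for real ones (hypothetical).
STAGES = [
    Stage("security & compliance",
          lambda r: ["port 22 open to 0.0.0.0/0"] if r.get("public_ssh") else [],
          hard_fail=True),
    Stage("cost",
          lambda r: ["instance type too large"] if r.get("instance_type") == "m5.24xlarge" else [],
          hard_fail=True),
    Stage("tagging",
          lambda r: [] if "owner" in r.get("tags", {}) else ["missing owner tag"],
          hard_fail=False),
    Stage("naming",
          lambda r: [] if r.get("name", "").startswith("shop-") else ["name lacks project prefix"],
          hard_fail=False),
]

if __name__ == "__main__":
    result = run_policies({"instance_type": "m5.24xlarge", "name": "test123"}, STAGES)
    print(result.exit_code, result.errors)  # 1 ['[cost] instance type too large']
```

Because security and compliance run first, a resource that fails there never reaches the cheaper, less critical checks, and its report contains only the problem that matters most.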

Practical Checklist for Getting Started

If you are new to infrastructure policies, start small. Pick one category and automate one check. Here is a sequence that works for most teams:

Week one: Add a security policy that blocks public SSH access. Fail the pipeline hard.
Week two: Add a tagging policy that requires owner and environment tags. Start with a warning, then move to hard fail after two weeks.
Week three: Add a cost policy that blocks expensive instance types in dev accounts. Warn on violation, escalate to the team lead.
Week four: Add naming conventions for the most common resource types in your account.
Month two: Review compliance requirements and encode the top three as automated checks.
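The warn-then-fail progression in week two can be expressed as a date-based switch. This is a minimal sketch assuming each policy records when it was introduced and how long its grace period lasts; the policy names and dates are placeholders.

```python
from datetime import date, timedelta

# Sketch: introduce a policy in warning mode, then switch it to a hard failure.
# Assumption: each policy records when it was introduced and its grace period;
# the policy names and dates below are placeholders.


def enforcement_mode(introduced: date, grace: timedelta, today: date) -> str:
    """Return "warn" while the grace period is running, otherwise "fail"."""
    return "warn" if today < introduced + grace else "fail"


ROLLOUT = {
    # Week one: block public SSH with no grace period.
    "block-public-ssh": (date(2024, 3, 4), timedelta(0)),
    # Week two: required tags warn for two weeks, then fail.
    "require-owner-environment-tags": (date(2024, 3, 11), timedelta(weeks=2)),
}

if __name__ == "__main__":
    today = date(2024, 3, 18)
    for policy, (introduced, grace) in ROLLOUT.items():
        print(policy, enforcement_mode(introduced, grace, today))
    # block-public-ssh fail
    # require-owner-environment-tags warn
```

Keeping the grace period in data rather than editing the check on the cutover day means the switch to hard failure happens on schedule without another change to the pipeline.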

The goal is not to write every policy at once. The goal is to build momentum by solving the most painful problems first.

What Matters Most

Security and compliance policies protect you from external threats and legal exposure. Cost policies protect your budget. Tagging and naming policies protect your operational sanity. All five categories work together to turn infrastructure management from a manual, error-prone process into an automated, consistent one.

Start with the policy that hurts the most today. For most teams, that is either the security group wide open to the internet or the mystery resource running up the bill. Automate that check, then move to the next. Over time, your pipeline becomes a safety net that catches mistakes before they become incidents.