When Infrastructure Policy Gets in the Way: Handling Exceptions Without Breaking Security

You've spent weeks crafting infrastructure policies. Every resource must follow naming conventions, use approved instance types, and never expose certain ports to the public. The pipeline enforces these rules automatically. Everything is clean, controlled, and secure.

Then a team comes to you with a request that violates three policies at once. They need a large-memory instance type that's normally banned because of its cost. They need to open a debug port temporarily because a production issue only reproduces in the real environment. And the legacy system they're migrating forces them to use a resource name that breaks your naming standard.

What do you do?

The Problem With Rigid Policies

If your answer is "policy is policy, no exceptions," you're setting yourself up for failure. Teams will find workarounds: they'll quietly modify policies, create resources outside the pipeline, or simply ignore the rules. The policy becomes useless because people would rather work around the system than fight rigid rules.

The goal isn't to eliminate policies. It's to make them flexible enough that people don't feel forced to bypass them. You need a clear mechanism for exceptions that keeps everyone accountable.

Three Things Every Exception Needs

A well-designed exception system has three non-negotiable components: logging, approval, and expiration.

Logging

Every exception must be recorded: who requested it, which policy was bypassed, which resources were affected, why the exception was needed, and who approved it. This record should never be deleted. It serves multiple purposes: an audit trail, input for post-mortem analysis, and a gentle reminder that the team is operating outside normal rules.

Without logging, exceptions become invisible technical debt. You won't know how many exceptions exist, who granted them, or whether they're still needed.
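As a rough sketch of what such a record could look like, the snippet below stores one JSON line per exception. The field names, the `ExceptionRecord` class, and the in-memory list standing in for an append-only store are all assumptions made for illustration, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExceptionRecord:
    """One audit entry for a policy exception (hypothetical schema)."""

    requester: str
    policy_id: str
    resources: tuple
    reason: str
    approver: str
    # Timestamp is set once at creation; frozen=True prevents later edits.
    recorded_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(log_lines: list, record: ExceptionRecord) -> None:
    """Append the record as one JSON line; existing entries are never touched."""
    log_lines.append(json.dumps(asdict(record), sort_keys=True))


# Usage: the list is a stand-in for a real append-only log.
audit_log = []
append_record(
    audit_log,
    ExceptionRecord(
        requester="alice",
        policy_id="no-public-debug-port",
        resources=("vm-legacy-01",),
        reason="Production issue only reproduces in the real environment",
        approver="bob",
    ),
)
```

Because every entry carries the same five fields, questions like "how many exceptions exist" or "who granted them" become simple queries over the log.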

Approval

The person requesting an exception cannot approve it themselves. Someone else must sign off, ideally someone who understands the risks the policy was designed to mitigate. If the exception touches security, the security team approves. If it affects cost, finance or an engineering manager approves.

This approval can be integrated directly into your pipeline as a gate. The pipeline pauses until the authorized person reviews and approves the exception request. No approval, no deployment.
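A pipeline gate along these lines might look like the sketch below. The category names, the approver groups, and the `gate_allows` function are hypothetical; a real pipeline would pull approvals from its own ticketing or review system.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from policy category to groups allowed to approve it.
APPROVER_GROUPS = {
    "security": {"security-team"},
    "cost": {"finance", "engineering-managers"},
}


@dataclass(frozen=True)
class Approval:
    approver: str
    approver_group: str


def gate_allows(requester: str, category: str, approval: Optional[Approval]) -> bool:
    """Return True only if someone else, from an authorized group, approved."""
    if approval is None:
        return False  # No approval, no deployment.
    if approval.approver == requester:
        return False  # Requesters cannot approve their own exception.
    allowed_groups = APPROVER_GROUPS.get(category, set())
    return approval.approver_group in allowed_groups
```

Unknown categories fall through to an empty set of approvers, so the gate fails closed rather than open.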

Expiration

Exceptions should never be permanent. Every exception needs a time limit, typically 7 to 30 days. After that, the policy re-applies automatically. The violating resource must be fixed or removed. If the exception is still necessary, the team must submit a new request with updated reasoning.

This mechanism prevents exceptions from becoming the new normal. It forces teams to either fix the underlying issue or justify why the policy needs to change.
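The time limit can be enforced with very little code. The sketch below treats the 7 to 30 day range as a hard bound, which is an assumption (the range is described above as typical, not mandatory), and it considers an exception expired on its expiry date.

```python
from datetime import date, timedelta

# Assumed hard bounds, taken from the typical range mentioned in the text.
MIN_DAYS = 7
MAX_DAYS = 30


def expiry_date(granted_on: date, days: int) -> date:
    """Compute the expiry date, rejecting durations outside the allowed window."""
    if not MIN_DAYS <= days <= MAX_DAYS:
        raise ValueError(
            f"exception duration must be {MIN_DAYS}-{MAX_DAYS} days, got {days}"
        )
    return granted_on + timedelta(days=days)


def is_expired(expires_on: date, today: date) -> bool:
    """An exception counts as expired on its expiry date and any day after."""
    return today >= expires_on
```

Renewal is deliberately absent: once `is_expired` returns true, the only way forward in this sketch is a fresh request with a new expiry date.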

Designing the Exception Flow

The exception process should be slightly inconvenient. Not to punish people, but to ensure they genuinely need the exception. If it's too easy, everyone will use exceptions instead of following policies. If it's too hard, people will find loopholes. The sweet spot is somewhere in between: annoying enough to make people think twice, but not so painful that it blocks legitimate work.

In practice, the exception request process follows the flow in this diagram, and the steps after it walk through the same path:

flowchart TD
    A[Pipeline detects policy violation] --> B{Choose action}
    B -->|Cancel| C[Change blocked]
    B -->|Request exception| D[Submit exception request]
    D --> E[Log request details]
    E --> F[Notify approver]
    F --> G{Approved?}
    G -->|No| H[Change blocked]
    G -->|Yes| I[Pipeline continues with exception status]
    I --> J[Set expiration date]
    J --> K[Send reminders before expiry]
    K --> L{Expired?}
    L -->|No| M[Resource in exception]
    L -->|Yes| N[Re-run policy check]
    N --> O{Compliant?}
    O -->|Yes| P[Exception closed]
    O -->|No| Q[Enforce policy / fix resource]

The pipeline runs its policy checks during plan or apply.
A violation is detected. The pipeline offers two options: cancel the change, or submit an exception request.
If the team submits an exception, the pipeline creates a ticket or notification for the authorized approver.
Once approved, the pipeline continues, but marks the resource as being in exception status.
The system sends reminders before the exception expires. After expiration, it automatically re-runs the policy check.
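The steps above can be modeled as a small state machine. This is a minimal sketch: the state and event names are invented for illustration, and logging, notifications, and reminders are left out so that only the allowed transitions remain.

```python
from enum import Enum


class State(Enum):
    VIOLATION = "violation detected"
    PENDING = "awaiting approval"
    ACTIVE = "in exception"
    RECHECK = "re-running policy check"
    BLOCKED = "change blocked"
    CLOSED = "exception closed"
    ENFORCED = "policy enforced"


# Allowed transitions; any (state, event) pair not listed here is rejected.
TRANSITIONS = {
    (State.VIOLATION, "cancel"): State.BLOCKED,
    (State.VIOLATION, "request_exception"): State.PENDING,
    (State.PENDING, "approve"): State.ACTIVE,
    (State.PENDING, "reject"): State.BLOCKED,
    (State.ACTIVE, "expire"): State.RECHECK,
    (State.RECHECK, "compliant"): State.CLOSED,
    (State.RECHECK, "non_compliant"): State.ENFORCED,
}


def advance(state: State, event: str) -> State:
    """Move to the next state, or raise if the flow does not allow the event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


def run(events: list) -> State:
    """Replay a sequence of events starting from a detected violation."""
    state = State.VIOLATION
    for event in events:
        state = advance(state, event)
    return state
```

Keeping the transitions in one table makes the important property easy to check: there is no path from a violation to an active exception that skips approval.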

What Not to Do

Never create exceptions without expiration dates or approval. That's equivalent to having no policy at all. An exception that never expires is just a policy that was silently rewritten.

Also, don't use exceptions as an excuse to avoid improving your policies. If the same type of exception keeps appearing, your policy might be too strict. Maybe you need a new resource category, or the original rule no longer makes sense. Frequent exceptions are a signal that your policies need evaluation, not just workarounds.
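One way to spot that signal is to count exceptions per policy over a review period and flag anything that crosses a threshold. The sketch below assumes you can export the policy ID of each granted exception; the threshold of 3 and the policy IDs are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def policies_needing_review(policy_ids: Iterable[str], threshold: int = 3) -> list:
    """Return (policy_id, count) pairs at or above the threshold, busiest first."""
    counts = Counter(policy_ids)
    flagged = [(policy, n) for policy, n in counts.items() if n >= threshold]
    # Sort by count descending, then by policy ID for a stable order.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))


# Usage with invented policy IDs from one review period.
granted = ["instance-type"] * 4 + ["naming-standard"] + ["open-port"] * 3
print(policies_needing_review(granted))
```

A policy that shows up in this list is a candidate for a new resource category or a rewritten rule, not for yet another exception.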

Practical Checklist for Exception Handling

Every exception is logged with the requester, the policy bypassed, the resources affected, the reason, and the approver
Exceptions require approval from someone who understands the policy's risk
All exceptions have explicit expiration dates (7 to 30 days recommended)
The pipeline blocks deployment until the exception is approved
Automated reminders are sent before an exception expires
Expired exceptions trigger an automatic policy re-check
Exception frequency is reviewed quarterly to identify policy improvements
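The reminder item in this checklist is the one most often left vague, so here is a minimal sketch of how reminder dates could be derived from an expiry date. The default offsets of 7, 3, and 1 days before expiry are assumptions, not a recommendation from any particular tool.

```python
from datetime import date, timedelta


def reminder_dates(granted_on: date, expires_on: date, days_before=(7, 3, 1)) -> list:
    """Return sorted reminder dates that fall strictly between grant and expiry."""
    candidates = {expires_on - timedelta(days=d) for d in days_before}
    # Drop reminders that would land on or before the grant date.
    return sorted(d for d in candidates if granted_on < d < expires_on)
```

For a short exception, early offsets simply drop out: an exception granted for 7 days gets only the 3-day and 1-day reminders.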

The Takeaway

Policies without exception mechanisms create shadow systems. Teams will work around them, and you lose visibility into what's actually running in your infrastructure. A well-designed exception flow doesn't weaken your policies. It strengthens them by making them realistic enough that people actually follow them. The key is logging, approval, and expiration. Without all three, you don't have an exception process. You have a policy that's already broken.