When Infrastructure Drift Makes Your Terraform Plan Useless

You run a pipeline to deploy a new application version. Terraform plan runs, and the output shows it wants to resize your production database instance. Nobody on the team intended to touch the database. The pull request reviewer stares at the plan, confused. Is this a side effect of the new version? A security requirement they missed? Someone approves it just to unblock the release, and suddenly your database is running on smaller hardware during peak traffic.

This scenario is not hypothetical. It happens when infrastructure has drifted away from what your code defines. And once drift sets in, your most trusted tool for safe infrastructure changes becomes unreliable.

What Drift Actually Means

Infrastructure drift is the gap between what your code says infrastructure should look like and what actually exists in your cloud environment. You wrote Terraform, Pulumi, or CloudFormation code that defines your resources. Someone logs into the cloud console and changes an instance type, adds a security group rule, or tweaks a database parameter. That change never touches your code repository. It never goes through a pipeline. It just happens.

The infrastructure still works. Applications still run. But your code no longer describes reality. It describes what reality used to be.
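To make the definition concrete, here is a minimal sketch of drift as a comparison between declared and actual settings. The attribute names and values are hypothetical, and real tools compare far richer resource schemas; this only illustrates the idea.

```python
from typing import Any


def find_drift(declared: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (declared_value, actual_value)} for every mismatch.

    Only attributes present in `declared` are checked, mirroring the idea
    that code describes the settings it cares about.
    """
    drift = {}
    for key, wanted in declared.items():
        found = actual.get(key)
        if found != wanted:
            drift[key] = (wanted, found)
    return drift


# Hypothetical example: the code declares one instance class,
# but someone resized the database in the console.
declared = {"instance_class": "db.m5.large", "multi_az": True}
actual = {"instance_class": "db.m5.xlarge", "multi_az": True, "tags": {"owner": "dba"}}

print(find_drift(declared, actual))
# {'instance_class': ('db.m5.large', 'db.m5.xlarge')}
```

The extra tags on the real resource are ignored here because the code never declared them. Whether undeclared attributes count as drift depends on the tool and the resource type.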

The Plan That Lies to You

Infrastructure as Code tools like Terraform build a plan from three inputs: your code definitions, your state file (Terraform's record of the resources it manages), and, by default, a fresh read of those resources from the cloud provider. When state and real infrastructure agree, the plan you get is accurate. It shows exactly what will change based on your latest code commit.

The following flowchart shows how drift creates a mismatch between code, state, and actual infrastructure, leading to an inaccurate plan:

flowchart TD
    A[Code Definitions] --> B[Terraform Plan]
    C[State File] --> B
    D[Actual Infrastructure] -.->|drift| C
    D -.->|out of sync| B
    B --> E[Plan Shows Unintended Changes]
    style D fill:#f9f,stroke:#333,stroke-width:2px
    style E fill:#f96,stroke:#333,stroke-width:2px

Drift breaks this comparison. The state file is outdated because the actual infrastructure changed outside the pipeline. When Terraform runs plan, it refreshes its view of the managed resources by default, finds values that differ from your code, and proposes changes to undo them. The result looks like Terraform wants to make changes, but those changes are corrections that bring infrastructure back to what your code says, not changes you intended to make.

This is plan drift: a plan that reflects pre-existing differences between code and reality, not the changes you actually want to deploy.
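One way to separate intended changes from corrections is to read the plan as data. Terraform can render a saved plan as JSON with terraform show -json, and that document includes a resource_changes list and, in recent versions, a resource_drift list. The sketch below assumes that documented layout and uses a hand-written sample rather than real plan output. The overlap heuristic is an assumption, not an official feature.

```python
import json


def summarize_plan(plan: dict) -> dict[str, list[str]]:
    """Split a JSON plan into planned changes and reported drift."""
    planned = [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if rc["change"]["actions"] != ["no-op"]
    ]
    drifted = [rd["address"] for rd in plan.get("resource_drift", [])]
    return {"planned": planned, "drifted": drifted}


# Hand-written sample in the shape of `terraform show -json` output.
sample = json.loads("""
{
  "resource_changes": [
    {"address": "aws_ecs_service.app", "change": {"actions": ["update"]}},
    {"address": "aws_db_instance.main", "change": {"actions": ["update"]}},
    {"address": "aws_s3_bucket.assets", "change": {"actions": ["no-op"]}}
  ],
  "resource_drift": [
    {"address": "aws_db_instance.main", "change": {"actions": ["update"]}}
  ]
}
""")

summary = summarize_plan(sample)
# Heuristic: a planned change on a resource that also drifted is probably
# a correction, not part of the commit under review.
suspicious = [addr for addr in summary["planned"] if addr in summary["drifted"]]
print(suspicious)
# ['aws_db_instance.main']
```

A reviewer who sees the database listed as suspicious knows to ask who changed it, instead of guessing whether the new application version needs it.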

Three Ways Drift Damages Your Pipeline

Unexpected Destruction

This is the most dangerous outcome. Imagine your security team manually adds a rule to a network security group to isolate a sensitive workload. Terraform manages that group, with its rules defined inline, but the new rule does not exist in your IaC code. When your pipeline runs terraform apply, Terraform treats the rule as something that should not exist and removes it. (A resource Terraform has never tracked in state is invisible to it and is left alone; the risk lies in manual changes to resources it does manage.) Your security configuration vanishes silently, and nobody notices until something breaks.
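A pipeline can guard against this by inspecting the plan before apply and stopping when it touches something sensitive. The sketch below assumes the JSON plan layout from terraform show -json (the address, type, and change.actions fields). The protected types and the approval list are hypothetical inputs your team would define.

```python
def risky_changes(plan: dict, protected_types: set[str], approved: set[str]) -> list[tuple[str, list[str]]]:
    """Flag deletions anywhere, and any change to a protected resource type,
    unless the resource address was explicitly approved."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions == ["no-op"] or rc["address"] in approved:
            continue
        if "delete" in actions or rc.get("type") in protected_types:
            flagged.append((rc["address"], actions))
    return flagged


# Hand-written sample plan, not real Terraform output.
plan = {
    "resource_changes": [
        {"address": "aws_security_group.workload", "type": "aws_security_group",
         "change": {"actions": ["update"]}},
        {"address": "aws_ecs_service.app", "type": "aws_ecs_service",
         "change": {"actions": ["update"]}},
        {"address": "aws_instance.legacy", "type": "aws_instance",
         "change": {"actions": ["delete"]}},
    ]
}

flagged = risky_changes(
    plan,
    protected_types={"aws_security_group"},
    approved={"aws_instance.legacy"},  # this removal was requested on purpose
)
print(flagged)
# [('aws_security_group.workload', ['update'])]
```

The pipeline would fail the run when the list is non-empty, forcing someone to either approve the change or investigate where it came from.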

Wasted Review Time

Pull request reviewers see a plan that shows changes to resources unrelated to the feature being deployed. They cannot tell if these are accidental side effects, required updates, or something suspicious. Time gets spent investigating changes that are not actually part of the work. Review cycles stretch longer. Teams start asking questions like "Did someone touch production again?" instead of focusing on the actual code change.

Broken Trust in Automation

When plans keep showing unexpected changes, teams stop trusting the pipeline. They run manual checks before deploying. They start making changes directly in the console because the pipeline feels unreliable. The irony is brutal: the more changes happen outside the pipeline, the worse the drift becomes. The pipeline becomes less trustworthy, so people bypass it more, which makes it even less trustworthy.

Why Drift Happens in Real Teams

Drift is not caused by bad engineers. It happens because real work creates situations where the pipeline is not the fastest path:

An incident requires an immediate change. The on-call engineer fixes it in the console because writing code, committing, and waiting for a pipeline takes too long.
A database admin tunes parameters directly in the cloud console because they do not work with IaC tools daily.
A security team adds temporary firewall rules during an audit and forgets to document them.
A developer needs to test something quickly and manually creates a resource, planning to "add it to code later."

Each of these actions makes sense in isolation. Together, they create a gap between code and reality that grows until the pipeline cannot be trusted.

Detecting Drift Before It Hurts

The fix is not to forbid console access. That approach fails because emergencies and legitimate operational needs will always bypass rigid rules. The fix is to detect drift automatically and surface it before anyone runs a plan.

Most IaC platforms offer drift detection features. Terraform Cloud and Enterprise have drift detection that runs on a schedule and alerts when the real infrastructure differs from the state. OpenTofu, the open-source fork of Terraform, has no built-in scheduler, but its CLI supports the same refresh-only plan mode (plan -refresh-only) as Terraform. With either tool, you can build your own detection using scheduled pipeline runs that compare state against actual cloud resources.
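A home-grown check can be as small as running a plan against an unchanged main branch and reading the exit code. With -detailed-exitcode, terraform plan exits with 0 when there is nothing to change, 1 on error, and 2 when the plan is non-empty. The sketch below relies on that behavior. The status names are made up for illustration, and the runner is stubbed so the example runs without Terraform installed.

```python
import subprocess
from typing import Callable

PLAN_CMD = ["terraform", "plan", "-detailed-exitcode", "-input=false", "-lock=false", "-no-color"]


def classify(exit_code: int) -> str:
    """Map the plan exit code to a status suitable for alerting."""
    if exit_code == 0:
        return "in_sync"
    if exit_code == 2:
        # On an unchanged branch, a non-empty plan means drift
        # (or code that was merged but never applied).
        return "drift_or_unapplied_changes"
    return "error"


def check_drift(workdir: str, run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run the plan in `workdir` and classify the result."""
    result = run(PLAN_CMD, cwd=workdir, capture_output=True, text=True)
    return classify(result.returncode)


# Stubbed runner standing in for a real Terraform invocation.
def fake_run(cmd, **kwargs):
    return subprocess.CompletedProcess(args=cmd, returncode=2, stdout="", stderr="")


print(check_drift(".", run=fake_run))
# drift_or_unapplied_changes
```

In a scheduled job you would call check_drift with the default runner and send an alert for anything other than in_sync.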

The key is making drift visible. When a team member changes something in the console, the next scheduled drift check should flag it. The team can then decide: update the code to match the change, or revert the change back to what the code defines. Either choice is valid, as long as it is intentional and tracked.

A Practical Drift Detection Checklist

If you manage infrastructure with IaC, consider adding these checks to your routine:

Schedule a weekly drift detection run for production environments
Alert the team when drift is found, not just when it causes failures
Review drift reports in your regular team sync
Document a clear process for reconciling drift: either update code or revert the change
Treat manual console changes as temporary workarounds, not permanent solutions
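The reconciliation step is easier to enforce when each finding is recorded with an explicit decision. The sketch below is one hypothetical way to track that. The field names, the two resolution labels, and the seven-day limit are assumptions chosen to match a weekly check, not part of any tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class DriftFinding:
    address: str
    detected_on: date
    resolution: Optional[str] = None  # "update_code" or "revert"; None while undecided


def overdue(findings: list[DriftFinding], today: date, max_age_days: int = 7) -> list[str]:
    """Addresses of findings still undecided after `max_age_days`."""
    limit = timedelta(days=max_age_days)
    return [
        f.address
        for f in findings
        if f.resolution is None and today - f.detected_on > limit
    ]


# Illustrative data only.
findings = [
    DriftFinding("aws_db_instance.main", date(2024, 1, 5)),
    DriftFinding("aws_security_group.workload", date(2024, 1, 18)),
    DriftFinding("aws_instance.worker", date(2024, 1, 1), resolution="update_code"),
]

print(overdue(findings, today=date(2024, 1, 20)))
# ['aws_db_instance.main']
```

Bringing the overdue list to the regular team sync keeps manual changes from quietly becoming permanent.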

The Real Cost of Ignoring Drift

Drift does not break your infrastructure immediately. It erodes the reliability of your pipeline slowly. Each undetected instance of drift makes the next plan less trustworthy. Each plan that surprises the team makes them less willing to automate. Eventually, your infrastructure becomes a black box where nobody knows what is actually running, and the code repository becomes an aspirational document rather than a source of truth.

The goal is not to eliminate drift completely. Some drift is inevitable in complex systems. The goal is to detect it quickly, make it visible, and give your team a clear path to reconcile it. When your plan shows only the changes you intended, you can trust your pipeline again.