23-2 · Chapter 23 · 5 min read

Writing Database Migrations That Won't Break Production

You have a database that's been running for months. Users depend on it. Tables have grown large, queries have been optimized, and the schema has settled

Writing Database Migrations That Won't Break Production

You have a database that's been running for months. Users depend on it. Tables have grown large, queries have been optimized, and the schema has settled into a shape that works. Then comes the feature request that requires a new column, a table rename, or a data migration.

The moment you run that ALTER TABLE on production, you're making a bet. If the migration takes too long, queries queue up. If it locks the table, users see errors. If it fails halfway, you need a way back. And if you don't have a rollback plan, the only option is restoring from backup, which means losing any data entered since the last snapshot.

This is why safe database migrations are not just about writing correct SQL. They are about structuring changes so they can be reviewed, tested, and reversed without panic.

Every Migration Needs Two Files

The simplest pattern that saves teams repeatedly is the up-and-down migration pair.

Here is a concrete example of a paired up and down migration:

-- 20241101_add_last_login_at.up.sql
ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP;

-- 20241101_add_last_login_at.down.sql
ALTER TABLE users DROP COLUMN last_login_at;

An up migration contains the SQL that makes the change. A down migration contains the SQL that undoes it. Every change gets both files, stored together with a unique identifier so the order is clear.

20241101_add_email_index.sql          -- up
20241101_add_email_index_down.sql     -- down

The identifier can be a timestamp, a sequence number, or a date prefix. What matters is that anyone looking at the folder can see the exact order of changes. When a migration runs in production and something goes wrong, the down migration gives you a fast, predictable way to revert.

Without a down migration, your only fallback is a full database restore. That takes time, requires coordination, and risks losing recent data. A down migration runs in seconds.

When Down Migrations Are Not Enough

Down migrations work well for reversible changes: adding a column, creating an index, inserting reference data. But some changes are hard to undo.

Dropping a column is a common example. Once the column is gone and the data is deleted, a down migration that re-adds the column cannot bring the data back. The same problem applies to renaming tables, changing column types, or removing constraints that other parts of the system depend on.

For these cases, the safe approach is to break the change into multiple small migrations, each reversible on its own:

Add a new column with the desired type.
Backfill the data in batches.
Update the application code to use the new column.
Drop the old column.

Each step has its own up and down migration. If step 3 reveals a problem, you can revert step 2 and step 1 cleanly. You never reach a point where the only way out is a restore.

Write Migrations That Can Run Multiple Times

Pipelines fail. Network drops, timeouts happen, and sometimes a migration runs halfway before the process crashes. When the pipeline retries, the migration runs again.

If your migration assumes the change hasn't been applied yet, it will fail on the second run. That failure blocks the entire pipeline and requires manual intervention.

Make every migration idempotent. Use IF NOT EXISTS when creating tables or indexes. Use IF EXISTS when dropping objects. Check whether a column already exists before altering it. The goal is simple: running the same migration twice should produce the same result as running it once.

Avoid Long Locks on Large Tables

An ALTER TABLE that changes a column type can lock the entire table for minutes on a table with millions of rows. During that time, every read and write to that table waits. Users see timeouts. Queues build up.

The fix is to avoid single-step schema changes on large tables. Instead, use a multi-step approach:

Add a new column with the desired type.
Update rows in batches to populate the new column.
Add an index if needed.
Drop the old column in a later migration.

Each step locks briefly. The application can keep running between steps. This pattern is slower to write but much safer to run.

Keep Environment-Specific Values Out of Migration Files

A migration file should work the same way in development, staging, and production. If you hardcode a database name, a password, or a connection string into the SQL, the file becomes tied to one environment. You cannot run it elsewhere without editing it, and editing a migration file after it has been reviewed defeats the purpose of version control.

Use parameters, environment variables, or configuration that the migration tool provides. The SQL itself should only contain schema and data logic, not environment details.

Store Migrations Alongside Application Code

There are two common approaches: keep migration files in the same repository as the application code, or keep them in a separate repository. Both work, but the choice affects how teams coordinate changes.

When migrations live in the same repository, every pull request that changes the schema also includes the migration. The code review covers both the application change and the database change together. This makes it easier to spot mismatches, like a query that references a column that hasn't been added yet.

When migrations live in a separate repository, the application and database changes can evolve on different timelines. This is useful when multiple services share one database, but it requires more coordination to keep schema and code in sync.

Either way, the key is that migration files are version-controlled, reviewed, and traceable. A database change should leave the same kind of audit trail as a code change.

Practical Checklist for Writing Safe Migrations

Before you merge a migration into the pipeline, run through these checks:

Does every migration have a corresponding down migration?
Can the down migration actually restore the previous state without data loss?
Is the migration idempotent? Can it run twice without error?
Will the migration lock a large table for more than a few seconds?
Are environment-specific values absent from the SQL file?
Is the migration file stored in a repository with the application code or a dedicated database repository?

The Takeaway

Safe database migrations are not about avoiding change. They are about making change reversible, testable, and reviewable. Every migration file you write is a small contract: here is what changes, and here is how to undo it. When that contract is clear, the team can move faster because they know they have a way back.