22-2 · Chapter 22 · 5 min read

Writing Database Migrations That Won't Break When Run Twice

You're deploying a new feature that needs an extra column in the users table. You write the migration script, run it in staging, everything looks good

Writing Database Migrations That Won't Break When Run Twice

You're deploying a new feature that needs an extra column in the users table. You write the migration script, run it in staging, everything looks good. Then someone on the team runs the same script again by accident during a pipeline retry. Now you have an error: "column already exists." The deployment fails, someone has to manually fix the database, and the release gets delayed.

This scenario plays out more often than most teams admit. The fix isn't complicated, but it requires a shift in how you think about migration scripts. You need to write them so they can be run multiple times without causing problems.

What Makes a Migration Safe

The core principle is idempotency. An operation is idempotent when running it once or a hundred times produces the same final state. This is not about preventing the script from running twice. It's about making sure that running it twice does no harm.

Think about why a script might run more than once. A pipeline fails halfway through and gets retried. Someone runs the migration on staging, forgets about it, and runs it again later. Two developers apply the same change in different environments at different times. Without idempotency, any of these scenarios can corrupt your data or break your deployment.

The Simple Pattern: Check Before You Act

The most straightforward way to make a migration idempotent is to check whether the change already exists before applying it. SQL databases make this easy with conditional statements.

Here's a concrete example. Suppose you need to add a last_login_at column to track user activity:

-- Non-idempotent: fails if column already exists
ALTER TABLE users ADD COLUMN last_login_at TIMESTAMP;

-- Idempotent: succeeds whether column exists or not
ALTER TABLE users ADD COLUMN IF NOT EXISTS last_login_at TIMESTAMP;

The first version will throw an error if the column already exists. The second version runs safely every time.

Instead of writing:

ALTER TABLE users ADD COLUMN phone_number VARCHAR(20);

Write:

ALTER TABLE users ADD COLUMN IF NOT EXISTS phone_number VARCHAR(20);

The IF NOT EXISTS clause means the script succeeds whether the column is already there or not. The same approach works for indexes, constraints, and new tables. PostgreSQL, MySQL, and most modern databases support these conditional constructs.

For removing things, the same logic applies. Dropping a column that doesn't exist will fail. Use IF EXISTS to make the operation safe:

ALTER TABLE users DROP COLUMN IF EXISTS old_phone_number;

Handling Data Migrations

Column additions and removals are the easy cases. Real complexity comes when you need to move or transform existing data. Suppose you're splitting a full_name column into first_name and last_name. The migration needs to:

Add the two new columns
Populate them from the existing data
Handle the case where the script runs again

A naive approach would copy data without checking if it's already been copied. Running that twice creates duplicate data or overwrites valid values. A safer pattern looks like this:

-- Add columns if they don't exist
ALTER TABLE users ADD COLUMN IF NOT EXISTS first_name VARCHAR(100);
ALTER TABLE users ADD COLUMN IF NOT EXISTS last_name VARCHAR(100);

-- Only populate if the target columns are empty
UPDATE users 
SET first_name = SPLIT_PART(full_name, ' ', 1),
    last_name = SPLIT_PART(full_name, ' ', 2)
WHERE first_name IS NULL AND last_name IS NULL;

The WHERE clause ensures the update only runs on rows that haven't been processed yet. If the script runs again, those rows are already populated and the update does nothing.

For more complex scenarios, you might need to compare source and target data, or use a checksum to verify that the transformation produced the correct result. The principle stays the same: never assume the data is in its original state.

Gradual Deletions Reduce Risk

Deleting columns or tables is risky because you can't undo it easily. A safer approach is to do it in stages:

First migration: rename the column to column_name_deprecated
Wait for a few release cycles to make sure nothing breaks
Second migration: drop the deprecated column

This pattern gives your team time to catch any code that still references the old column. If something goes wrong, renaming is reversible. Dropping is not.

Keep a Record of What Ran

Idempotency handles the case where a script runs multiple times. But you also need to know which scripts have run, when they ran, and whether they succeeded. This is your audit trail.

Most migration frameworks like Flyway or Liquibase handle this automatically. They create a table that tracks every migration script by name, checksum, and execution timestamp. If you're writing raw SQL scripts without a framework, create your own tracking table:

CREATE TABLE IF NOT EXISTS migration_log (
    script_name VARCHAR(255) PRIMARY KEY,
    started_at TIMESTAMP,
    completed_at TIMESTAMP,
    status VARCHAR(20),
    script_hash VARCHAR(64)
);

Before running any migration, insert a row with the script name and a "running" status. After completion, update the status to "success" or "failed." If the script fails, the pipeline can retry, and the migration runner can check whether the script already completed successfully.

This log is not just for debugging. It's your evidence for compliance and governance. When someone asks "who changed the users table and when," the answer should come from the log, not from someone's memory.

Practical Checklist for Writing Migration Scripts

Before you run any migration in production, verify these points:

Can the script run twice without error? Test it by running it twice in a row on a copy of your database.
Does the script check for existing columns, indexes, or constraints before creating them?
For data transformations, does the script handle already-transformed rows safely?
Are deletions done in stages, with a deprecation period before removal?
Is there a log entry for every script execution, including timestamps and status?
Can you roll back the change if something goes wrong? If not, do you have a plan?

The Takeaway

A migration script that fails when run twice is not a migration script. It's a time bomb waiting for someone to retry a pipeline. Write every migration as if it will be executed multiple times, because in practice, it probably will. Check before you create, verify before you transform, and log everything. Your future self, debugging a deployment at 2 AM, will thank you.