24-5 · Chapter 24 · 5 min read

When Your Database Schema Is Fine But Your Data Is Wrong

You just ran a database migration that added a new column. Everything looked good. The schema change succeeded, the column exists, and the application is

When Your Database Schema Is Fine But Your Data Is Wrong

You just ran a database migration that added a new column. Everything looked good. The schema change succeeded, the column exists, and the application is running. But then someone notices: all the users who registered three years ago should have been marked as verified, and they aren't. Or the phone numbers you migrated now have inconsistent formats because the old data didn't follow the new rules.

The schema is correct. The column type matches. The constraints are valid. The problem is the data itself.

This is a situation that feels worse than a failed migration. A failed migration is obvious. You see the error, you know something broke, and you can act. But a migration that succeeds with bad data is subtle. It can sit in production for hours or days before anyone notices. And when you do notice, the natural instinct is to panic and look for ways to undo everything.

But rolling back the schema to fix the data creates a new set of problems. The application might not work with the old schema anymore. You might lose data that was already correct. And you're undoing a structural change that was actually right, just to fix content that was wrong.

The Real Problem Is Not the Schema

When a migration adds a column like is_verified with a default value of false, the schema change is straightforward. The column exists, the default works, and new records will behave correctly. The problem is that existing users who should be verified are now marked as unverified. The schema didn't cause this. The migration logic didn't cause this. The gap was in understanding what the existing data should have been.

Another common example: a migration changes phone number storage to require country codes. The new format is correct, and new entries will follow the rules. But the old phone numbers that were stored without country codes are now inconsistent. The schema is fine. The data is not.

In both cases, the solution is not to revert the schema. The solution is to fix the data while keeping the schema intact.

Compensating Scripts: Fix Data Without Touching Structure

A compensating script is a migration that only changes data, not schema. It runs as a regular migration, goes through the same pipeline, and follows the same deployment process. But instead of ALTER TABLE, CREATE INDEX, or ADD COLUMN, it contains only UPDATE, INSERT, or DELETE statements that correct the data.

The goal is simple: bring the data to the correct state without changing the table structure.

Here is a practical example. After adding a currency column with a default value of 'IDR', the team realizes that all transactions from international partners should use 'USD'. The compensating script looks like this:

UPDATE transactions SET currency = 'USD' WHERE partner_type = 'international';

No schema changes. No new columns. No type conversions. Just a targeted data correction.

Compensating scripts also handle partial migration failures. Imagine a migration that creates a new table and moves data from an old one. Some rows fail to transfer because of a constraint violation. Instead of rolling back the entire migration and starting over, a compensating script can handle the remaining rows. It inserts or updates only the records that were missed, without re-running the full migration.

Make Your Compensating Scripts Idempotent

There is one rule that matters more than any other: compensating scripts must be idempotent. Running the script twice should produce the same result as running it once.

This is not a theoretical concern. In practice, migrations get re-run. A pipeline restarts. An environment gets refreshed. Someone runs the migration manually while debugging. If your script is not idempotent, running it twice can corrupt your data.

The fix is straightforward. Always check the current state before making changes. Use a WHERE clause that is specific enough to only affect the rows that need correction. If your database supports it, use ON CONFLICT clauses for inserts.

Instead of this:

UPDATE transactions SET currency = 'USD' WHERE partner_type = 'international';

Write this:

UPDATE transactions SET currency = 'USD' WHERE partner_type = 'international' AND currency IS DISTINCT FROM 'USD';

The difference is small but critical. The second version only updates rows where the currency is not already 'USD'. Running it a hundred times will only affect the rows that need changing, and only on the first run.

When Compensating Scripts Are Not Enough

Compensating scripts are not a universal solution. They work when the schema is correct and only the data needs fixing. If the schema itself is wrong, you still need a proper schema migration.

For example, if you added a column with the wrong data type, or if the column constraints are too strict for the data that needs to be stored, a compensating script cannot help. You need to alter the schema. Similarly, if a migration introduced a bug that corrupted data in a way that cannot be corrected with simple UPDATE statements, you might need a more complex approach.

But for the common case where the schema is right and the data is wrong, compensating scripts are safer than any alternative. They avoid the risks of down migrations, they do not require restoring from backups, and they can run while the application is still serving traffic.

A Quick Checklist for Writing Compensating Scripts

Confirm that the schema is correct before writing the script. If the structure needs to change, handle that first.
Write the script as a new migration, not as a hotfix applied directly to the database.
Make every statement idempotent. Check conditions before updating or inserting.
Test the script against a copy of production data, not just against an empty database.
Include logging or comments that explain why the data correction is needed, so future team members understand the context.

The Takeaway

When a migration goes wrong, the instinct is often to undo everything. But undoing the schema is a heavy operation that can break the application and lose correct data. Compensating scripts give you a lighter, more precise tool. They let you fix the data while keeping the schema that works. The next time you see a migration that succeeded but left behind bad data, ask yourself: is the schema wrong, or is the data wrong? If the answer is the data, write a compensating script. It is faster, safer, and less disruptive than rolling back.