Backup Is Your Safety Net, Not Your Migration Strategy
Your team just ran a database migration that dropped the orders_archive table. The data had been moved to a new table, so everything looked clean. But an old system nobody remembered was still reading from that table. Now the table is gone, that system is failing, and a migration that simply recreates the table would bring it back empty.
Or maybe you changed a price column from INTEGER to DECIMAL, and a batch job that runs every hour is still writing integers into that column. If the job writes prices in cents, a $19.99 item is now stored as 1999.00, and records are being corrupted silently.
In moments like these, the instinct is understandable: "We have backups. Let's just restore the database to before the migration."
That sounds reasonable. But the consequences are often worse than the original problem.
What Backups Are Actually For
Backups exist for catastrophic scenarios. A server room fire. A storage array failure. A corrupted disk. An admin accidentally dropping a critical table. In those situations, restoring from backup is the right call. You have no other option, and the cost of not restoring is total data loss.
But a failed migration is not a catastrophe. It is a routine operational event. And treating it like a disaster creates new problems.
The Two Problems With Backup-Based Rollback
Data Loss
A backup is a snapshot of your data at a specific point in time. If you took the backup at 2:00 AM and the migration fails at 10:00 AM, you have eight hours of user activity that exists only in your production database. Orders placed. Profiles updated. Support tickets created. Configuration changes made.
When you restore the 2:00 AM backup, all that data disappears. There is no way to get it back unless you have another source. The longer the time between the backup and the restore, the larger this gap grows. Even if you use point-in-time recovery and restore to 9:59 AM, you still lose everything written after 9:59, including whatever users did between the migration and the restore.
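The size of that gap is plain subtraction. Here is a minimal Python sketch of it, assuming (hypothetically) that writes stop the instant the migration fails; in practice they usually continue until someone takes the application offline, which only widens the window. The function name and the dates are illustrative.

```python
from datetime import datetime, timedelta


def lost_window(restore_point: datetime, writes_stopped: datetime) -> timedelta:
    """Span of committed writes that a restore to `restore_point` discards.

    Everything committed after the restore point and before writes stopped
    exists only in the production database, so the restore throws it away.
    """
    if writes_stopped < restore_point:
        raise ValueError("writes cannot stop before the restore point")
    return writes_stopped - restore_point


# Scenario from the text: backup at 2:00 AM, migration fails at 10:00 AM.
# Assumption: writes stop exactly at 10:00 AM (real outages usually run longer).
backup = datetime(2024, 1, 1, 2, 0)
failure = datetime(2024, 1, 1, 10, 0)

full_restore_loss = lost_window(backup, failure)
pitr_loss = lost_window(failure - timedelta(minutes=1), failure)

print(full_restore_loss)  # 8:00:00
print(pitr_loss)          # 0:01:00
```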
Downtime
Restoring a database is not instant. For databases of hundreds of gigabytes, a restore can take hours. During that time, your application cannot serve traffic normally: the data is inconsistent, partially loaded, or locked by the restore process.
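A back-of-the-envelope estimate shows why. This sketch divides database size by sustained restore throughput; the 500 GB size and 100 MB/s rate are made-up figures, and a real restore can take longer once index rebuilds, log replay, and verification are included.

```python
def estimate_restore_hours(db_size_gb: float, throughput_mb_per_s: float) -> float:
    """Rough restore duration: data size divided by sustained throughput.

    Deliberately ignores index rebuilds, log replay, and verification,
    so treat the result as a lower bound rather than a prediction.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = (db_size_gb * 1024) / throughput_mb_per_s
    return seconds / 3600


# Hypothetical numbers: a 500 GB database restored at 100 MB/s.
print(f"{estimate_restore_hours(500, 100):.1f} hours")  # 1.4 hours
```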
Compare that to a roll-forward approach, where you run a new migration that takes minutes, or a compensating script that adjusts data without stopping the application. Backup-based rollback forces your team to accept data loss, extended downtime, or often both.
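A compensating script for the price example from the opening might look like the following. It is a sketch using Python's built-in sqlite3 module and assumes, hypothetically, that the batch job writes prices in cents and that each row records who last wrote it (`updated_by`) and when (`updated_at`). Without some way to identify the bad rows, a compensating script is much harder to write.

```python
import sqlite3

MIGRATION_TIME = "2024-01-01T10:00:00"  # hypothetical: when the column became DECIMAL

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price DECIMAL(10, 2),"
    " updated_by TEXT, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, 19.99, "web", "2024-01-01T10:05:00"),          # correct dollar value
        (2, 1999, "price_batch", "2024-01-01T11:00:00"),   # cents from the stale job
        (3, 4.50, "price_batch", "2024-01-01T09:00:00"),   # converted by the migration
    ],
)
conn.commit()


def compensate_cent_prices(conn: sqlite3.Connection, since: str) -> int:
    """Convert prices the batch job wrote as cents after `since` into dollars.

    Runs as one ordinary transaction, so the application keeps serving
    traffic. Returns the number of rows changed.
    """
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE products SET price = price / 100.0"
            " WHERE updated_by = 'price_batch' AND updated_at >= ?",
            (since,),
        )
    return cur.rowcount


fixed = compensate_cent_prices(conn, MIGRATION_TIME)
print(fixed)  # 1
```

Pause or fix the batch job before running something like this, and run it only once: a second pass would divide the corrected prices by 100 again.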
Point-in-Time Recovery Reduces But Does Not Solve
Some teams use point-in-time recovery (PITR) to reduce the data gap. Instead of restoring to the last full backup, they restore to a specific transaction log position, like one minute before the migration ran.
This helps. You lose less data, but you still lose some: every transaction committed after the recovery target is gone, including legitimate writes that arrived after the migration ran. And the restore itself still takes time. PITR is better than a full backup restore, but it is not a clean rollback mechanism.
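A toy model makes the trade-off concrete. This Python sketch treats the transaction log as a hypothetical list of numbered entries (real systems identify log positions in their own way, for example PostgreSQL's WAL log sequence numbers). Restoring to a target keeps everything up to it and discards everything after, so the post-migration order and support ticket vanish along with the migration itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    lsn: int          # hypothetical position in the transaction log
    description: str


def restore_to(log: list[Txn], target_lsn: int) -> tuple[list[Txn], list[Txn]]:
    """Replay the log up to and including `target_lsn`.

    Returns (kept, lost). Everything after the target is discarded,
    whether it was the bad migration or a legitimate user write.
    """
    kept = [t for t in log if t.lsn <= target_lsn]
    lost = [t for t in log if t.lsn > target_lsn]
    return kept, lost


log = [
    Txn(100, "order #1 placed"),
    Txn(101, "profile updated"),
    Txn(102, "MIGRATION: drop orders_archive"),
    Txn(103, "order #2 placed"),
    Txn(104, "support ticket created"),
]

kept, lost = restore_to(log, target_lsn=101)  # just before the migration
print([t.description for t in lost])
# ['MIGRATION: drop orders_archive', 'order #2 placed', 'support ticket created']
```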
Where Backup Belongs in Your Recovery Hierarchy
Backup has a clear role in database recovery: it is the last resort. The safety net for when everything else has failed and you are willing to accept data loss and downtime because the alternative is worse.
In a mature team, backups are taken regularly and tested periodically. But when a migration goes wrong, the team does not reach for the backup first. They work through a hierarchy, which the following flowchart summarizes:
- Roll-forward: Write a new migration that reverses the change or adds the missing piece.
- Compensating script: Run a script that adjusts data without a full migration.
- Backup restore: Restore only after evaluating the data loss and downtime trade-offs.
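The first rung of that hierarchy, roll-forward, can be sketched against the orders_archive scenario from the opening. This Python example uses the built-in sqlite3 module and assumes, hypothetically, that the data now lives in a table called `orders_history` with the columns `id`, `customer_id`, and `total`. Instead of restoring anything, the forward migration recreates orders_archive as a view over the new table.

```python
import sqlite3


def roll_forward(conn: sqlite3.Connection) -> None:
    """Forward migration: give the forgotten reader its orders_archive back.

    Assumes (hypothetically) the earlier migration moved every row into
    orders_history with the same columns, so a view is enough.
    IF NOT EXISTS makes the migration safe to re-run.
    """
    with conn:
        conn.execute(
            "CREATE VIEW IF NOT EXISTS orders_archive AS"
            " SELECT id, customer_id, total FROM orders_history"
        )


conn = sqlite3.connect(":memory:")
# State after the failed migration: the data lives only in orders_history.
conn.execute(
    "CREATE TABLE orders_history (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders_history VALUES (?, ?, ?)", [(1, 42, 99.5), (2, 7, 12.0)]
)
conn.commit()

roll_forward(conn)
print(conn.execute("SELECT COUNT(*) FROM orders_archive").fetchone()[0])  # 2
```

A view keeps a single source of truth and is quick to create. If the forgotten system also writes to orders_archive, a plain view may not accept those writes, and the forward migration would need to recreate a real table instead.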
A Practical Checklist Before Reaching for Backup
Before you restore from backup, ask these questions:
- Can we write a migration that adds back the removed table or column?
- Can we run a compensating script that reconstructs the lost data from logs or other sources?
- How much data will we lose if we restore from the last backup?
- How long will the restore take, and can the business accept that downtime?
- Is there a way to keep the application partially running while we fix the data?
If the answer to either of the first two questions is "yes," you do not need a backup restore. If the last three questions reveal data loss or downtime the business cannot accept, you need to find another way.
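The checklist can be encoded as a small decision function. This is one possible reading of it, not a rule: the names are hypothetical, and treating partial availability (the fifth question) as something that can make restore downtime tolerable is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Recovery(Enum):
    ROLL_FORWARD = "roll-forward migration"
    COMPENSATING_SCRIPT = "compensating script"
    BACKUP_RESTORE = "backup restore"
    FIND_ANOTHER_WAY = "keep looking for another option"


@dataclass
class Assessment:
    can_roll_forward: bool     # Q1: can a migration add back what was removed?
    can_compensate: bool       # Q2: can a script reconstruct the data?
    loss_acceptable: bool      # Q3: is the data lost since the backup acceptable?
    downtime_acceptable: bool  # Q4: can the business accept the restore time?
    can_run_partially: bool    # Q5: can the app stay partly up during the fix?


def choose_recovery(a: Assessment) -> Recovery:
    """Walk the hierarchy top to bottom; backup restore comes last."""
    if a.can_roll_forward:
        return Recovery.ROLL_FORWARD
    if a.can_compensate:
        return Recovery.COMPENSATING_SCRIPT
    # Assumption: partial availability can make a long restore tolerable.
    if a.loss_acceptable and (a.downtime_acceptable or a.can_run_partially):
        return Recovery.BACKUP_RESTORE
    return Recovery.FIND_ANOTHER_WAY


print(choose_recovery(Assessment(False, False, True, False, True)).value)  # backup restore
```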
The Takeaway
Backup is your safety net for disasters, not your migration rollback strategy. When a migration fails, start with roll-forward or compensating scripts. Reach for backup only when those options are exhausted and you have accepted the cost of data loss and downtime. A team that treats backup as the default rollback plan is a team that will lose production data and sleep.