When Database Migrations Break Running Applications
Your team has just deployed a new feature. The rollout looks clean. But five minutes later, the on-call engineer sends a screenshot of error logs. Old application instances are crashing with database errors. The query SELECT * FROM users WHERE status = 'active' is suddenly failing. What happened?
You changed the status column from VARCHAR to INT in the migration. The new application code handles integers just fine. But during a rolling update, both old and new application instances run side by side. The old instances still expect strings. The database schema changed underneath them, and they broke.
This is the core tension of database migrations in modern deployments: the database is shared, but application versions are not.
The Shared Database Problem
When you deploy with rolling updates, blue-green deployments, or canary releases, multiple versions of your application run simultaneously. They all connect to the same database. But each version has different expectations about the schema.
The old application expects certain columns, data types, and constraints. The new application expects a slightly different structure. Both need to work correctly during the transition period. If your migration breaks compatibility with the old application, you get production errors.
This is not a theoretical problem. It happens every time a migration changes something that running code depends on.
Backward Compatibility: The Non-Negotiable Rule
The fundamental rule is simple: every migration must be backward-compatible with the old application. The old code must be able to read and write data without errors, even after the migration runs.
Consider these two SQL migrations to see the difference:
-- Safe: Add a nullable column with a default value
-- Old app can still INSERT without specifying phone_number
ALTER TABLE users ADD COLUMN phone_number VARCHAR(20) DEFAULT NULL;
-- Breaking: Change column type from VARCHAR to INT
-- Old app's SELECT * FROM users WHERE status = 'active' will fail
-- because 'active' is a string, not an integer
ALTER TABLE users ALTER COLUMN status TYPE INT USING status::integer;
Some changes are naturally backward-compatible. Adding a nullable column with a default value, for example, does not break existing queries. The old application's INSERT INTO users (name, email) still works because the new phone column accepts nulls.
Other changes break compatibility immediately. Changing a column type, renaming a column, adding a NOT NULL constraint to a populated column, or adding a foreign key that existing data cannot satisfy will cause errors in old application instances.
The rule is not optional. If you cannot guarantee backward compatibility, you cannot safely deploy with zero downtime.
The Expand-Contract Pattern
The safest way to handle breaking changes is the expand-contract pattern, sometimes called dual-write. The idea is to make changes in phases, never removing old structures until all application instances have been updated.
The following sequence diagram illustrates the timeline of the expand-contract pattern, showing how old and new application instances interact with the database during each phase.
Phase 1: Expand. Add the new structure without removing the old one. If you want to replace status (VARCHAR) with status_id (INT), add the new column while keeping the old one. The new application writes to both columns. The old application continues using status. Both work.
Phase 2: Migrate data. Backfill the new column with converted values from the old column. This can run as a background job or a separate migration step.
Phase 3: Update application code. Deploy the new application version to all instances. Now every running instance knows about both columns.
Phase 4: Contract. In a separate deployment, remove the old column. By this point, no running application depends on it.
The pattern adds complexity. Your application code needs to handle dual-write logic during the transition. You have extra columns to maintain temporarily. But this is the price of avoiding downtime and errors during deployment.
Forward Compatibility: The Other Direction
Backward compatibility protects old application code. Forward compatibility protects new application code when the database has not fully migrated yet.
Consider a scenario where you deploy the new application first, but the migration has not run on all database replicas. The new code needs to handle both old and new schema formats. If it reads status as VARCHAR but expects INT, it should handle the conversion gracefully.
Forward compatibility is harder to achieve and usually has limits. It means your new code must be defensive about the data it reads. It should not assume the schema has already changed. This often means adding fallback logic or data conversion in the application layer until the migration completes.
Beyond Columns: Indexes, Constraints, and Foreign Keys
Compatibility is not just about columns and data types. Indexes, constraints, and foreign keys can also break running applications.
Adding a new foreign key constraint can cause existing INSERT or UPDATE queries to fail if the referenced data does not exist. Adding a UNIQUE constraint to a column that previously allowed duplicates will break any query that tries to insert duplicate values. Even adding an index can cause performance issues if the database locks the table during index creation.
Every schema change needs to be evaluated for its impact on running application code. Ask yourself: will this change cause any query from the old application to fail? Will it change behavior in ways the old code does not expect?
Practical Checklist for Safe Migrations
Before running a migration in production, verify these points:
- Can the old application read all existing data without errors after the migration?
- Can the old application write new data without errors after the migration?
- Are all new columns nullable or have default values?
- Do new constraints already hold true for existing data?
- Will the migration cause table locks that block queries?
- Is there a rollback plan if something goes wrong?
The Takeaway
Database migrations during zero-downtime deployments require treating the schema as a shared interface between application versions. Every migration must be backward-compatible with the old code. Breaking changes need the expand-contract pattern, adding new structures first and removing old ones only after all instances have been updated.
The database is the single source of truth that all application versions share. If you change it carelessly, you break running code. Design your migrations like bridges that both old and new applications can cross safely. Only after every instance has moved to the new side should you modify the bridge itself.