Data Reconciliation: Proving Your Migration Worked Correctly
You just finished a data migration. The script ran without errors. The logs look clean. The team is ready to move on. But deep down, you have that nagging feeling: did everything actually work? Did every row make it? Did any values get silently corrupted?
This is the moment where most teams realize that a successful migration is not the same as a correct migration. A script can finish with exit code zero and still produce wrong data. A filter might exclude rows it should have included. A type conversion might truncate values without throwing an error. The database will not tell you about these problems unless you explicitly ask.
That is why reconciliation exists. It is the process of comparing data before and after a migration to prove that nothing was lost, changed, or corrupted. It is the final checkpoint before you declare the migration done.
Why Error-Free Scripts Are Not Enough
The uncomfortable truth about data migration is that correctness and error-free execution are two different things. A migration script can run perfectly from a technical standpoint and still produce incorrect results.
Consider a common scenario: you migrate a table with a WHERE clause that filters out inactive users. The script runs, the rows insert successfully, and the old table gets deprecated. Weeks later, someone notices that a group of active users is missing. The filter was too aggressive, or the condition was wrong. The script never failed, but the data is wrong.
Logs do not capture this kind of problem. The database engine does not know that a missing row is a mistake; it only knows that the INSERT statement executed successfully. The only way to catch these silent failures is to compare the source data with the target data directly.
The Practical Approach to Reconciliation
Reconciliation does not need to be complicated. The simplest and most effective method is checksum comparison: the same idea you would use to verify a file transfer, applied to batches of rows instead of files.
Here is how the batch-by-batch process works for a table migration:
- Read a batch of rows from the source table.
- Compute a hash of the entire batch (for example, using MD5 or SHA-256 over a concatenation of all column values).
- Read the same batch from the target table using the same ordering and compute the same hash.
- Compare the two hashes.
If the hashes match, the batch is identical. If they do not, you know exactly which batch has a problem, and you can investigate that specific range of rows.
Here is a concrete SQL query that implements this approach for two tables:
-- Compare row counts and checksums between source and target tables.
-- COALESCE keeps NULLs from silently dropping a row out of the hash; pick a
-- placeholder that cannot appear in real data if you need to tell NULL and '' apart.
WITH source_checksums AS (
    SELECT
        COUNT(*) AS row_count,
        MD5(STRING_AGG(
            COALESCE(CAST(column1 AS TEXT), '') || '|' ||
            COALESCE(CAST(column2 AS TEXT), '') || '|' ||
            COALESCE(CAST(column3 AS TEXT), ''),
            ',' ORDER BY id)) AS batch_hash
    FROM source_table
),
target_checksums AS (
    SELECT
        COUNT(*) AS row_count,
        MD5(STRING_AGG(
            COALESCE(CAST(column1 AS TEXT), '') || '|' ||
            COALESCE(CAST(column2 AS TEXT), '') || '|' ||
            COALESCE(CAST(column3 AS TEXT), ''),
            ',' ORDER BY id)) AS batch_hash
    FROM target_table
)
SELECT
    'Row count mismatch' AS issue
FROM source_checksums, target_checksums
WHERE source_checksums.row_count <> target_checksums.row_count
UNION ALL
SELECT
    'Checksum mismatch' AS issue
FROM source_checksums, target_checksums
WHERE source_checksums.batch_hash <> target_checksums.batch_hash;
This query computes a single checksum over all rows in each table (using a stable ordering) and compares both the row count and the hash. If either differs, the query returns a clear indication of what went wrong.
For large tables, processing in batches is essential. You do not want to load millions of rows into memory at once. A batch size of 1,000 to 10,000 rows works well for most databases. You can run these comparisons in parallel across multiple batches to speed things up.
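One way to do that in the same PostgreSQL style as the query above is to compute one hash per range of ids and compare the two sides batch by batch. This is a sketch, not a drop-in script: it assumes a numeric id primary key, and source_table, target_table, and the column names are placeholders.
-- Per-batch version of the checksum comparison (PostgreSQL-style sketch).
-- Integer division groups ids into batches of 10,000, so a mismatch points
-- at a narrow id range instead of the whole table.
WITH source_batches AS (
    SELECT
        id / 10000 AS batch_no,
        COUNT(*) AS row_count,
        MD5(STRING_AGG(
            COALESCE(CAST(column1 AS TEXT), '') || '|' ||
            COALESCE(CAST(column2 AS TEXT), ''),
            ',' ORDER BY id)) AS batch_hash
    FROM source_table
    GROUP BY 1
),
target_batches AS (
    SELECT
        id / 10000 AS batch_no,
        COUNT(*) AS row_count,
        MD5(STRING_AGG(
            COALESCE(CAST(column1 AS TEXT), '') || '|' ||
            COALESCE(CAST(column2 AS TEXT), ''),
            ',' ORDER BY id)) AS batch_hash
    FROM target_table
    GROUP BY 1
)
-- Only batches that disagree are returned; an empty result means the tables match.
SELECT
    batch_no,
    s.row_count AS source_rows,
    t.row_count AS target_rows
FROM source_batches s
FULL OUTER JOIN target_batches t USING (batch_no)
WHERE s.batch_hash IS DISTINCT FROM t.batch_hash
   OR s.row_count IS DISTINCT FROM t.row_count;
Because each batch hashes independently, you can also split the id ranges across several connections and compare them in parallel.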
What Else to Check Beyond Checksums
Checksums catch most problems, but they are not the only thing you should verify. A few additional checks add confidence without much extra effort.
Row count. This is the simplest check. The number of rows in the target table should match the number of rows in the source table. If the counts differ, something went wrong with the migration logic.
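The check itself is a one-liner; with the same placeholder table names as before:
-- Simplest possible reconciliation check: do the row counts agree?
SELECT
    (SELECT COUNT(*) FROM source_table) AS source_rows,
    (SELECT COUNT(*) FROM target_table) AS target_rows;
Matching counts do not prove the rows are identical, which is why this check complements the checksum rather than replacing it.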
Null values. Migrations sometimes change null columns to default values or vice versa. Compare the count of nulls per column between source and target. A mismatch here often indicates a type conversion issue or a default value constraint that was applied incorrectly.
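A sketch of that comparison in the same style, still using placeholder table and column names; each side contributes one row, and the null counts should line up column for column:
-- Compare per-column null counts between source and target.
SELECT
    'source' AS side,
    COUNT(*) FILTER (WHERE column1 IS NULL) AS column1_nulls,
    COUNT(*) FILTER (WHERE column2 IS NULL) AS column2_nulls
FROM source_table
UNION ALL
SELECT
    'target' AS side,
    COUNT(*) FILTER (WHERE column1 IS NULL) AS column1_nulls,
    COUNT(*) FILTER (WHERE column2 IS NULL) AS column2_nulls
FROM target_table;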
Value distribution. Pick a few important columns and compare their value distributions. For example, if the source table has 1,000 users with status "active" and 500 with status "inactive," the target table should have the same numbers. A significant difference suggests the migration filter or transformation logic has a bug.
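For a single key column, the comparison can be written so that only mismatched groups come back; status here is a hypothetical column name:
-- Compare the distribution of a key column; only mismatched groups are returned.
-- (A NULL status group, if present, shows up as two unmatched rows.)
SELECT
    status,
    s.cnt AS source_count,
    t.cnt AS target_count
FROM (SELECT status, COUNT(*) AS cnt FROM source_table GROUP BY status) s
FULL OUTER JOIN (SELECT status, COUNT(*) AS cnt FROM target_table GROUP BY status) t
    USING (status)
WHERE s.cnt IS DISTINCT FROM t.cnt;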
Edge cases. Test specific rows that are known to be tricky: rows with special characters, very long strings, dates near boundaries, or negative numbers. If your migration handled these correctly, it is a good sign that the general logic is sound.
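A simple way to spot-check such rows is an EXCEPT between the two tables over a handful of known ids; the ids and columns below are hypothetical:
-- Spot-check known tricky rows; EXCEPT returns source rows that are missing
-- from the target or whose values differ.
SELECT id, column1, column2 FROM source_table WHERE id IN (42, 1001, 73859)
EXCEPT
SELECT id, column1, column2 FROM target_table WHERE id IN (42, 1001, 73859);
-- Swap the two tables and run it again to catch rows that exist only in the target.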
Making Reconciliation Part of Your Pipeline
Reconciliation should not be a one-off manual task that someone remembers to run after a late-night migration. It should be automated and integrated into your deployment pipeline.
Write a reconciliation script that runs after the migration and backfill steps complete. The script should:
- Connect to both the source and target databases.
- Run the batch checksum comparison.
- Check row counts, null counts, and value distributions.
- Generate a detailed report of any mismatches.
- Send a notification (email, Slack, or whatever your team uses) with the results.
If the reconciliation passes, the pipeline can proceed to the next step. If it fails, the pipeline should stop and alert the team. This prevents incorrect data from reaching production without anyone noticing.
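The gate itself can be as small as one query the pipeline runs: fold the individual checks into a single report that returns zero rows when everything matches, and have the script fail the step whenever rows come back. A sketch with the same placeholder names:
-- Reconciliation report: each branch contributes a row only when its check fails.
-- A pipeline script can run this and block the deploy if the result is non-empty.
SELECT 'row count mismatch' AS issue, NULL::TEXT AS detail
WHERE (SELECT COUNT(*) FROM source_table) <> (SELECT COUNT(*) FROM target_table)
UNION ALL
SELECT 'null count mismatch on column1' AS issue, NULL::TEXT AS detail
WHERE (SELECT COUNT(*) FROM source_table WHERE column1 IS NULL)
   <> (SELECT COUNT(*) FROM target_table WHERE column1 IS NULL)
UNION ALL
SELECT 'distribution mismatch on status', status::TEXT
FROM (SELECT status, COUNT(*) AS cnt FROM source_table GROUP BY status) s
FULL OUTER JOIN (SELECT status, COUNT(*) AS cnt FROM target_table GROUP BY status) t
    USING (status)
WHERE s.cnt IS DISTINCT FROM t.cnt;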
Automating reconciliation also makes it repeatable. Every migration goes through the same verification process. You do not have to rely on someone remembering to run a script or checking the right things. The pipeline enforces it.
What Reconciliation Is Not
Reconciliation is not a replacement for dry runs or backfill strategies. Each step serves a different purpose.
- Dry runs verify that the migration logic works without affecting production data.
- Backfill handles the actual data transfer in manageable chunks to minimize impact.
- Reconciliation proves that the backfill produced correct results.
Think of reconciliation as the final quality gate. It confirms that all the previous steps worked as intended. If reconciliation passes, you can be confident that the data is ready. If it fails, you go back, fix the issue, and rerun the migration from the start.
A Practical Reconciliation Checklist
When you set up reconciliation for your next migration, here is a short checklist to guide you:
- Checksum comparison per batch (1,000 to 10,000 rows)
- Row count match between source and target
- Null count match per column
- Value distribution match for key columns
- Edge case verification (special characters, boundary values)
- Automated script integrated into the pipeline
- Notification on failure with detailed mismatch report
The Takeaway
A migration is not complete until you have proven the data is correct. Error-free execution is not enough. Reconciliation gives you that proof by comparing source and target data directly. Automate it, run it every time, and treat a failed reconciliation the same way you treat a failed test: stop, investigate, and fix before moving forward.