Our team uses Drizzle as our ORM and SQL builder for interacting with the PostgreSQL database in our application. Drizzle is an amazing tool, with some really great type safety and migration tools.
Over the course of building our application over the last 6 months, however, we’ve built up quite a few migrations. We were at 124 migrations to run, and the SQL schema has changed drastically from when we started.
This means that when we need to boot up a fresh container, either for CI/CD purposes (such as in our E2E tests) or manual QA testing, Drizzle has to go through 124 migration scripts and apply them all, even though in the course of the application’s development a good half of the schema created is immediately overwritten by features that had evolved over time.
Trying to keep the same set of migration files from day 0’s “initial commit” can’t continue forever, as it will only increase application startup times for new deployments (and to some extent, file read and processing times for migrations already ran, although I suspect Drizzle uses some form of file hashing to reduce unnecessary reads).
So the goal was to reset the migrations back to 001, with the new migration SQL file containing the total schema for the most recent state of the application. Obviously there are some important caveats with doing this safely. In order to do this, I had to make some assumptions:
- Live production and sandbox deployments are running on the most recent version of the schema
- Local developer environments are also running on the most recent version of the schema, OR a reset will be necessary in those environments
- (workaround for developers would be to checkout a previous git commit and run any missing migrations)
- There aren’t any active PRs adding to the schema that would conflict or be merged in while this one is being worked on
Step 1: Matching the Existing Schema
To reset to 001, I needed the SQL script generated by Drizzle to match the current schema. If you’ve never created any custom SQL migrations with Drizzle, or modified the ones generated by Drizzle, this isn’t an issue. However, in our applications, we had many different areas where we had to modify or create custom migrations. For example, when having to transform data as part of a migration or new required column, and in one case, to create inherited Postgres tables (something that Drizzle doesn’t natively support).
Scanning through 124 migration files and trying to find places where we deviated from Drizzle’s generated SQL sounded like a daunting task with a lot of risk – the last thing we want is for our production database schema to diverge from development environments.
Rather than reading through the SQL scripts to find these custom changes, I found a much simpler solution – I deleted all the current migrations, generated a new 001 migration, applied it to a new database, then used the schema diff tool in the pgAdmin client to compare it to the database with the most recent schema.
The schema diff tool will show you every difference in schema between two databases, and even produces SQL scripts that can bring schema B up to alignment with schema A. Using this tool, I modified the new 001 SQL migration to match the current schema state, before running one final schema diff check with pgAdmin to ensure I didn’t miss anything.
(side note: this is also a great way to find things in the schema that don’t need to be there anymore. Typically when removing a table from Drizzle schema in the codebase, we overrode the migration so that it didn’t drop the table.)
Step 2: Stopping Drizzle from Re-Applying the new 001 Migration
Now that I had a 001 migration that matched the current application’s schema, complete with custom changes, I needed to make it compatible with existing deployments that were already up to date.
While Drizzle migrations are automatically incrementing, the name of the migration is not used to determine if a migration has already been applied. If that were the case, I would be able to simply delete everything before migration 124, and overwrite migration 124’s SQL with the new schema creation script generated as 001.
Drizzle uses a hash of the journal file, a JSON file that represents the state of the schema for each migration (or, put another way, it’s a JSON version of the JavaScript table/schema definitions). When Drizzle encounters a migration who’s journal file has a hash that doesn’t exist in it’s migration history table, it applies the migration.
In order to “trick” Drizzle into thinking that the 001 migration has already been applied in existing environments, I added a SQL script that runs in our migrate.ts
file, which checks the following:
- Does the Drizzle migration history table exist (__drizzle_migrations)?
- Does the current database contain one of the tables that’s core to our application (as a way to detect in migrations have already been applied – not super clean but gets the job done)
- Has the 001 migration not been applied yet?
If the three are true, it manually adds the 001 migration record into the history table as already applied. By doing this right before Drizzle’s migrate script is run, it allows brand new environments to have the 001 migration applied, but existing environments won’t error due to the schema already existing.
Conclusion
Collapsing down migrations back to 001 allows a quicker startup time for new environments and E2E tests, and staves off what I suspect may be a hard 999 limit (based off the numbering they use).
The primary goal, though, was to make it PGLite compatible. Sometime in the last few months, Drizzle added support for migration breakpoints in SQL scripts (in the form of comments), which makes some modifications to how SQL runs in PGLite vs Postgres due to PGLite limitations.
These statement breakpoint comments are added to new migrations by default, but all of our old migrations were missing them, making them incompatible with PGLite.