Home-rolled database migrations

Confession time: I’ve never personally managed a whole project that involved a complex, changing database. I’ve certainly worked on projects with them, like Google AdWords, but someone else was in charge of the database management. With this LoL Slackbot I’m working on, I’m the guy. Which means I’m the one who has to handle migrating the database when the code changes the schema.

Here’s a sample migration script I’ll discuss below.

Turns out changing database schemas on deploys is complicated. Who knew? It was a blessing in disguise that sqlite’s alter table is so limited; it forced me to take migrations seriously. And since I’m not using a tool like SQLAlchemy, I’m writing migration scripts by hand. Here’s what I do:

  • The database has a table “globals” with a value “dbSchemaVersion” that I manually increment on changes.
  • My app has a single connect-to-DB function everything uses. It checks the schema version and refuses to run if it’s wrong.
  • When I write code that changes the schema somehow, I manually increment the version. This step is a bit error-prone in that I could forget. I could automate it with a checksum on the output of “.schema”, but I won’t bother until the first time I forget.
  • I use sqldiff to summarize the schema changes.
  • When I write code with a new schema change, I keep manual notes on the migration I’ll eventually need. Then I just develop and test my code that uses the new schema in my dev environment ignoring the need for migration. My test scripts start with an empty database.
  • Once the feature development work is done I then write migration scripts. See a simple example in this gist.
  • run.sh is the actual migration. It validates the schema version is what’s expected, then runs arbitrary shell code. In this case all it does is run 3to4.sql. Sometimes I also have it run custom Python programs that do more complex data filling.
  • 3to4.sql is sqlite code for the migration. I do as much in SQL as possible. The “.bail on” setting at the start is essential; see here for why. This example script adds a column and then fills it with a static value, very simple. Often the migration script has to copy some table over to a new table and then rename it, since alter table in sqlite3 is so limited.
  • test.sh is my test suite for the migration. I run this on a copy of the live production database (taken out of backup). It runs the migration, then diffs the schema from a dev database. Then it runs some ad hoc queries to validate the new database is as expected, which is totally hokey. I should make that a proper pass/fail test.
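
The run.sh-plus-3to4.sql combination boils down to: check the version, run the SQL, bump the version in the same transaction. Here’s a rough sketch of that pattern in Python rather than my actual scripts; the `players` table, the `region` column, and the key/value shape of the globals table are all invented for the example.

```python
import sqlite3

# Hypothetical 3-to-4 migration. sqlite3's ALTER TABLE can't change an
# existing column, so the usual pattern is: build a new table, copy the
# rows over, drop the old one, and rename. The version bump rides in the
# same transaction, so a failed migration leaves the version untouched.
MIGRATION_3_TO_4 = """
BEGIN;
CREATE TABLE players_new (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    region TEXT NOT NULL DEFAULT 'NA'  -- new column, filled with a static value
);
INSERT INTO players_new (id, name) SELECT id, name FROM players;
DROP TABLE players;
ALTER TABLE players_new RENAME TO players;
UPDATE globals SET value = '4' WHERE key = 'dbSchemaVersion';
COMMIT;
"""

def migrate(path):
    """Run the 3-to-4 migration, refusing unless the version is exactly 3."""
    conn = sqlite3.connect(path)
    row = conn.execute(
        "SELECT value FROM globals WHERE key = 'dbSchemaVersion'"
    ).fetchone()
    if row is None or int(row[0]) != 3:
        conn.close()
        raise RuntimeError(f"expected schema version 3, found {row and row[0]}")
    conn.executescript(MIGRATION_3_TO_4)
    conn.close()
```

Because the guard checks for exactly version 3, running the script twice fails loudly instead of corrupting data, which is the whole point of the version number.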
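
And the schema-diff half of test.sh can be approximated by comparing `sqlite_master` between the migrated production copy and a fresh dev database. A sketch, not my actual test suite; the function names are made up.

```python
import sqlite3

def schema_of(path):
    """All CREATE statements in the database, sorted for stable comparison."""
    conn = sqlite3.connect(path)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY sql"
    ).fetchall()
    conn.close()
    return [sql for (sql,) in rows]

def check_migration(migrated_path, dev_path):
    """Fail loudly if the migrated copy's schema drifted from dev's."""
    migrated, dev = schema_of(migrated_path), schema_of(dev_path)
    if migrated != dev:
        raise AssertionError(f"schema drift:\n{migrated}\nvs\n{dev}")
```

This only catches shape differences; the ad hoc queries that validate the data itself still have to be written per-migration.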

One observation I’ve had from this is that “schema changes” are more than just changes to the shape of tables. They’re also changes to global values stored in the database, and sometimes changes to the meaning of the data in some column. This single incrementing version number captures all of that just fine, at least with a single developer.
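
The connect-time guard that enforces that version number is tiny. Here’s a sketch, assuming the globals table is a simple key/value pair (the real table shape may differ):

```python
import sqlite3

EXPECTED_SCHEMA_VERSION = 4  # bumped by hand with every schema change

def connect(path):
    """The single connect-to-DB function; refuses to run on a version mismatch."""
    conn = sqlite3.connect(path)
    row = conn.execute(
        "SELECT value FROM globals WHERE key = 'dbSchemaVersion'"
    ).fetchone()
    if row is None or int(row[0]) != EXPECTED_SCHEMA_VERSION:
        conn.close()
        raise RuntimeError(
            f"dbSchemaVersion is {row and row[0]}, expected {EXPECTED_SCHEMA_VERSION}"
        )
    return conn
```

Since every code path goes through this one function, old code can never silently run against a new schema, or vice versa.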

Having now done this work I’m less excited than I used to be about autogenerated schema migrations. I mean, I should try one, but there’s something to be said for hand-crafting a migration when you have to change column values, rerun programs to fill empty data, etc. The automation around testing and version numbers is really valuable though.

I’m terrified by how ad hoc this is. There’s this temptation doing ops work to half-ass stuff: to just type some code in by hand and throw it out, or test it by visual inspection, or… Part of the devops mystique is treating the ops part as real development too, with real testing and repeatability. I’m about 3/4 of the way there with the migration stuff but not fully.