Home-rolled database migrations

Confession time: I’ve never personally managed a whole project that involved a complex, changing database. I’ve certainly worked on projects with them, like Google AdWords, but someone else was in charge of the database management. With this LoL Slackbot I’m working on I’m the guy. Which means I’m the one who has to handle migrating a database when the code changes the schema.

Here’s a sample migration script I’ll discuss below.

Turns out changing database schemas on deploys is complicated. Who knew? It was a blessing in disguise that sqlite’s alter table is so limited, it forced me to take migrations seriously. And since I’m not using a tool like SQLAlchemy I’m writing migration scripts by hand. Here’s what I do:

  • The database has a table “globals” with a value “dbSchemaVersion” that I manually increment on changes.
  • My app has a single connect-to-DB function everything uses. It checks the schema version and refuses to run if it’s wrong.
  • When I write code that changes the schema somehow, I manually increment the version. This step is a bit error prone in that I could forget. I could automate it with a checksum on the output of “.schema”, but I won’t bother until the first time I forget.
  • I use sqldiff to summarize the schema changes.
  • When I write code with a new schema change, I keep manual notes on the migration I’ll eventually need. Then I just develop and test my code that uses the new schema in my dev environment ignoring the need for migration. My test scripts start with an empty database.
  • Once the feature development work is done I then write migration scripts. See a simple example in this gist.
  • run.sh is the actual migration. It validates the schema version is what’s expected, then runs arbitrary shell code. In this case all it does is run 3to4.sql. Sometimes I also have it run custom Python programs that do more complex data filling.
  • 3to4.sql is sqlite code for the migration. I do as much in SQL as possible. The .bail on pragma at the start is essential; see here for why. This example script adds a column and then fills it with a static value, very simple. Often the migration script has to copy some table over to a new table and then rename it, since alter table in sqlite3 is so limited.
  • test.sh is my test suite for the migration. I run this on a copy of the live production database (taken out of backup). It runs the migration, then diffs the schema from a dev database. Then it runs some ad hoc queries to validate the new database is as expected, which is totally hokey. I should make that a proper pass/fail test.

One observation I’ve had from this is “schema changes” are more than just changes to the shape of tables. They’re also changes to global values stored in the database, and sometimes changes to the meaning of the data in some column. This single incrementing version number captures all of that just fine, at least with a single developer.

Having now done this work I’m less excited than I used to be about autogenerated schema migrations. I mean, I should try one, but there’s something to be said for hand-crafting a migration when you have to change column values, rerun programs to fill empty data, etc. The automation around testing and version numbers is really valuable though.

I’m terrified with how ad hoc this is. There’s this temptation doing ops work to half-ass stuff, to just type some code in by hand and throw it out, or test it by visual inspection, or… Part of the devops mystique is treating the ops part as real development too, with real testing and repeatability. I’m about 3/4 of the way there with the migration stuff but not fully.

Timestamp precision

I’m in the habit of recording timestamps as integer seconds. Ie, I’m inserting into my database the result of int(time.time()) from Python. It’s a habit left over from the bad old days where Unix machines didn’t have sub-second clocks. Any modern system has access to at least a millisecond timer, if not a microsecond timer or better, and there’s no reason not to use it.

The only drawback for me with my project is I declared all my timestamp fields as “integer”, and sqlite3 does in fact round them off on storage. It’s harmless enough, at least if you’re not relying on exact timestamp storage, but I should probably do a schema migration at some point for it.

It’s a bit confounding that Javascript’s time precision is integer milliseconds, although it seems like a reasonable practical choice.


Success! lolslackbot

Big Success! My League of Legends Slackbot has had two major upgrades in the past two weeks. And both have gone quite well, with no serious bugs. I’m so, so glad I invested in a serious testing regimen. I’ve had automated tests for about a month and it is like night and day for development speed. Even on a small project, even working alone.

First, a quick description of the project. It’s a social monitor for League of Legends players. It watches the Riot API for games played by a select group of people. If someone plays a game, it reports the game result to Slack or Discord. A form of social presence. I built it first for a group of about 6 friends playing together and chatting via Slack. I’m now expanding it to a group of about 500 people on another Slack, which is what necessitated me evolving this from a quick hack to a real thing.

The first big change was 10 days ago. I added the notion of “groups”. Previously the list of players I was watching was hard-coded and I assumed everyone was interested in everyone. Now there’s Person and Group database tables and the game reporting is able to collect stats for all the people in a specific group. I still have a design flaw here I need to fix; currently doesn’t work if people from two different groups play the same game.

The second big change was today. I added message routing and delivery. Previously I had hardcoded a single Slack and Discord channel. Now I have Destinations, Subscriptions, and Messages tables so that the game results from various groups can be delivered to various different Slack and Discord channels. A lot of complexity, made doubly hard to test because the end result is actually writing messages people see on a live service. I ended up using unittest.mock for real to mock out the last bit of message delivery in various places.

I wish I’d thought to abstract out a Messages table earlier. Before I was just generating text messages from game objects, but then the game processing code got too deeply entwined with the message delivery code. Now I place create messages in the DB with one program and deliver them in a separate program. Much simpler.

Next major product feature I need to add is some sort of web interface for managing Groups, People, Subscriptions, etc. That sounds really tedious, but necessary. I think this is the step where I’ll graduate and use some higher level Python frameworks, maybe Flask for the webapp and SQLAlchemy to mediate database access. I don’t want to port all my old code over to SQLAlchemy just yet, but I would like to get my feet wet.

I should also productionize the python code. I’m not using virtualenv, my code isn’t even packaged in a proper module. Time to clean that up and add some logging, monitoring, etc.

May also be time to invest more in the test infrastructure. My automated tests are great but they rely on a test dataset I can’t really augment. I generated test data once and have no way to do it again. Also I’ve been running a separate test system for database migration scripts, much more manual and inspection by eye. I think that may be fine actually, but it’s worth examination.


Python is fast opening sqlite3

I’ve got a lot of code calling sqlite3.connect() in my program now. Particularly in my test suites. In MySQL and Postgres this can be slow and you end up needing connection pooling. I had a feeling it shouldn’t be in sqlite3 given its simplicity, but I wasn’t sure.

A bit of work with timeit later I can report it’s taking about 18usec to open a database. My connection function is more complex too, doing a select into a “globals” table to test the schema version. Even with all that work it’s 170usec, or < 2ms. That’s not super fast but it’s not a problem if my test suite does it 100 times.

$ python3 -m timeit -s 'import sqlite3' "sqlite3.connect('games.sqlite')"
100000 loops, best of 3: 17.6 usec per loop

On JSON and nothingness

Python’s JSON module serializes the Python value None as the JSON value null. This is correct. However it causes odd bugs if you are not aware of it.

Specifically, I had code like this:

db.execute('insert into table values(?)', json.dumps(foo.get("bar"))

I was trying to insert a database NULL if foo didn’t have the key “bar”. But foo.get() returns None, which gets turned into the string “null”, which results in the string ‘null’ being inserted into the column. Derp.

>>> json.dumps(None)
>>> json.loads('null') == None
>>> json.dumps('null')
>>> json.loads('"null"')

>>> json.loads(None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/json/__init__.py", line 312, in loads
TypeError: the JSON object must be str, not 'NoneType'

Need a database abstraction

My little lolslackbot project has grown big. 11 different tables! I’m hand-writing ORM code now, silly things like “class Person” which wraps a row for the people table. The meat of the work is still hand-crafted SQL with proper relational queries, but I’m increasingly doing some really basic “get me the row with property X” and writing it all by hand is tedious. And oh boy I added some caching using functools.lru_cache, which I know from experience at Google is the beginning of the end for understandable correctness. And I continue to write increasingly elaborate schema migration scripts as I upgrade stuff.

It’s all kind of a mess. I wish now I’d started using some Python ORM tool. I still think ORMs are kind of evil, but they are a useful evil.

I’m out of date on where Python is but my impression is everyone likes SQLAlchemy. The basic tools are just a really nice upgrade from the basic Python database driver stuff. And the ORM looks like it’s made the right choices, at least based on what I read.

But do I want to port all this code over? Hell no. Hrm… I’m accumulating technical debt. I also need to wrap all my Python code in a proper module structure, and set up a real virtualenv, and…

At least I have a test suite. Although Guido help me if I ever have to regenerate the test dataset.


Discord API first use

My gaming group is switching from Slack to Discord. It’s a nice product. It’s sort of a Slack clone with voice channels as the big added feature. It’s also a product aimed at gamers, not corporate use, so there’s some hope the pricing and features will make more sense for gamers. I like the product, it’s worth a look.

So now I need to make my Slack bot post to Discord, too. Discord currently has no formal API. There is a fantastic group of hackers doing an Unofficial API project, though, and the company seems OK with it. Compared to most hacking projects this one is particularly well documented and organized. I dove in this morning and figured out how to write my own bot that works with the Discord API.

First, some resources:

The Python client is very nicely documented and organized. If you’re starting afresh you want to get version 0.10.0a, the alpha, directly from GitHub. PyPI currently has 0.9. 0.10 has reorganized the API, no reason to use the old one now.

The client itself is entirely based on Python 3.4’s asyncio (with preference for the new 3.5 syntax if available). That means you basically have to do everything through an event loop, which is a bit of a nuisance for simple things but absolutely necessary for building real software that’s going to interact with a Discord server. it also means discord.py only works with Python 3.4 and later. I think that’s all the right choice.

Here is my demo program. It took me a couple of hours in total, half of which was me reminding myself how asyncio works. All in all pretty straightforward, once you figure out the basic paradigm of how it fits together. There’s one choice embedded in my code here, which is I subclassed discord.Client instead of using the function decorator syntax for declaring event handlers. Either works fine.

The unofficial API also makes me understand the Discord product a little better. Unlike Slack, Discord has a notion of a single User that is logged into Discord with an email/password. That user can then be a member of various Servers, each of which has its own Channels. Slack doesn’t really have a cross-Slack notion of User, every Slack is an independent domain. The Discord API’s handling of public/private/DM channels is also a bit more straightforward than Slack’s.

quicky python continuous tests

pytest-watch is a fine quick “continuously run my tests” thing for Python in a tty. It’s nothing fancy, and the default UI is not perfect. But with -c and –no-beep and using –runner to run a custom command for your tests, it’s fine. Nice to not have to switch from my editor window to a console just to run the tests.

I’m envious of the NodeJS folks and their hipster test runners, they are very nice. Alternately I wish there were a good way to turn SublimeText into a bit more of an IDE with debugging and test output parsing. But after years of trying IDEs and going back to plain text editors, I think I’m not going to try to board that ship again.


sqldiff for sqlite3

Learned about a nice tool today: sqldiff. It shows differences between two sqlite databases. Particularly nice with the “–schema” flag, so you can just compare schema differences. I’m in the middle of database versioning hell as I keep iterating my lolslackbot project. Beginning to regret rolling all the database stuff by hand instead of using some ORM.

sqlite3 vs postgres: uneasy feeling

I continue to write my LoL Slack bot with sqlite3. I can live with the primary drawback of sqlite3, which is that writes require a database level lock (ie: no table or row locking). In normal use I should only ever have one writer to the database anyway. But I still have a nagging feeling I should use “a real database”. Ie: Postgres

The nervousness is I’m writing more complex code that’s wedded to sqlite. I’m using sqlite’s upsert equivalent, for which there is no standard syntax. I’m also using sqlite’s Python convenience feature of not using cursor objects. And just now I’m starting to pay attention to Python’s weird transactional semantics, which I suspect might be different in Postgres. In short my code has been dating sqlite for awhile but now things are getting serious and I’m getting commitment anxiety. Also Postgres has some alluring features I could use, like JSON and array types.

The underlying question is whether Postgres is really “better” for my kind of application. I suspect not, and part of why I stick with sqlite is to test that assumption, but the cost of being wrong is getting higher. The flip side is managing a Postgres daemon is still a PITA. Particularly on Ubuntu where upgrading is scary.