SQLAlchemy thoughts

On the heels of my post about simple persistence I decided to give the SQLAlchemy ORM a try. I’ve spent a few hours with it and liked it. Here’s some notes on what I learned.

I’m not a huge fan of ORMs; they’re complicated and it’s very easy to write code that is either broken or has huge performance problems. Relational databases are not object systems, trying to mix the two models is always problematic. OTOH an ORM can be convenient.

So far I like SQLAlchemy. I particularly like its idea that it is there to help with your database, not enforce structure on your database. It’s quite flexible and can be adapted to most (all?) schemas. I also like that it’s relatively straightforward. For instance there’s little caching of objects, pretty much any call that might require a database query will just go ahead and do that query. That makes for potentially inefficient code, but then it also makes it clear and unmagic and avoids a lot of hard-to-find bugs. One can always add caching but it’s up to you to manage it.

I also particularly like the excellent SQL tracing; just add echo=True to the session and you get a nice clean log of the SQL being executed. Makes it easy to understand what is happening and also think about performance. See also preventing lazy loading.

What I don’t like is the Python code magic in SQLAlchemy. The way you create SQL queries with Python operators like == is built out of some operator overloading that is very strange. Also the model classes you create (one per table) have a huge amount of magic in them so that when you reference something simple like User.name what’s really going on is a bunch of code with side effects, including possible SQL queries. When it all works it makes the code look simple, but in my experience this kind of hidden stuff can get you in trouble. But then again ORMs are all about doing ill-advised things marrying object oriented code to relational data, so might as well use some magic to make it look nicer.

Testing

I keep hoping to find an ORM that has first class support for mock objects for unit testing. So I can write all my higher level tests without any database, just mock objects generated by the ORM itself. No one seems to do this though, either because it’s hard or because it’s a bad idea to fake out such a key component of your software. So instead everyone uses a special test database and some combination of test fixtures and transaction rollbacks to manage it.

A lot of SQLAlchemy testing examples use pytest, in part because its fixture support is so good. So I’m using pytest for the first time and I like it.

  • sqlite’s in-memory databases are great for a fast test database. But only use this if you intend to deploy to sqlite; you must test against the same database you are using in production.
  • The asphalt project has testing patterns that I ended up cribbing from. Right now I’ve got it setting up a transaction for every single test whether it uses a dbsession or not, I should probably fix that. Note the listener for after_rollback events to handle application rollbacks, no other example I’ve seen is doing that.
  • More ideas of fixture patterns: this blog post, this book chapter. The latter also has an idea on mocking out the database connection itself to return fake formatted data instead of using a test database.
  • pytest-sqlalchemy is a resuable set of fixtures. It looks OK but pretty simple and the project is inactive. (Maybe it doesn’t need activity!)
  • factory_boy is a test fixture library with specific support for ORMs including SQLAlchemy. Alternative to pytest fixtures, I guess.

Alembic schema migrations

SQLAlchemy does not come with any support for schema migrations. Alembic seems to be the consensus choice for migrations. It’s only about half-automated,  you’re expected to be reviewing and editing scripts. Which seems reasonable enough. One nice thing about Alembic is you can bring it in on a SQLAlchemy project that already exists, you don’t have to start from the beginning with it.

A few notes:

  • If your code isn’t in a Python package and just has simple “import model” style imports, you can fix up Alembic’s import by adding your source directory to PYTHONPATH. Or see the discussion for editing env.py
  • When you first set up Alembic you probably already have a schema in place. I decided to generate an empty migration to mark this existing schema. You still have to apply this migration so Alembic know’s what’s going on, by running “alembic upgrade head”.
  • There’s a cookbook for having Alembic history going back to an empty database.
  • sqlite is a PITA for migrations because it doesn’t have much support for alter table. But Alembic does have nice “batch” support for “move-and-copy” migrations.