Simpler database access in Python

I continue to look for things that make working with databases simpler in Python. I keep doing projects where I end up dropping down to raw SQL, with awkward syntax for querying rows and bespoke one-off code for updating data in the database. The Python is messy but I like being able to use SQL in its full glory.

I’ve tried using SQLAlchemy a couple of times and have a hard time loving it. ORMs are a bad idea in general and SQLAlchemy is very complicated and magic. For a schema with 20+ tables and a lot of complex interactions it makes sense, but for a small project where you just want something more sane than low level DBAPI…

For my current project I’m trying to use plain psycopg2 and its NamedTupleCursor. It’s not awesome, but at least it gives you named access to the rows from the DB. So it makes reading from the database a little nicer, but does nothing for writing. It’s Postgres only.
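With psycopg2 itself the pattern is to pass cursor_factory=psycopg2.extras.NamedTupleCursor when creating a cursor. To show the named-access idea without needing a Postgres server, here is a minimal sketch of the same thing using the standard library's sqlite3 instead. It is an analogue, not psycopg2; the books table and the namedtuple_factory helper are made up for illustration.

```python
import sqlite3
from collections import namedtuple


def namedtuple_factory(cursor, row):
    """Row factory (hypothetical helper): wrap each row in a namedtuple."""
    # cursor.description has one 7-tuple per column; item 0 is the column name.
    fields = [col[0] for col in cursor.description]
    # Building the class per row is wasteful but keeps the sketch short.
    Row = namedtuple("Row", fields)
    return Row(*row)


conn = sqlite3.connect(":memory:")
conn.row_factory = namedtuple_factory
conn.execute("CREATE TABLE books (asin TEXT, title TEXT)")
conn.execute("INSERT INTO books VALUES (?, ?)", ("B000TEST", "Example Title"))

rows = conn.execute("SELECT asin, title FROM books").fetchall()
for r in rows:
    # Named access instead of r[0], r[1]
    print(r.asin, r.title)
```

Just like NamedTupleCursor, this only helps on the read side; the INSERT is still hand-written SQL.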

Along these lines is the records library. It also gives sane access to results from queries; either as dicts or named tuples. Result sets also have nice export functions like “write to CSV” or “make into a Pandas DataFrame”. It doesn’t really do much for writing data, just querying, so it’s simple and small.

dataset is an effort to do even more. I love their tag line: “Because managing databases in Python should be as simple as reading and writing JSON files.” Amen! There’s a lot more going on in dataset than records. It’s more like an ORM-lite and contains support for creating schema, automatic upserts, complex queries without Python, … It doesn’t have the export functions records has, but it does have a lot of thought for simple queries and updates. I’d have to actually use it for something to see if it’s sensible or if ORM-lite is an even worse idea than a full ORM.

Both records and dataset depend on SQLAlchemy, which turned me off at first. But they’re both mostly using it as a way to support multiple database types. In theory the Python DBAPI standard should be enough, but in practice it sucks enough that I understand why they’d build on SQLAlchemy instead. With records you can mostly ignore that SQLAlchemy is there, but dataset exposes it more in the query system.

I wonder if some other programming environment has a much nicer database interface. The Microsoft world has LINQ, but that seems to have all the problems of other ORMs. I wonder if the R folks have a really nice way to work with databases, or maybe even Pandas.

I really like Python 3.6 format strings

Now that I have access to Python 3.6 I’m really enjoying the new PEP 498 format strings. Python has tried a bunch of different ways to format strings, from good ol’ %s notation to Template strings to str.format. They all seemed awkward to me because the thing you were substituting in was way far away from the string itself. Now you just embed a little Python code right in the string to be formatted, maybe with a bit of presentation like how many digits to display. Done!

It’s pretty powerful under the hood. Check out this example:

>>> from datetime import datetime
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"  # using date format specifier
'January 27, 2017'

The format string is using the datetime-specific codes like %Y. Behind the scenes it’s calling the format() built-in, which in turn defers to the __format__() method on the object. In practice I suspect you mostly don’t need to worry about this, beyond basic things like floating point precision.
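Here is a small sketch of that mechanism. The Temperature class and its trailing-"F" convention are invented for illustration; only format(), __format__() and the datetime behaviour are real Python.

```python
from datetime import datetime


class Temperature:
    """Hypothetical class showing how f-strings defer to __format__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # Made-up convention: a trailing "F" means convert to Fahrenheit.
        # Whatever remains is handed on as an ordinary float format spec.
        if spec.endswith("F"):
            return format(self.celsius * 9 / 5 + 32, spec[:-1]) + " F"
        return format(self.celsius, spec) + " C"


today = datetime(year=2017, month=1, day=27)
temp = Temperature(21.5)

date_text = f"{today:%Y-%m-%d}"         # datetime.__format__ acts like strftime
same_text = format(today, "%Y-%m-%d")   # the same call, spelled out
pi_text = f"{3.14159265:.2f}"           # basic floating point precision
c_text = f"{temp:.1f}"                  # spec ".1f" goes to Temperature.__format__
f_text = f"{temp:.0fF}"                 # spec ".0fF" triggers the Fahrenheit branch

print(date_text, pi_text, c_text, f_text)
```

Everything after the colon in the braces is passed to __format__ untouched, which is why datetime can accept strftime codes there.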

I used to do this kind of thing a lot:

print ("{date} {asin} {title}".format(**locals()))

That’s gross for several reasons. Now I can just do

print(f"{date} {asin} {title}")

and be done with it.

GeoAlchemy and SQLAlchemy thoughts

I’m doing a new geo-project, a GPS tracker sort of thing. My database is very simple, just a Users table and a Locations table holding points. But I figured I’d front-load the work and use SQLAlchemy and Alembic and PostGIS and everything right now, not pay the pain of migrating later when I realize I need the fancier stuff.

Only SQLAlchemy doesn’t support PostGIS. GeoAlchemy2 does though. After an hour with it I can’t decide if it’s good or bad. It seems very simple.

In GeoAlchemy2 all geodata is represented in Python by just a few types: Geometry, Geography, Raster. You can’t do much with those things other than get the WKB or WKT for them. They’re not Points or Polygons or anything like that; there’s no Python code for working with the shapes. You can pass them back to PostGIS though, calling functions like ST_Area to have PostGIS do a calculation. That requires a database round trip. Alternatively you can convert them to Shapely shapes and use that excellent GIS library to do the work in Python.
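To give a feel for what a WKB blob actually holds, here is a rough standard-library sketch that packs and unpacks a plain 2D WKB Point. It does not use GeoAlchemy2 or Shapely, the function names are made up, and it assumes plain WKB; as far as I know PostGIS often returns its extended EWKB variant with SRID flags, which this does not handle.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in plain WKB


def encode_wkb_point(x, y):
    """Pack a 2D point as little-endian WKB: byte order, type, x, y."""
    # "<" = little-endian, no padding; B = byte order flag (1 = little-endian),
    # I = uint32 geometry type, dd = two float64 coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def decode_wkb_point(data):
    """Unpack a plain 2D WKB Point and return (x, y)."""
    prefix = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(prefix + "I", data, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"not a plain WKB Point (type {geom_type})")
    return struct.unpack_from(prefix + "dd", data, 5)


blob = encode_wkb_point(-122.4, 37.8)
print(blob.hex())
print(decode_wkb_point(blob))
```

For anything beyond a single point (or for EWKB) the sensible move is to hand the bytes to Shapely rather than extend this.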

I guess that’s a reasonable compromise; GeoAlchemy2 is really just bridging to SQL, they’re not going to implement their own GEOS equivalent. It also has spatial functions declared in a SQLAlchemy-friendly way so you can use SQLAlchemy syntax to do your work. That’s kinda neat.

Still I’m looking at a lot of third party Python code: SQLAlchemy, Shapely, GeoAlchemy, Alembic. All to manage a database with two tables and nothing more complex than a list of points. I may blow it all off and go to raw SQL. OTOH I may end up regretting that choice in a few months if I want to do more complex things.


Firefox Quantum

I’m trying Mozilla’s new web browser, Firefox Quantum (aka Firefox 57). It’s in beta now, will be the primary Firefox release in a few weeks.

There’s a lot going on with this release but the primary feature is speed. It is super snappy and responsive. From simple tricks like taking UI actions faster in response to input, to deep things like a brand new CSS engine (in Rust) and a new process model. Firefox had been single process up to now, unlike Chrome. Now it’s multi-process but still using threads in a performance / resource usage tradeoff. Uses less memory too although since I never keep many tabs open that’s less of a problem for me.

One drawback of the new release is old Firefox extensions no longer work; developers have to make significant changes for the new process model. Most of the Chrome extensions I care about also have Firefox versions that have already been updated. uBlock Origin, LastPass (a beta), Imagus, Pinboard+.

Been a while since I used Firefox. It’s still good. Some customizations that helped me:

  • Disable Ctrl-Mousewheel zooming by going to about:config and setting mousewheel.with_control.action to 0
  • Customize the layout by enabling “Drag Space” so you have a few pixels of window frame above the tabs.

Python 3.6 on Ubuntu 16.04 LTS

I want to use Python 3.6 on my Ubuntu 16.04 LTS box; they only have python3.5. It’s pretty simple following these instructions but here are the details.

Step 1: install python 3.6 from Jonathon Fernyhough’s PPA.

sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6 python3.6-venv python3.6-dev

Step 2: set up a 3.6 venv

python3.6 -m venv venv
source venv/bin/activate
pip install -U setuptools pip
python -V

Now when you activate venv, you’ll automatically get python 3.6 in that environment.
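If you want a script to check this for itself, a minimal sketch along these lines should do. The function names are made up; the check relies on sys.prefix differing from sys.base_prefix inside an environment created with python -m venv.

```python
import sys


def in_virtualenv():
    """True when running inside an environment made by `python -m venv`."""
    # In a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the interpreter it was created from.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_ok(minimum=(3, 6)):
    """True when the running interpreter is at least the given version."""
    return sys.version_info[:2] >= minimum


print("python", ".".join(str(n) for n in sys.version_info[:3]))
print("in venv:", in_virtualenv())
print("3.6 or newer:", version_ok())
```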

In addition to the jonathonf PPAs there’s also the deadsnakes PPA which has old versions of Python as well as new ones.

Leaflet heatmap options considered

I mentioned wanting a heatmap for some geographic data. Leaflet has a variety of heatmap plugins; I spent a couple of hours looking at them.

First, a word about heatmaps. A true heatmap has a feature where if you keep adding points (“heat”) to a spot, that heat diffuses and spreads out. In theory you could cover the whole world in hot red just adding to a single lat/lon because the heat diffuses (unless you have cooling/decay enabled). I want this feature. Calculating diffusion like this is complicated and slow. So most “heatmaps” cheat and instead of running a diffusion algorithm just draw blurry transparent dots. When they overlap it looks a lot like a heatmap, but you never get the full diffusion. It’s a reasonable compromise for a lot of applications (and way faster, particularly since it’s GPU-friendly) but it’s a compromise I’m hoping to avoid. See also this technical discussion.
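The difference is easy to see on a toy grid. The sketch below is pure Python and is not taken from any of the plugins: the grid size, diffusion rate and blob shape are arbitrary assumptions. It keeps adding heat to one cell, once with a simple diffusion step and once by stamping a fixed-radius blob.

```python
def diffuse(grid, rate=0.2):
    """One explicit diffusion step: each cell exchanges heat with 4 neighbours."""
    size = len(grid)
    new = [row[:] for row in grid]
    for y in range(size):
        for x in range(size):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < size and 0 <= nx < size:
                    # Heat flows from the hotter cell to the cooler one.
                    new[y][x] += rate * (grid[ny][nx] - grid[y][x])
    return new


def true_heatmap(size, hot, steps):
    """Add one unit of heat at `hot` each step, then let it diffuse."""
    grid = [[0.0] * size for _ in range(size)]
    for _ in range(steps):
        grid[hot[0]][hot[1]] += 1.0
        grid = diffuse(grid)
    return grid


def blurry_dots(size, hot, steps, radius=2):
    """The cheat: stamp the same fixed-radius blob each time, no spreading."""
    grid = [[0.0] * size for _ in range(size)]
    for _ in range(steps):
        for y in range(size):
            for x in range(size):
                d2 = (y - hot[0]) ** 2 + (x - hot[1]) ** 2
                if d2 <= radius ** 2:
                    grid[y][x] += 1.0 / (1 + d2)
    return grid


SIZE, HOT, STEPS = 15, (7, 7), 100
heat = true_heatmap(SIZE, HOT, STEPS)
dots = blurry_dots(SIZE, HOT, STEPS)

print("corner with diffusion:  ", heat[0][0])
print("corner with blurry dots:", dots[0][0])
```

With diffusion the far corner warms up even though heat was only ever added at the centre; with stamped dots it stays at zero no matter how many points pile up. The nested loops also hint at why the real thing is slow.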

Another key thing is how map zooming is handled. The heat blobs are really rendered in terms of pixels, not meters or geographic degrees, so the heatmap picture needs to be re-rendered at every zoom level. Some of the libraries look like they just render one image and raster scale it to match the zoom. The behavior I want is visible in this demo: notice how the blobs get more detailed when you zoom in.
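To put numbers on the pixels-versus-meters point, here is a sketch using the usual Web Mercator conventions (256 pixel tiles, WGS84 equatorial radius). The function names and the 25 pixel blob radius are assumptions for illustration, not anything from the plugins.

```python
import math

TILE_SIZE = 256                                 # pixels per tile edge
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137   # WGS84 equatorial radius


def latlon_to_pixel(lat, lon, zoom):
    """Project lat/lon in degrees to Web Mercator pixel coordinates."""
    scale = TILE_SIZE * 2 ** zoom
    x = (lon + 180.0) / 360.0 * scale
    # asinh(tan(lat)) is the Mercator y; equivalent to ln(tan(lat) + sec(lat)).
    y = (1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * scale
    return x, y


def meters_per_pixel(lat, zoom):
    """Approximate ground distance covered by one pixel at this latitude."""
    return (EARTH_CIRCUMFERENCE_M * math.cos(math.radians(lat))
            / (TILE_SIZE * 2 ** zoom))


BLOB_RADIUS_PX = 25  # hypothetical blob radius
for zoom in (10, 13, 16):
    ground = BLOB_RADIUS_PX * meters_per_pixel(37.8, zoom)
    print(f"zoom {zoom}: a {BLOB_RADIUS_PX}px blob covers about {ground:.0f} m")
```

Each zoom level halves the ground distance per pixel, so a blob of fixed pixel size covers a very different area at each level. That is why the picture has to be recomputed rather than just scaled.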

Anyway, here’s my quick review of the 7 plugins:

  • Leaflet.heat; first one I’ve used. Canvas based, seems to work for 20,000 points no problem. Not a true heatmap, just fuzzy circles, but it looks good. Last update a year ago.
  • heatmap.js: canvas based, I think a true heatmap. Demo doesn’t recalculate on zoom the way I want and I don’t think there’s code for doing it. The library is mostly used for rendering user attention on web pages.
  • webgl-heatmap-leaflet. I gotta think WebGL is a good idea. New code, mostly 3 months old. Unfortunately it doesn’t seem to have the zooming behavior I want. There’s some scaling code (units and size option) but that seems to be about projecting circles of a consistent size at different latitudes.
  • Leaflet-solr-heatmap, see examples here. It’s pretty impressive being able to handle 10M points but the grid display isn’t what I’m looking for.
  • leaflet-div-heatmap is clever for using CSS and divs. No live demo I could find though, and no commits in 4 years, so I stopped looking.
  • MaskCanvas: not a heatmap, more of a “reveal the map” display. It’s neat though! Last real update was 2 years ago.
  • HeatCanvas: true heatmap, a bit too oriented towards static images. Not sure if it works well in zooming. Last real update was 4 years ago, I didn’t look further.

In summary: Leaflet.heat is the one I want. It’s the only library that recalculates on zoom. The downside is it’s not a true heatmap. I think that actually matters for my application (I expect a lot of dwell time in one place) but it’s a compromise I can accept.

Worth keeping an eye on mourner’s WebGL work for mapbox.js too; I’d rather just use vanilla Leaflet though. But his demo looks great.

Heatmaps for geopoints?

Trying to figure out how to generate actual heatmaps for a bunch of geopoints. The usual hack is to draw blurry dots at 10% opacity and let them overlap. I want the real deal. The Sethoscope tool works in Python, but is pretty slow, see example below. Mourner at Mapbox is working on a WebGL thing.

The hard part here is heatmaps aren’t scale invariant; you need to render it differently depending on the scale the user is zoomed to. Realistically that means it needs to be in browser JavaScript, and therefore fast. Maybe GL is necessary.
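As for why the blurry-dot hack isn’t the real deal: assuming ordinary alpha compositing, stacking dots at 10% opacity saturates instead of accumulating. A quick sketch of the arithmetic (the function name is made up):

```python
def stacked_opacity(n_dots, alpha=0.10):
    """Opacity after drawing n overlapping dots with normal alpha blending."""
    # Each dot lets (1 - alpha) of what is underneath show through.
    return 1.0 - (1.0 - alpha) ** n_dots


for n in (1, 2, 10, 50, 100):
    print(f"{n:3d} dots -> {stacked_opacity(n):.4f}")
```

After a few dozen overlapping points the spot is effectively opaque, so 50 visits and 500 visits look identical, and nothing ever spreads outward.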