Nelson's log

Database IDs should not be integers

It always bugged me that standard databases use integers as IDs. Not just databases: all sorts of systems, anything with a notion of identity ends up with an “id” token, and the type of that token is almost always “integer”.

The problem is that “integer” has a lot of baggage. An integer is something you can do arithmetic on. It makes no sense to divide one id by another, or add two ids together, or do almost any other arithmetic operation on. The one exception is adding one to an id, as in an autoincrement column for the primary key. But even that’s really a mistake a lot of the time, because those IDs have a way of leaking to the public in URLs and JSON blobs. I can’t tell you how many sites you can estimate growth on by just looking at message IDs or customer IDs or whatever.

My ideal world would be to have a separate “id” type in SQL which has very few operations defined on it. That’s boiling the ocean though. A close second is just to use “text” or maybe “blob” as the ID type. But in practice that doesn’t really work very well. It ends up using more storage even if it shouldn’t have to. And I have to believe indexing and searching text columns is in practice more expensive than integer columns.

So integer it is. Why fight it?