Address schema for OSM

It’s lovely that OpenStreetMap has such a... diverse schema. But it makes it hard to figure out how to encode stuff sometimes. I’m looking at importing a bunch of address data from other sources. What’s the right format?

The source data is diverse, but a common pattern seems to be a Point mapped to a street address. For example, the SF address database I downloaded has POINT(-122.408292 37.747785) 660 PRECITA AVE. Cool! There’s a lot of other data, and some ambiguity about exactly what that POINT encodes (street front? centroid of parcel? building?). But that’s a good place to start.
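As a first step, a record like the SF example has to be split into coordinates, house number and street. Here is a minimal Python sketch, assuming each record is exactly a WKT POINT followed by "<number> <street>" as in the one line quoted above; the real file layout isn't shown here, and SourceAddress and parse_record are hypothetical names for illustration.

```python
import re
from typing import NamedTuple


class SourceAddress(NamedTuple):
    lon: float
    lat: float
    housenumber: str
    street: str


# Hypothetical record layout: "POINT(lon lat) NUMBER STREET".
_RECORD_RE = re.compile(
    r"^POINT\(\s*(?P<lon>-?\d+(?:\.\d+)?)\s+(?P<lat>-?\d+(?:\.\d+)?)\s*\)\s+"
    r"(?P<number>\S+)\s+(?P<street>.+?)\s*$"
)


def parse_record(record: str) -> SourceAddress:
    """Split a 'POINT(lon lat) NUMBER STREET' string into its parts."""
    m = _RECORD_RE.match(record)
    if m is None:
        raise ValueError(f"unrecognized record: {record!r}")
    return SourceAddress(
        lon=float(m["lon"]),  # WKT puts x (longitude) before y (latitude)
        lat=float(m["lat"]),
        housenumber=m["number"],  # first token after the point
        street=m["street"],  # everything else, left as-is (still upper case)
    )


rec = parse_record("POINT(-122.408292 37.747785) 660 PRECITA AVE")
print(rec)
```

Note that WKT order is longitude first, which is easy to flip by accident. The street is deliberately left un-normalized; expanding "AVE" and fixing capitalization is a separate decision.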

The OSM schema for addresses is reasonably well defined. Here are the most important tags, based on the wiki pages for key:addr and the Karlsruhe Schema:

  • Parsed address: addr:housenumber, addr:street
  • Unparsed address: addr:full
  • Larger scale data: addr:city, addr:state, addr:country, addr:postcode
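Mapping parsed fields onto those keys is mostly mechanical. The Python sketch below assumes the caller has already split the address into fields. The field names and the address_tags helper are hypothetical, and "Precita Avenue" is a hand-normalized spelling of "PRECITA AVE", not something the source data provides.

```python
from typing import Dict, Optional

# Our own (hypothetical) field names mapped onto the OSM keys listed above.
_KEYS = {
    "housenumber": "addr:housenumber",
    "street": "addr:street",
    "full": "addr:full",
    "city": "addr:city",
    "state": "addr:state",
    "country": "addr:country",
    "postcode": "addr:postcode",
}


def address_tags(**fields: Optional[str]) -> Dict[str, str]:
    """Return OSM addr:* tags for the non-empty fields given."""
    unknown = set(fields) - set(_KEYS)
    if unknown:
        raise KeyError(f"unsupported fields: {sorted(unknown)}")
    # Drop None and blank values rather than emitting empty tags.
    return {_KEYS[k]: v.strip() for k, v in fields.items() if v and v.strip()}


tags = address_tags(
    housenumber="660",
    street="Precita Avenue",
    city="San Francisco",
    state="CA",
    country="US",
    postcode=None,  # missing values are left off entirely
)
print(tags)
```

Skipping empty values matters for imports: a blank addr:postcode tag is worse than no tag at all.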

There are additional complications: buildings with multiple addresses, regions that don’t use city/state addressing (e.g., addr:subdistrict), interpolation, etc. Gonna punt on those and focus on what’s relevant for major US cities.

Another significant question is whether to tag the address data onto a building or to create a separate standalone address node. The latter approach seems necessary in the US, where we don’t have building footprints for so many places. That said, MapBox’s analysis is that the majority of addresses are on buildings.
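For the standalone-node case, here is a minimal Python sketch that writes an address node in the OSM XML (.osm, API 0.6) format using the standard library. It assumes negative ids for not-yet-uploaded objects, the usual convention in editing files. The generator string and the address_node helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def address_node(node_id: int, lat: float, lon: float,
                 tags: Dict[str, str]) -> ET.Element:
    """Build a standalone OSM node that carries only address tags.

    A negative id marks an object that does not exist in the database yet.
    """
    # OSM stores coordinates to 7 decimal places, so format them that way.
    node = ET.Element("node", id=str(node_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
    for key, value in sorted(tags.items()):
        ET.SubElement(node, "tag", k=key, v=value)
    return node


root = ET.Element("osm", version="0.6", generator="address-import-sketch")
root.append(address_node(-1, 37.747785, -122.408292,
                         {"addr:housenumber": "660",
                          "addr:street": "Precita Avenue"}))
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Tagging a building instead would mean adding the same tag dictionary to the existing building way's tags rather than emitting a new node. That requires matching each point to a footprint first, which is exactly what's missing in much of the US.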

Here’s a list of top address keys culled from TagInfo. I’ll note the majority of these are tags on nodes, not ways.

  • 26M: addr:housenumber
  • 23M: addr:street
  • 18M: addr:city
  • 14M: addr:postcode
  • 13M: addr:country

There’s a second group of keys with hundreds of thousands of entries, far fewer than the top 5 above:

  • 1967k: addr:conscriptionnumber (Czech and Slovak addresses)
  • 1605k: addr:interpolation
  • 842k: addr:state
  • 608k: addr:place
  • 500k-600k: addr:street:name, addr:street:type, addr:street:prefix
  • 463k: addr:streetnumber
  • 3896: addr:full
  • 280k-370k: addr:region, addr:district, addr:suburb

Here are some relevant maps of address data that people have made. Some of these may not be meant for public consumption.

I’d like to double back and see what they do in Denmark. That country has very good geocoding data and an ongoing process to sync up with the official government data.