It’s lovely that OpenStreetMap has such a... diverse schema. But it makes it hard to figure out how to encode stuff sometimes. I’m looking at importing a bunch of address data from other sources. What’s the right format?
The source data is diverse, but a common pattern seems to be a point mapped to a street address. For example, the SF address database I downloaded has the record POINT(-122.408292 37.747785) 660 PRECITA AVE. Cool! There’s a lot of other data, and some ambiguity about exactly what that POINT encodes (the street front? the centroid of the parcel? the building?). But that’s a good place to start.
The basic OSM address tags are:
- Parsed address: addr:housenumber, addr:street
- Unparsed address: addr:full
- Larger scale data: addr:city, addr:state, addr:country, addr:postcode
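As a rough sketch of going from the SF record above to those tags, here is a minimal Python parser. It assumes every record looks exactly like the sample (a WKT `POINT(lon lat)` followed by a house number and a street name), that city and state come from outside since the record doesn't carry them, and it uses a small hypothetical abbreviation table because OSM convention is to spell street types out rather than abbreviate them. Note that WKT puts longitude (x) before latitude (y).

```python
import re

# Hypothetical, deliberately tiny abbreviation table; OSM convention is to
# spell street types out ("Avenue", not "Ave"). Extend for real data.
STREET_TYPES = {"AVE": "Avenue", "ST": "Street", "BLVD": "Boulevard",
                "DR": "Drive", "RD": "Road"}

# Assumed record shape: "POINT(<lon> <lat>) <housenumber> <street>".
# WKT lists x (longitude) before y (latitude).
RECORD_RE = re.compile(
    r"POINT\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s+(\S+)\s+(.+)")


def parse_record(record, city="San Francisco", state="CA"):
    """Turn one source record into (lat, lon, OSM address tags)."""
    m = RECORD_RE.fullmatch(record.strip())
    if m is None:
        raise ValueError(f"unrecognized record: {record!r}")
    lon, lat = float(m.group(1)), float(m.group(2))
    number, words = m.group(3), m.group(4).split()
    # Expand an abbreviated street type at the end of the name.
    if words[-1].upper() in STREET_TYPES:
        words[-1] = STREET_TYPES[words[-1].upper()]
    # Naive title-casing: "PRECITA" -> "Precita".
    street = " ".join(w.capitalize() for w in words)
    tags = {
        "addr:housenumber": number,
        "addr:street": street,
        "addr:city": city,    # not in the record; supplied by the caller
        "addr:state": state,  # likewise
    }
    return lat, lon, tags
```

For the sample record this should yield latitude 37.747785, longitude -122.408292 and the tags `addr:housenumber=660`, `addr:street=Precita Avenue`, `addr:city=San Francisco`, `addr:state=CA`. The naive title-casing would mangle names like McAllister, so real data needs a proper street-name normalizer.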
There are additional complications for buildings with multiple addresses, regions that don’t use city/state addressing (e.g., addr:subdistrict), interpolation, etc. Gonna punt on those and focus on what’s relevant for major US cities.
Another significant question is whether to tag the address data onto a building or to create a separate, standalone address node. The latter approach seems necessary in the US, where so many places don’t have building footprints. That said, MapBox’s analysis found that the majority of addresses are on buildings.
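One way to make the building-vs-node decision mechanical: if the address point falls inside an existing building footprint, put the tags on that building; otherwise create a standalone node. This sketch uses a plain even-odd ray-casting test and treats lon/lat as planar coordinates, a simplification that should be acceptable at building scale. `attach_address` and the `buildings` mapping of way id to outer ring are hypothetical names, not any OSM tool's API.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat), treated as planar x/y


def point_in_polygon(pt: Point, ring: List[Point]) -> bool:
    """Even-odd ray casting; ring is a list of (lon, lat) vertices."""
    x, y = pt
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt count.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def attach_address(pt: Point,
                   buildings: Dict[str, List[Point]]) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: ("way", id) if pt is in a building, else ("node", None)."""
    for way_id, ring in buildings.items():
        if point_in_polygon(pt, ring):
            return "way", way_id
    return "node", None
```

It deliberately ignores the multiple-addresses-per-building case mentioned above; a real import would also need to check whether several points land in the same footprint, or whether the building already carries an address.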
Here’s a list of top address keys culled from TagInfo. I’ll note the majority of these are tags on nodes, not ways.
- 26M: addr:housenumber
- 23M: addr:street
- 18M: addr:city
- 14M: addr:postcode
- 13M: addr:country
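To check the nodes-versus-ways split for a given key, TagInfo has a JSON API. This sketch assumes the `/api/4/key/stats` endpoint returns a `data` list with one row per object type (`all`, `nodes`, `ways`, `relations`), each carrying a `count`; treat that response shape as an assumption to verify against the TagInfo API documentation. Only the fetch touches the network; the counting works on any dict of that shape.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint and response shape; check the TagInfo API docs.
TAGINFO_KEY_STATS = "https://taginfo.openstreetmap.org/api/4/key/stats"


def fetch_key_stats(key: str) -> dict:
    """Download usage stats for one key (needs network access)."""
    url = TAGINFO_KEY_STATS + "?" + urllib.parse.urlencode({"key": key})
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def counts_by_type(stats: dict) -> dict:
    """Map object type ("all", "nodes", "ways", "relations") to its count."""
    return {row["type"]: row["count"] for row in stats["data"]}


def node_share(stats: dict) -> float:
    """Fraction of all uses of the key that are on nodes."""
    counts = counts_by_type(stats)
    return counts.get("nodes", 0) / counts["all"]

# Example (needs network): node_share(fetch_key_stats("addr:housenumber"))
```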
There’s a second group of keys with far fewer entries than the top 5 above, mostly in the hundreds of thousands to low millions:
- 1967k: addr:conscriptionnumber (mostly Czech and Slovak addresses)
- 1605k: addr:interpolation
- 842k: addr:state
- 608k: addr:place
- 500k-600k: addr:street:name, addr:street:type, addr:street:prefix
- 463k: addr:streetnumber
- 280k-370k: addr:region, addr:district, addr:suburb
- 3896: addr:full
Here are some relevant maps of address data that people have made. Some of these may not be intended for public consumption.
- Ito Map of OSM buildings with address tags
- OSM Inspector showing OSM address errors (Europe only?)
- Simon Poole’s map of OSM addresses
- Eric Fischer’s dot map of OSM addresses
- Mike Migurski’s map of TIGER addresses
I’d like to double back and see what they do in Denmark. That country has very good geocoding data and an ongoing process to sync up with the official government data.