OpenAddresses run: causes of failures

206 of the 608 sources I tried with the new Python code didn’t work. I went through and hand-classified them by type of failure. (Related: we need better error reporting in the code; right now I could only do this by reading stderr logfiles and guessing.)

I’d guess about half the failures are problems with the source not being online at the time, and maybe only a quarter are bugs in the Python parsing code. I’m not sure without looking more closely.
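
A first pass at the hand classification described above could be automated by matching each stderr log against a table of regular expressions and tallying the categories. This is a minimal sketch; the patterns are hypothetical, since the real error text from the machine code would have to be collected first, and it assumes one log per source.

```python
import re
from collections import Counter

# Hypothetical patterns: real error strings from the conform code may differ.
# Order matters because the first match wins; an ESRI failure log may also
# mention an HTTPError, so the ESRI rule comes before the generic one.
FAILURE_PATTERNS = [
    ("Bad zip file", re.compile(r"BadZipFile|not a zip file", re.I)),
    ("ESRI download failed", re.compile(r"esri", re.I)),
    ("Download failed", re.compile(r"HTTPError|URLError|timed out", re.I)),
    ("CSV parsing problem", re.compile(r"csv\.Error")),
    ("JSON parsing problem", re.compile(r"JSONDecodeError|No JSON object")),
]

def classify_log(text):
    """Return the first matching failure category for one stderr log."""
    for label, pattern in FAILURE_PATTERNS:
        if pattern.search(text):
            return label
    return "Unclassified"

def tally(logs):
    """Count categories across a mapping of source name -> stderr text."""
    return Counter(classify_log(text) for text in logs.values())
```

Whatever lands in "Unclassified" is the pile that still needs a human to read it.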

Update: I triaged and either solved or filed issues for the following failures:

  • Parser doesn’t handle malformed CSV
  • Parser doesn’t handle GML files
  • Excerpt failed (partial fix; the conform continues)
  • Run worked but no output CSV

=== Problems with the parser ===

Parser doesn't handle malformed CSV
https://github.com/openaddresses/machine/issues/36
  48 files: kr-seoul-*
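
The issue has the specifics of what is malformed in the kr-seoul files; the sketch below does not assume them. It shows one common tolerant strategy: decode with replacement characters for bad bytes, and skip rows whose field count doesn't match the header, counting them so the problem is reported rather than fatal. The function name and the UTF-8 default are illustrative assumptions.

```python
import csv
import io

def read_tolerant_csv(raw_bytes, encoding="utf-8"):
    """Parse CSV bytes, skipping ragged rows instead of failing.

    Returns (rows as dicts, number of skipped rows). Undecodable bytes
    become U+FFFD instead of raising UnicodeDecodeError.
    """
    text = raw_bytes.decode(encoding, errors="replace")
    reader = csv.reader(io.StringIO(text, newline=""))
    header = next(reader, None)
    if header is None:
        return [], 0  # empty file: nothing to parse
    rows, skipped = [], 0
    for fields in reader:
        if len(fields) != len(header):
            skipped += 1  # ragged row: count it and move on
            continue
        rows.append(dict(zip(header, fields)))
    return rows, skipped
```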

Parser doesn't handle GML files
https://github.com/openaddresses/machine/issues/38
  15 files: pl-*
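
GML is XML, so a first pass can use the standard library. A minimal sketch, assuming the address features are points with coordinates in a `gml:pos` element; it matches on the local tag name so it works for both the GML 3.1 and 3.2 namespaces. The real pl-* files may use other geometry encodings this doesn't handle, and the axis order of `gml:pos` depends on the declared coordinate system.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]

def gml_points(xml_text):
    """Yield the first two coordinates of every gml:pos element as floats."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if local_name(elem.tag) == "pos" and elem.text:
            first, second = (float(v) for v in elem.text.split()[:2])
            yield first, second
```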

Excerpt failed
https://github.com/openaddresses/machine/issues/35
  is
  jp-akita, jp-chiba, jp-fukushima, jp-gunma, jp-ibaraki, jp-iwate, jp-saitama, jp-tochigi, jp-yamagata
  au-queensland
  ca-ab-calgary
  us-mn-dakota
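
The partial fix lets the conform continue even when the excerpt step fails. One way to get that behavior is sketched below: a hypothetical `excerpt_rows` helper samples the first few rows, and a wrapper records the error instead of aborting. Neither name comes from the machine code.

```python
import itertools
import logging

def excerpt_rows(rows, limit=10):
    """Return the first `limit` rows from any iterable of rows."""
    return list(itertools.islice(rows, limit))

def safe_excerpt(rows, limit=10):
    """Excerpt, but never let a failure here stop the conform.

    Returns (excerpt or None, error message or None).
    """
    try:
        return excerpt_rows(rows, limit), None
    except Exception as exc:  # deliberately broad: the excerpt is optional
        logging.warning("excerpt failed: %s", exc)
        return None, str(exc)
```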

Shapefile parsing problem
  ca-bc-okanagan_similkameen
  10 files: us-nc-? us-nc-10
  us-nc-lincoln
  us-ne-omaha
  us-va-stafford
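
Before blaming the parser it can help to confirm the .shp file itself is sane. The ESRI shapefile technical description fixes a 100-byte header: file code 9994 (big-endian) at byte 0, total file length in 16-bit words (big-endian) at byte 24, version 1000 (little-endian) at byte 28, and shape type (little-endian) at byte 32. This sketch checks only that header, not the .dbf, .shx, or projection, and is not what the machine code does.

```python
import struct

# Shape type codes listed in the ESRI shapefile technical description.
VALID_SHAPE_TYPES = {0, 1, 3, 5, 8, 11, 13, 15, 18, 21, 23, 25, 28, 31}

def check_shp_header(data):
    """Validate the header of a .shp file's bytes; return a list of problems."""
    if len(data) < 100:
        return ["file shorter than 100-byte header"]
    problems = []
    (file_code,) = struct.unpack(">i", data[0:4])          # big-endian
    (length_words,) = struct.unpack(">i", data[24:28])     # 16-bit words
    version, shape_type = struct.unpack("<ii", data[28:36])  # little-endian
    if file_code != 9994:
        problems.append("bad file code %d" % file_code)
    if version != 1000:
        problems.append("bad version %d" % version)
    if shape_type not in VALID_SHAPE_TYPES:
        problems.append("unknown shape type %d" % shape_type)
    if length_words * 2 != len(data):
        # A mismatch here usually means a truncated or padded download.
        problems.append("header says %d bytes, file has %d"
                        % (length_words * 2, len(data)))
    return problems
```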

CSV parsing problem
  be-flanders
  ca-pe
  us-co-mesa
  us-va
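
The cause isn't known for these four. One frequent culprit in CSV sources is an unexpected delimiter (semicolons are common in data from locales that use a decimal comma). A sketch using the standard library's `csv.Sniffer` to guess the dialect from a sample before parsing; sniffing is a heuristic and can guess wrong on small or unusual samples, so it falls back to plain commas.

```python
import csv
import io

def parse_with_sniffed_dialect(text, candidates=",;\t|"):
    """Guess the delimiter from the first few KB, then parse every row."""
    sample = text[:4096]
    try:
        dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    except csv.Error:
        dialect = csv.excel  # could not decide: assume comma-separated
    return list(csv.reader(io.StringIO(text, newline=""), dialect))
```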

JSON parsing problem
  us-al-shelby
  us-ga-muscogee
  us-id-canyon
  us-in-hamilton
  us-mn-pope
  us-mn-wadena
  us-ms-hinds
  us-ms-madison
  us-nm-san_juan
  us-oh-hamilton
  us-pa-philadelphia
  us-tx-denton us-tx-el_paso us-tx-keller us-tx-north_richland_hills
  us-va-alexandria
  us-va-city_of_emporia
  us-va-city_of_petersburg
  us-wi-adams us-wi-crawford us-wi-dodge
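
The notes don't say what went wrong with these files. A first diagnostic is whether the file parses at all and is shaped like a GeoJSON FeatureCollection. A minimal sketch; the specific checks and messages are illustrative, not what the machine code does.

```python
import json

def diagnose_geojson(text):
    """Return 'ok', or a short description of what's wrong with a GeoJSON string."""
    try:
        doc = json.loads(text)
    except json.JSONDecodeError as exc:
        return "not valid JSON (line %d, column %d)" % (exc.lineno, exc.colno)
    if not isinstance(doc, dict) or doc.get("type") != "FeatureCollection":
        return "top level is not a FeatureCollection"
    features = doc.get("features")
    if not isinstance(features, list):
        return "missing 'features' list"
    # Features with null or absent geometry can't yield coordinates.
    no_geom = sum(1 for f in features
                  if not isinstance(f, dict) or not f.get("geometry"))
    if no_geom:
        return "%d of %d features lack geometry" % (no_geom, len(features))
    return "ok"
```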


=== Mysteries ===

Run worked but no output CSV
  ca-bc-surrey
  us-ak-matanuska_susitna_borough
  us-al-calhoun
  us-mi-kent
  us-mi-ottawa
  us-mn-otter_tail
  us-nv-las_vegas
  us-pa-beaver
  us-sc-berkeley
  us-sc-lexington
  us-sd
  us-va-accomack
  us-va-city_of_norton
  us-va-essex
  us-va-fluvanna
  us-wi-fond_du_lac
  us-wi-vernon
  us-wy-laramie
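
Part of why these are mysteries is that a missing output file is only noticed after the fact. A sketch of a post-run check that turns "no output CSV" into an explicit message; the path argument is hypothetical and doesn't reflect the machine's real output layout.

```python
import csv
import os

def check_output_csv(path):
    """Return None if `path` is a CSV with a header and at least one row,
    otherwise a string explaining what's missing."""
    if not os.path.exists(path):
        return "no output file at %s" % path
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        if next(reader, None) is None:
            return "output file is empty"
        if next(reader, None) is None:
            return "output has a header but no rows"
    return None
```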


=== Malformed sources ===

Multiple shapefiles without a file attribute
  ca-bc-kamloops
  ca-bc-kelowna
  us-nc-davie
  us-oh-clinton
  us-wi-calumet
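
These archives contain more than one shapefile and the source doesn't say which to use. A sketch of the check, assuming a `file` attribute (passed here as the hypothetical `file_attr` argument) names the member inside the zip; where that attribute lives in the source JSON is an assumption.

```python
import zipfile

def pick_shapefile(source, file_attr=None):
    """Return the archive member name of the .shp to conform.

    `source` is a zip path or file object. Raises ValueError when the
    choice is ambiguous and no file attribute is set.
    """
    with zipfile.ZipFile(source) as zf:
        shps = [n for n in zf.namelist() if n.lower().endswith(".shp")]
    if file_attr is not None:
        if file_attr not in shps:
            raise ValueError("file %r not found in archive" % file_attr)
        return file_attr
    if len(shps) != 1:
        raise ValueError("expected one .shp, found %d: %s" % (len(shps), shps))
    return shps[0]
```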


=== Problems getting source data ===

Download failed
  dk
  ca-ab-strathcona-county
  ca-bc-nanaimo
  ca-bc-vernon
  ca-ns-halifax
  ca-sk-regina
  us-ar
  us-ca-alameda_county
  us-ca-amador
  us-ca-san_francisco
  us-co-gunnison
  us-in-st_joseph
  us-mn-metrogis
  us-nc
  us-nc-wake_county
  us-ny-nyc
  us-va-augusta
  us-va-richmond_city
  us-wa-snohmish
  us-wi-superior
  za-nl-ethekwini
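
If some of these sources were only temporarily offline, as guessed at the top, retrying with exponential backoff before declaring failure would separate flaky servers from dead ones. A sketch: `fetch` is any callable that raises `OSError` on failure (for example a wrapper around `urllib.request.urlopen`, whose errors subclass `OSError`), and the attempt count and delays are arbitrary.

```python
import time

def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, doubling the wait each time.

    Re-raises the last error if every attempt fails. `sleep` is injectable
    so tests don't actually wait.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except OSError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```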

ESRI download failed
  us-ct-avon us-ct-haddam us-ct-lyme us-ct-watertown
  us-fl-alachua
  us-ia-linn
  us-ia-polk
  us-in-madison
  us-in-marion_county
  us-ky-oldham
  us-la-acadia
  us-mi-muskegon
  us-mn-polk
  us-mn-yellow_medicine
  us-mo-barry us-mo-columbia us-mo-st_louis_county
  us-mt-park
  us-nv-henderson us-nv-lander us-nv-nye us-nv-washoe_county
  us-tn-memphis
  us-va-roanoke
  us-va-salem
  us-wa-san_juan
  us-wi-iron us-wi-jefferson us-wi-juneau us-wi-lincoln us-wi-oneida us-wi-richland us-wi-sauk
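
ESRI sources are fetched by querying an ArcGIS REST layer, usually a page at a time. The sketch below only builds the paged query URLs using the `resultOffset` and `resultRecordCount` parameters; the layer URL and page size are placeholders, and `total` would come from a separate `returnCountOnly=true` query. Older ArcGIS servers don't support these paging parameters, which is one plausible (unverified) cause of failures here.

```python
from urllib.parse import urlencode

def esri_query_urls(layer_url, total, page_size=1000):
    """Build ArcGIS REST query URLs that page through `total` features."""
    urls = []
    for offset in range(0, total, page_size):
        params = {
            "where": "1=1",          # select every feature
            "outFields": "*",        # all attribute columns
            "returnGeometry": "true",
            "outSR": "4326",         # ask for WGS84 lon/lat
            "f": "json",
            "resultOffset": offset,
            "resultRecordCount": page_size,
        }
        urls.append(layer_url.rstrip("/") + "/query?" + urlencode(params))
    return urls
```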

Bad zip file
  nz
  ca-bc-langley
  ca-bc-west_kelowna
  us-co-sanmiguel
  us-fl-collier
  us-ga-glynn
  us-la-st_james
  us-ma
  us-nc-charlotte us-nc-columbus
  us-ri
  us-tx-round_rock
  us-va-city_of_falls_church
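
"Bad zip file" can mean a truncated download, a corrupt member, or a server that returned something else entirely. A sketch that separates those cases with the standard `zipfile` module; the HTML-error-page case is a hypothetical guess at one cause, not something confirmed for these sources.

```python
import zipfile

def diagnose_zip(path):
    """Return None for a readable zip, else a short reason string."""
    if not zipfile.is_zipfile(path):
        with open(path, "rb") as f:
            head = f.read(64)
        # Hypothetical case: a server error page saved under a .zip name.
        if head.lstrip().lower().startswith((b"<!doctype html", b"<html")):
            return "server returned an HTML page, not a zip"
        return "not a zip file"
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
    if bad is not None:
        return "corrupt member: %s" % bad
    return None
```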