Mike Bostock took a crack at using TopoJSON to encode the NHDFlowline dataset. The test covers just the river geometry in 2 dimensions, with no properties, and just California. All sizes are in megabytes of one million bytes.
- Source shapefile: 132M, 72M gzipped.
- Naive GeoJSON conversion: 184M, 56M gzipped.
ogr2ogr -dim 2 -f GeoJSON -select '' new.geojson NHDFlowline.shp
- GeoJSON rounded to 5 digits: 95M, 21M gzipped.
liljson -p 5 < new.geojson
- GeoJSON rounded to 3 digits: 80M, 9M gzipped.
liljson -p 3 < new.geojson
- TopoJSON: 20M, 3.3M gzipped.
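For anyone who wants to reproduce the size columns, here is a minimal sketch of how the raw and gzipped figures could be measured in one-million-byte megabytes. The helper name `sizes_mb` is made up for illustration, and Python's default gzip level may give slightly different numbers than the command-line tool.

```python
import gzip

def sizes_mb(data: bytes) -> tuple[float, float]:
    """Return (raw, gzipped) sizes in megabytes of 1,000,000 bytes."""
    raw = len(data)
    zipped = len(gzip.compress(data))  # default compression level
    return raw / 1_000_000, zipped / 1_000_000

# Small synthetic payload standing in for a real file's contents.
sample = b"[-121.5, 38.5], " * 100_000
raw_mb, gz_mb = sizes_mb(sample)
print(f"{raw_mb:.1f}M, {gz_mb:.3f}M gzipped")
```

For a real file you would pass in the bytes read from disk, for example `open("new.geojson", "rb").read()`.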
So for this data TopoJSON is about 1/4 to 1/5 the size of the equivalent rounded GeoJSON (20M vs. 80M to 95M). And the advantage persists through gzip (3.3M vs. 9M to 21M). And that's for a pathologically bad case: rivers are lines, so there's no shared topology along polygon boundaries for TopoJSON to deduplicate. Pretty much all the savings here must be coming from the delta encoding. Neat!
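To see why delta encoding helps, here is a minimal sketch of the idea, not TopoJSON's actual code. It assumes the coordinates have already been snapped to integers; the function names and the sample points are made up for illustration. The first point is stored as-is and every later point as an offset from the one before, so consecutive points along a river turn into small numbers that take fewer characters to write.

```python
import json

def delta_encode(points):
    """Keep the first point absolute; store each later point as an offset."""
    out = []
    prev_x, prev_y = 0, 0
    for x, y in points:
        out.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return out

def delta_decode(deltas):
    """Invert delta_encode by accumulating the offsets."""
    out = []
    x, y = 0, 0
    for dx, dy in deltas:
        x += dx
        y += dy
        out.append((x, y))
    return out

# Hypothetical integer grid coordinates along one line.
line = [(4521, 7310), (4523, 7312), (4524, 7315), (4526, 7315)]
deltas = delta_encode(line)
print(deltas)  # [(4521, 7310), (2, 2), (1, 3), (2, 0)]
print(len(json.dumps(line)), "chars absolute vs", len(json.dumps(deltas)), "chars delta")
```

The round trip through `delta_decode` gives back the original points, so nothing is lost beyond whatever the integer snapping already threw away.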
Update: Mike converted the whole US. 2.5G of .shp file input, 327M of topojson output.
Note that TopoJSON quantizes coordinates. Mike used the default TopoJSON settings, which I think work out to about 10,000 x 10,000 resolution. That makes the comparison to GeoJSON rounded to 3 digits about fair. Here's a snapshot of a render of the TopoJSON that Mike gave me. It looks right.
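A quick back-of-the-envelope check of that fairness claim, as a sketch. The bounding box below is only an approximate extent for California that I am assuming, not one read from the dataset, and the 10,000 grid size is the guess from above rather than a confirmed setting.

```python
# Approximate California bounding box in degrees (assumed, not from the data).
WEST, EAST = -124.4, -114.1
SOUTH, NORTH = 32.5, 42.0

GRID = 10_000  # assumed quantization resolution per axis

# Size of one grid cell in degrees along each axis.
lon_step = (EAST - WEST) / (GRID - 1)
lat_step = (NORTH - SOUTH) / (GRID - 1)

# Rounding to 3 decimal digits keeps steps of 0.001 degrees.
rounding_step = 10 ** -3

print(f"grid step: {lon_step:.5f} deg lon, {lat_step:.5f} deg lat")
print(f"3-digit rounding step: {rounding_step} deg")
```

Under those assumptions both steps come out at roughly a thousandth of a degree, which is in line with treating the 3-digit GeoJSON as the comparable case.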