E00 GIS format notes

Trying to work with the data files for this awesome volcano map from USGS. They helpfully published their GIS data for free download; unhelpfully, it’s in ESRI’s proprietary E00 format. Some notes.

An E00 file is a container for geography data with annotations like centroids and labels. The volcano files also contain auxiliary .avl and .lyr files that seem to be graphical legend definitions.

The format is mostly straightforward ASCII and somewhat documented. It’s a container format; inside it are geometries and other properties. The part I’m really interested in is the PAT section, which seems to have labels and auxiliary data for the geometries defined in the ARC section. I could probably fake out a join of my own from the ARCs to the PAT data without too much coding. To complicate things, I think the geometries are line boundaries between polygons, like TopoJSON, and not the polygons themselves.
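A quick way to see what’s inside the container is to scan for section headers. This is only a minimal sketch: it assumes each section starts with a line holding a three-letter uppercase tag followed by a precision digit (something like `ARC  2`), which I haven’t checked against a spec. `list_sections` is a made-up helper name, not part of any library.

```python
import re

# Assumption: a section header is a line like "ARC  2" or "PAL  3":
# three uppercase letters, whitespace, then a precision digit (2 or 3).
SECTION_HEADER = re.compile(r"^([A-Z]{3})\s+([23])\s*$")


def list_sections(lines):
    """Return (tag, line_number) pairs for lines that look like section headers."""
    found = []
    for number, line in enumerate(lines, start=1):
        match = SECTION_HEADER.match(line)
        if match:
            found.append((match.group(1), number))
    return found
```

Passing an open file works, since iterating a file yields lines: `list_sections(open("volcano.e00", errors="replace"))`, where the filename is a placeholder.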

Couldn’t find any Python E00 code, but there is “version 0.05” of a Perl module.

QGIS can load an E00 file, but all it seems to do is grab the geometry as LineStrings. It loses all the metadata. It’s also super slow. Way faster loading the ARC shapefile that ogr2ogr generated. I forget, do shapefiles contain spatial indices by default?

Seth F wrote to tell me about avce00, a program that converts binary coverages to ASCII E00 and back. What’s a coverage? That’s ESRI’s name for “a bunch of spatial data with auxiliary data”, i.e. the bundle in the E00 file. Converting between binary and ASCII seems to be the main thing it does; I don’t need it for the volcano data.

GDAL ogr2ogr has some E00 support. A simple ogr2ogr is doing a game job of creating multiple shapefiles with my data. It is terribly slow, 3+ hours on my 30 MB input file. But it does seem to convert the data. There’s a shapefile named ARC that contains the boundary LineStrings and a shapefile named PAL that doesn’t have any geometry in it, but contains the auxiliary data I care about from the PAT section.
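For the record, here is a minimal sketch of what a simple conversion might look like when driven from Python. It assumes the plain `ogr2ogr -f "ESRI Shapefile" <out_dir> <input>` form with no extra options; the function names and paths are placeholders.

```python
import subprocess


def build_ogr2ogr_command(e00_path, out_dir):
    """Build the argument list for converting an E00 file into a directory of shapefiles."""
    # -f selects the output driver; ogr2ogr takes the destination before the source.
    return ["ogr2ogr", "-f", "ESRI Shapefile", out_dir, e00_path]


def convert(e00_path, out_dir):
    """Run the conversion; raises CalledProcessError if ogr2ogr exits non-zero."""
    subprocess.run(build_ogr2ogr_command(e00_path, out_dir), check=True)
```

Calling `convert("volcano.e00", "shp_out")` should leave one shapefile per layer in the output directory, assuming ogr2ogr is on the PATH.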

Anyway, the ogr2ogr-converted shapefile seems reasonable to work with. The properties for the main geometry table, named ARC, include integer fields named FNODE_, TNODE_, LPOLY_, RPOLY_, BIMGEO_FN, BIMGEO___1 (copied? to UserId), and LINE-TYPE. There are 21,563 rows in the properties table, roughly 6745 unique LPOLY_ ids, and an unknown but < 16578 number of unique BIMPGEO__1 values.
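Since each arc carries both an LPOLY_ and an RPOLY_, the first step toward rebuilding polygons is grouping arcs by the polygon ids they border. A rough sketch, assuming the ARC rows have already been read into dicts keyed by field name (however you load the shapefile); `arcs_by_polygon` is my own name, and this does not stitch the arcs into closed rings.

```python
from collections import defaultdict


def arcs_by_polygon(arc_rows):
    """Map each polygon id to the list of arc row indices that bound it."""
    boundary = defaultdict(list)
    for index, row in enumerate(arc_rows):
        # An arc separates two polygons, so it belongs to both boundaries.
        # The set avoids listing an arc twice when both sides are the same polygon.
        for polygon_id in {row["LPOLY_"], row["RPOLY_"]}:
            boundary[polygon_id].append(index)
    return dict(boundary)
```

Turning each group into an actual polygon would still mean ordering the arcs end to end using FNODE_ and TNODE_, which this sketch leaves out.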

Meanwhile, the PAL table has 6744 rows with the fields I really care about: YEAR1 and YEAR2 as well as LABEL. It also has BIMPGEO_FN, which looks to be a unique ascending integer 1..6746, and a BIMPGEO__1 field, 0..6980. I’m betting one of those is a key I can join to LPOLY_ on the ARC table to associate the years and label with the geometry.
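One way to test that bet is to measure how many distinct LPOLY_ values show up in each candidate PAL field, then join on whichever wins. A minimal sketch, again assuming both tables are lists of dicts keyed by field name; `key_coverage` and `join_left_polygon` are hypothetical helpers.

```python
def key_coverage(arc_rows, pal_rows, arc_field, pal_field):
    """Fraction of distinct arc_field values that also occur in pal_field."""
    arc_ids = {row[arc_field] for row in arc_rows}
    pal_ids = {row[pal_field] for row in pal_rows}
    if not arc_ids:
        return 0.0
    return len(arc_ids & pal_ids) / len(arc_ids)


def join_left_polygon(arc_rows, pal_rows, pal_field):
    """Pair each arc with the PAL row whose pal_field equals its LPOLY_, or None."""
    by_key = {row[pal_field]: row for row in pal_rows}
    return [(arc, by_key.get(arc["LPOLY_"])) for arc in arc_rows]
```

Comparing `key_coverage(arcs, pal, "LPOLY_", "BIMPGEO_FN")` against the same call with `"BIMPGEO__1"` gives a hint, but not proof: both fields are small integers in overlapping ranges, so high coverage could be a coincidence. Spot-checking a few joined labels against the map is still worth doing.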