Digital National Geographic schema notes

I bought a digital copy of the National Geographic Magazine archives. Great data collection, horrible Adobe Air interface to access it. Fortunately the archive is basically a bunch of lightly obfuscated image files and SQLite databases. I might take a crack at making a more usable index to all the articles. To that end, some notes on the database schema I found.

There’s individual sqlite files on the DVDs and downloads, but I think they are all merged in a file called Contents/Resources/assets/db/cngcontent.sqlite3.full. I find 7977 article entries in 1416 issues for my set up to 2012, which sounds roughly the right scale. Here’s the tables in it and some notes on schema. Numbers in parentheses are how many rows are in the table.

Tables all have simple primary keys named “id”. “search_time” is a popular field, it stores the date of the issue as an integer. IE “1977 Sep” has a search_time of 19770901.

articles (7977)

“id” INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
“start_page_offset” integer,
“page_count” integer,
display_name” varchar(255),
summary” varchar(255),
search_time” integer,
issue_id” integer

issues (1416)

“id” INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
“display_name” varchar(255)
search_time” integer
“page_count” integer
“numbered_page_offset” integer
“numbered_page_count” integer
“numbered_page_start_value” integer
“page_exceptions” varchar(255));

geolinks (27570)

“id” INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
location_id” integer
article_id” integer
“department_id” integer
“map_id” integer

locations (5016)

“id” INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
display_name” varchar(255)
lat” float
lng” float
“bounds_ne_lat” float
“bounds_ne_lng” float
“bounds_sw_lat” float
“bounds_sw_lng” float
“geo_provider” varchar(255)
“kms_per_lng_degree” float

contributors (12400)

id” INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL
display_name” varchar(255)
“search_time” integer
article_id” integer
“department_id” integer
“map_id” integer
“start_page_offset” integer

other tables

article_subjects (25774),
ad_subjects, ads,
departments, department_subjects,
maps, map_subjects,
photos (251002), photo_subjects,
links, schema_migrations, trivia_questions, trivia_rankings

 

Looks totally feasible to join through geolinks to find all articles in a specific region.