When my old server died I had some data in Postgres. I don’t think it was anything important but I can’t really be sure, I’d like to recover it. All I have are the binary files Postgres stores its data in. Turns out recovering from those sucks. Lesson learned: if you care about recovering your data, back it up into a SQL dump.
Postgres makes no effort to make its binary files compatible between versions. A 9.0 database can’t be used in a 9.1 install; when you upgrade, you have to run a funky recovery tool which dumps the 9.0 db and imports it to 9.1. A 9.0 database on a 32 bit machine can’t be imported into 9.0 built for a 64 bit machine, either. I’ve also seen problems in the past trying to work with data from a PostGIS database in a non-PostGIS install. I’m sure this format flexibility makes it easier to build a fast database engine, but it sure makes data recovery difficult.
I tried building 9.0 sources on my amd64 box and running it. That was easy enough, although setting up users and permissions is a bit finicky when installing by hand. But the DB was generated on an i386 box (32 bit) so it won’t load. At least it failed quickly, “FATAL: incorrect checksum in control file”. Then I started looking in to cross-compiling and realized the last thing I want is to set up a whole separate build toolchain. (A quick try with the linux32 command didn’t work.)
So screw it, what was I really storing in Postgres anyway? Not much, I did a quick test to see:
find . -type f -name \*py | xargs egrep -i ‘psycopg2|postgres’
Only one Python program I wrote even imports the Postgres driver. And I can recover that data from original sources. I think I have some geodata I imported with shp2pgsql, too, but I can redo all that. So I give up on recovering data.
Maybe I gave up too fast. Building a 32 bit version of Postgres on Ubuntu was as easy as
./configure CC=’gcc -m32′ –prefix=/tmp/pgsql90i386 –without-readline –without-libz
(there’s probably 32 bit versions of readline and libz, but I don’t really need them). Building a server this way, my amd64 machine can load the database binaries generated on the i386 machine. I had no idea the 64 bit Linux distributions could build 32 bit code so easily.
And it works! “psql -l” to find what databases were there, then “pg_dump database” to get the data out, then “psql” to get the data back into my 9.1 database. Yay!
Lesson learned, I’m now doing some simple nightly SQL backups of my databases. There are fancy ways to do this, I’m going very simple:
DEST=/mnt/sdb1/BACKUPS/postgres set -eu renice 10 $$ > /dev/null pg_dumpall --lock-wait-timeout=$[5*60*1000] | gzip -9 >| $DEST/dumpall-sql.gz