Camlistore 0.8 quickstart (Ubuntu) and thoughts

I’ve been curious about Camlistore for a while. It’s a project from a lot of smart people like Brad Fitzpatrick and Aaron Boodman: a content-addressable filestore “for life”, a sort of Dropbox / Lifestreams kind of thing for storing all your personal data. Ambitious and necessary. My immediate use case is as a replacement for the horrible Unison and rsync tricks I do with a USB drive to keep two Macs sort of in sync.

They’re up at 0.8 now and have fewer dire warnings about how it’s not ready for users yet. I just asked Brad F and he said “I’d wait a bit for 0.9.”, which I good-naturedly take to mean “sure, 0.8 is worth looking at” :-) Here’s a quick start for getting it going on Ubuntu, based on the Camlistore docs:

  1. Install the Go language. Don’t bother looking for an APT repository: the official one is out of date and the third-party PPAs are all abandoned. Just install straight from the golang site.
  2. Download and build Camlistore from the GitHub repo.
  3. Run the Camlistore daemon. It should tell you there’s a UI at http://localhost:3179/ui/
  4. I had a problem with the UI requiring an HTTP basic auth login; maybe the “allow localhost” stuff was fooled by my ssh port forwarding. So I edited ~/.config/camlistore/server-config.json to add a username and password.

I also watched the video of Brad’s talk at FOSDEM from Feb 2014. Nice overview and some demos.

The main takeaway is that Camlistore is a blob server plus a metadata indexing system. The blob server is a simple content-addressable file system: it stores and syncs raw data whose only name is the SHA1 hash of the data. Then there are metadata blobs which contain things like “blob with SHA1 f392… is actually IMG_0902.jpg, a 3000×2000 JPEG with this creation date”. The problem with content-addressable file systems is that the moment you change a file its hash changes, so it’s basically a brand new file. Camlistore has a concept of a “permanode” that provides permanent identity for things independent of their content. It sounds like a lot of work is put into the indexer system to make it a usable filesystem.
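To make the blob / permanode split concrete for myself, here’s a toy sketch in Python. It is not Camlistore’s actual schema or API: the `BlobStore` and `Permanode` classes, the `claim` method and the integer timestamps are all names I made up for illustration. The only ideas taken from the talk are that a blob’s name is the SHA1 of its bytes and that a permanode gives a stable identity that can point at different content over time.

```python
import hashlib
import itertools


class BlobStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The blob's only name is the SHA-1 of its content.
        ref = "sha1-" + hashlib.sha1(data).hexdigest()
        self._blobs[ref] = data  # storing the same bytes twice is a no-op
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]


class Permanode:
    """Hypothetical stand-in for a permanode: a stable ID plus a log of claims."""

    _ids = itertools.count(1)

    def __init__(self):
        self.id = f"permanode-{next(self._ids)}"
        self.claims = []  # (timestamp, attribute, value) tuples

    def claim(self, timestamp: int, attr: str, value: str) -> None:
        self.claims.append((timestamp, attr, value))

    def current(self, attr: str):
        # Assumption: the claim with the latest timestamp wins.
        matching = [c for c in self.claims if c[1] == attr]
        if not matching:
            return None
        return max(matching, key=lambda c: c[0])[2]


store = BlobStore()
v1 = store.put(b"first draft")
v2 = store.put(b"second draft")  # edited content means a brand new blob

doc = Permanode()
doc.claim(1, "content", v1)
doc.claim(2, "content", v2)  # same permanode, now pointing at the new blob
latest = store.get(doc.current("content"))
```

Editing the file produces a second blob with a different name, but `doc.id` never changes, which is the whole point of the permanode. Both versions stay in the store, so history comes for free.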

Speaking of usable, the main UI they show off is a Web UI, with fancy real-time updating searches. The demo was managing a bag of phonecam images. There’s also a FUSE filesystem that seemed very slow in the demo because it was syncing to a remote server. (Slow, like “touch foo” took 5+ seconds.) Not sure if there’s a faster way, like syncing to a local server and then having it sync those updates to the remote server. Seems like you’d have a consistency problem for the metadata on permanodes. Maybe they rely on timestamps to sort out a merge?

There was also a quick demo of “thirdleg”, a sync from one store to another through a third thing, like a portable drive you’d carry from A to B. That’s exactly what I need for my use case above, but I’m not sure I’d want to rely on FUSE as the filesystem for actually using my synced files.
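Here’s why this kind of relay sync looks easy for a content-addressable store, as a rough Python sketch. This is my own toy model, not how Camlistore’s sync is implemented: the stores are plain dicts, and `put` and `sync` are hypothetical helpers. Since a blob’s name is its hash, two stores never disagree about what a name means, so syncing reduces to copying whichever names the other side lacks.

```python
import hashlib


def put(store: dict, data: bytes) -> str:
    """Add a blob to a store (a plain dict here), keyed by its SHA-1."""
    ref = "sha1-" + hashlib.sha1(data).hexdigest()
    store[ref] = data
    return ref


def sync(src: dict, dst: dict) -> int:
    """Copy every blob that dst is missing from src; return how many were copied."""
    missing = src.keys() - dst.keys()
    for ref in missing:
        dst[ref] = src[ref]
    return len(missing)


# Two machines that never talk directly, plus a drive carried between them.
mac_a, usb_drive, mac_b = {}, {}, {}
photo = put(mac_a, b"IMG_0902 bytes")
notes = put(mac_b, b"notes.txt bytes")

# Plug the drive into A, carry it to B, then bring it back to A.
for src, dst in [(mac_a, usb_drive), (usb_drive, mac_b),
                 (mac_b, usb_drive), (usb_drive, mac_a)]:
    sync(src, dst)
```

After the round trip all three stores hold the same set of blobs, and re-running any `sync` copies nothing. The hard part this sketch skips is the mutable metadata on permanodes, which is where the consistency question above comes in.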

My main takeaway is that they have a pretty good take on a low-level distributed data store suitable as a sort of synced remote filesystem. I was a little less convinced by the applications demoed. My guess is that Camlistore will end up being most useful as an underlying layer for something else, and we haven’t quite seen the apps that use it yet. For near-term usefulness the key thing is probably how responsive that FUSE filesystem is. Need to try it out.