Duplicati remote backup notes

Some notes on Duplicati for remote Linux backups. Primary opinion: seems reasonable, needs better docs.

I need a new remote backup solution for my home Linux box now that CrashPlan is truly closing up. In the past I hacked up a remote rsnapshot option, but I wanted something more user friendly. From the Hacker News discussion it seems Duplicati is the consensus choice. The other option I should explore is rsync.net.

I was backing up about 50GB of stuff on CrashPlan. At Amazon pricing that’d be about $1.30 / month. rsync.net would be $4.00/month. I can probably do this for free now on a server I have lying around in a datacenter. The fact the blocks are encrypted makes this all much more reassuring.
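To make that comparison easy to redo at other sizes, here is a rough sketch of the arithmetic. The per-GB rates are just back-computed from the two monthly figures above (they are my assumption, not taken from any current price sheet), and the function name is made up for illustration.

```python
# Rough monthly storage cost comparison.
# ASSUMPTION: per-GB rates are derived from the figures in these notes
# ($1.30 and $4.00 per month for 50 GB), not from a provider's price list.

BACKUP_GB = 50

def monthly_cost(size_gb, price_per_gb_month):
    """Flat storage cost per month, ignoring transfer and request fees."""
    return size_gb * price_per_gb_month

amazon_rate = 1.30 / BACKUP_GB      # implied $/GB-month
rsync_net_rate = 4.00 / BACKUP_GB   # implied $/GB-month

for name, rate in [("Amazon", amazon_rate), ("rsync.net", rsync_net_rate)]:
    print(f"{name}: ${monthly_cost(BACKUP_GB, rate):.2f}/month at ${rate:.3f}/GB")
```

Swapping in a different `BACKUP_GB` gives a quick estimate of how the bill grows, at least under the assumption that pricing is linear in stored size.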

Installing

Duplicati runs as a systemd service. It has a web GUI listening on port 8200 and a scheduler that runs backups every few hours. It stores backups in some Duplicati-specific database in encrypted 50MB chunks. The nice thing about Duplicati is it can store those data chunks on a variety of offsite backends, including any FTP, SSH, WebDAV, or S3-like service. It also supports specific services like AWS, Dropbox, etc.

Installing was kind of awkward: I followed the instructions for a headless install, since the Debian/Ubuntu package they provide apparently requires an X environment. Even so I still had to install Mono, which is an awful lot of packages. Looks like the thing is written in C#.

Configuring seemed simple. I’m starting with just backing up my source code to local disk. I have a convention of putting some files in “nobackup” directories if they are huge downloads I don’t want to back up. I added a filter for that, “-*/nobackup/”. There’s also a default set of Linux filters which seems to be about not backing up any system files. That includes stuff like /etc which, honestly, you probably want backed up. But it seems reasonable for backing up home directories. Half-tempted to not back up my virtual environments for Python; I’ve got a bunch of 500MB monstrosities. But then it’s a PITA to rebuild them and it’s safer to just back up everything.
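For what it's worth, here is a small sketch of what I understand that “nobackup” filter to mean: skip anything that lives under a directory named nobackup, at any depth. This is not Duplicati's filter engine, just my reading of the intent; the function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath

# ASSUMPTION: this mimics the intent of the "-*/nobackup/" exclude filter,
# not Duplicati's actual matching rules.
EXCLUDED_DIR_NAME = "nobackup"

def is_excluded(path: str) -> bool:
    """True if the file sits inside a directory named 'nobackup' at any depth."""
    parts = PurePosixPath(path).parts
    # Only parent directories count; a plain file that happens to be
    # called 'nobackup' is still backed up.
    return EXCLUDED_DIR_NAME in parts[:-1]

def select_for_backup(paths):
    """Keep only the paths that the exclude rule does not match."""
    return [p for p in paths if not is_excluded(p)]

# Hypothetical example paths.
candidates = [
    "/home/me/src/proj/main.py",
    "/home/me/src/proj/nobackup/huge-download.iso",
    "/home/me/src/nobackup_old/notes.txt",
]
print(select_for_backup(candidates))
```

Note that a directory like `nobackup_old` is kept, since only an exact directory name match is treated as excluded here.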

I made one config mistake which was to enable throttling of bandwidth. This applies even to local disk backups. I do want to throttle for network backups eventually.

Running

Anyway, set it all up and started it running. Seems to be doing something, judging by the 100% CPU usage of the mono-sgen process running Duplicati. The docs mention everything being compressed so I guess that’s where the CPU is going.

I tested this with about 19 gigabytes of files, 9 gig of which should be excluded by the nobackup filter. First run took 30 minutes. Duplicati said it was 9 gig to back up and 5 gig stored, which seems about right.

Second run with basically no changes took 1 minute. Backup directory expanded by about 5 MB.

A restore of a 770MB directory took less than a minute. It restored everything right, including timestamps and file permissions.

Remote backup

The local disk test went so well I went ahead and set up an ssh remote backup to a Linux server I own. I created a new user on that system, then configured Duplicati to back up to that host with a saved username / password. (There’s an option for ssh keys too.) That’s about all I had to do; it’s backing up as I speak. I did set up a network throttle at 400 KBytes/second. That seems to be consuming 3.46 Mbits/second, so there’s about 260 kbps in overhead. Probably TCP. CPU usage on the backup process is mostly about 3% when running throttled like this, with brief bursts of 100% activity. A second backup and a restore both worked fine.
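The overhead figure comes from a quick unit conversion, sketched below. It assumes the throttle setting means 1 KByte = 1000 bytes; I don't actually know which convention Duplicati uses, and if it is 1024 bytes the throttle works out to about 3.28 Mbits/second and the overhead is correspondingly smaller. Function and variable names are made up for illustration.

```python
# Convert the throttle setting to Mbit/s and compare with what the
# network monitor showed.
# ASSUMPTION: 1 KByte = 1000 bytes. With 1024 bytes per KByte the
# throttle would be about 3.28 Mbit/s instead of 3.2.

def kbytes_per_s_to_mbits_per_s(kbytes_per_s, bytes_per_kbyte=1000):
    """KBytes/second -> Mbits/second (decimal megabits)."""
    return kbytes_per_s * bytes_per_kbyte * 8 / 1_000_000

THROTTLE_KBYTES_PER_S = 400
OBSERVED_MBITS_PER_S = 3.46   # measured on the wire, from the notes above

throttle_mbits = kbytes_per_s_to_mbits_per_s(THROTTLE_KBYTES_PER_S)
overhead_kbps = (OBSERVED_MBITS_PER_S - throttle_mbits) * 1000
overhead_pct = 100 * (OBSERVED_MBITS_PER_S - throttle_mbits) / throttle_mbits

print(f"throttle: {throttle_mbits:.2f} Mbit/s")
print(f"overhead: {overhead_kbps:.0f} kbps ({overhead_pct:.1f}% on top of payload)")
```

An overhead in the high single digits of percent seems plausible for ssh plus TCP/IP framing, though I haven't measured where it actually goes.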

Opinions

I like the product! It works well and simply. It could probably replace what I use rsnapshot for as well as my remote backups.

The documentation for the project is pretty poor, with stuff spread out over a few articles, wiki pages, and forum postings (!). Par for the course for free software. Also kind of a slow development process, it’s been 2 years+ for the 2.0 and it’s only sort of in beta now. OTOH it all seems to work, and is free, so I shouldn’t be complaining.

I’m a little nervous about my backups being in some unknown database format. OTOH the code is open source; absolute worst case, presumably some nerd could figure out how to solve any problem.