I like Unison for synchronizing files, but it is slow. I finally have a little insight into why: it’s calculating fingerprints of file contents. rsync seldom looks at file contents; it generally gets by with just metadata like file size and mtime. Unison’s approach seems to be to fingerprint every file. At least it caches the fingerprints now, which it used to not do!
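To make the difference concrete, here is a rough Python sketch of the two strategies. It is not Unison's or rsync's actual code: `quick_check_differs` and `FingerprintCache` are names I made up, MD5 is assumed only because the Unison source mentions it, and the cache key (size plus mtime) is my guess at a reasonable invalidation rule.

```python
import hashlib
import os


def quick_check_differs(src, dst):
    """Metadata-only comparison in the spirit of rsync: size and mtime, no contents."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size != b.st_size or int(a.st_mtime) != int(b.st_mtime)


def hash_file(path, chunk_size=1 << 20):
    """Read the whole file and return its MD5 hex digest (MD5 is an assumption)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


class FingerprintCache:
    """Content fingerprints, recomputed only when a file's size or mtime changes."""

    def __init__(self):
        self._cache = {}   # path -> ((size, mtime_ns), digest)
        self.computed = 0  # how many times a file was actually hashed

    def fingerprint(self, path):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns)
        hit = self._cache.get(path)
        if hit is not None and hit[0] == key:
            return hit[1]  # metadata unchanged: reuse the cached digest
        digest = hash_file(path)
        self.computed += 1
        self._cache[path] = (key, digest)
        return digest
```

The quick check costs two `stat` calls per file. The fingerprint costs a full read plus a hash the first time, and a `stat` plus a dictionary lookup after that, which is why the cache matters so much.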
The problem is the fingerprint algorithm is slow, slow, slow. It takes many seconds to fingerprint a 2 gigabyte file. I can’t find a paper that spells out the algorithm and my OCaml is rusty, but the code mentions MD5 and the paper refers only to a “cryptographic checksum”. Using crypto hashes as generic hashes used to be a fad, and I’m guilty of it myself, but unless you’re worried about cryptographic adversaries it’s usually better to use a fast CRC or a similar non-cryptographic hash instead. Crypto hashes are slow by comparison.
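If you want to check the speed claim on your own machine, a small Python timing sketch like this will do it. It compares `hashlib.md5` against `zlib.crc32` over the same in-memory data; the helper names and the 64 MiB test size are arbitrary choices of mine, and it says nothing about how Unison's own implementation performs.

```python
import hashlib
import time
import zlib

CHUNK = 1 << 20  # 1 MiB


def md5_of(chunks):
    """MD5 hex digest over a sequence of byte chunks."""
    h = hashlib.md5()
    for c in chunks:
        h.update(c)
    return h.hexdigest()


def crc32_of(chunks):
    """Running CRC-32 over a sequence of byte chunks."""
    crc = 0
    for c in chunks:
        crc = zlib.crc32(c, crc)
    return crc


def time_it(fn, chunks):
    """Wall-clock seconds taken by fn(chunks)."""
    start = time.perf_counter()
    fn(chunks)
    return time.perf_counter() - start


if __name__ == "__main__":
    data = [bytes(CHUNK)] * 64  # 64 MiB of zeros, sharing one 1 MiB buffer
    for name, fn in (("md5", md5_of), ("crc32", crc32_of)):
        print(f"{name}: {time_it(fn, data):.3f} s for {len(data)} MiB")
```

One caveat: a 32-bit CRC is far more likely to collide than a 128-bit digest, so for a sync tool the swap trades some safety for speed. The numbers also depend heavily on the CPU and on how each library was built.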
While I’m complaining, I can’t help but think that being written in OCaml is part of why Unison doesn’t get more attention and development now. Not many folks know ML, you know? It may not help with the speed either. I can’t tell if the fingerprint hash is implemented in ML or if it’s calling a native optimized library. Also the whole thing is single threaded, which is a mess for an app doing slow CPU work on many independent files. OCaml apparently doesn’t support real multicore parallelism: it has threads, but before OCaml 5 a global runtime lock kept them from running OCaml code in parallel :-(
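For what it’s worth, hashing many independent files is about the easiest kind of work to spread across cores. Here is a sketch in Python rather than OCaml, so it shows the shape of the idea and nothing about Unison’s internals. The function names and the default of 4 workers are my own, and MD5 is again an assumption. Threads work here because `hashlib` releases Python’s global interpreter lock while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def fingerprint(path, chunk_size=1 << 20):
    """MD5 hex digest of one file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def fingerprint_many(paths, workers=4):
    """Fingerprint independent files concurrently; returns {path: digest}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each path with its digest.
        return dict(zip(paths, pool.map(fingerprint, paths)))
```

Whether this helps in practice depends on whether the disk or the CPU is the bottleneck. On a slow spinning disk, parallel reads can make things worse.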