Nelson's log

Javascript data compression

I’m storing 500k blobs in localStorage and I’d like to make them smaller. On any sane platform there’s a compression function in the standard library: call gzip(data) and the result is 30% of the original size. But no, not in Javascript. And there’s no clear third-party library that everyone agrees is great.

I found a good roundup of compression options from October 2011, a series of four blog posts. It goes into a lot of detail on the speed and compression ratios of 11 different libraries. Hilariously, about half of them simply don’t work, or can only do one half of the job (compression or decompression, not both). This StackExchange question is good because it’s specific to localStorage.

Part of the problem is that, just as Javascript doesn’t have an integer type, it also doesn’t have a byte array type. The basic String (DOMString?) is close, but really it’s a sequence of 16-bit UTF-16 code units, not bytes. So to be efficient you really want to use all 16 bits of each character in the String, although that StackExchange question recommends using only 15 bits to avoid issues with UTF-16: any value below 0x8000 stays clear of the surrogate range 0xD800 to 0xDFFF, so the String never contains an unpaired surrogate that might not survive a round trip through storage. And suddenly storage gets awkward. Even when storing only 8 bits per character in the String, a lot of these compression libraries seem to have trouble with various “special characters”; apparently getting encoding and decoding right is not easy. You see a lot of base64-encoded data in Javascript strings out there, yuck.
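A minimal sketch of the 15-bit idea, assuming the data has already been turned into a Uint8Array of bytes. The function names `bytesToString15` and `string15ToBytes`, and the two-character length header (which limits input to 2^30 bytes), are illustrative choices, not taken from the StackExchange answer or any library.

```javascript
// Hypothetical helpers (not from any library): pack arbitrary bytes into a
// String using only the low 15 bits of each UTF-16 code unit. Every char code
// is therefore < 0x8000 and can never be a surrogate (0xD800-0xDFFF).

function bytesToString15(bytes) {
  const n = bytes.length; // assumed < 2^30 so it fits in two 15-bit chars
  let out = String.fromCharCode((n >>> 15) & 0x7fff, n & 0x7fff);
  let buffer = 0;   // pending bits, right-aligned
  let bitCount = 0; // number of pending bits (always < 15 between bytes)
  for (const b of bytes) {
    buffer = (buffer << 8) | b;
    bitCount += 8;
    if (bitCount >= 15) {
      bitCount -= 15;
      out += String.fromCharCode((buffer >>> bitCount) & 0x7fff);
      buffer &= (1 << bitCount) - 1; // keep only the unconsumed low bits
    }
  }
  if (bitCount > 0) {
    // Left-align the last few bits in one final 15-bit char.
    out += String.fromCharCode((buffer << (15 - bitCount)) & 0x7fff);
  }
  return out;
}

function string15ToBytes(str) {
  const n = (str.charCodeAt(0) << 15) | str.charCodeAt(1);
  const bytes = new Uint8Array(n);
  let buffer = 0;
  let bitCount = 0;
  let j = 0; // next byte to write
  for (let i = 2; i < str.length && j < n; i++) {
    buffer = (buffer << 15) | str.charCodeAt(i);
    bitCount += 15;
    while (bitCount >= 8 && j < n) {
      bitCount -= 8;
      bytes[j++] = (buffer >>> bitCount) & 0xff;
    }
    buffer &= (1 << bitCount) - 1; // drop bits already emitted
  }
  return bytes;
}

// Usage: 9 bytes = 72 bits -> 5 payload chars + 2 header chars.
const data = new Uint8Array([0, 1, 2, 250, 251, 252, 253, 254, 255]);
const packed = bytesToString15(data);
console.log(packed.length); // 7
console.log(string15ToBytes(packed).join(",")); // 0,1,2,250,251,252,253,254,255
```

Compared with base64 (6 bits per character), this stores 15 bits per character, at the cost of a hand-rolled bit buffer on both ends.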

Javascript typed arrays look like the standard solution for binary data: you create a raw byte buffer (an ArrayBuffer) and then get views onto it that treat each element as an Int8, an Int32, or whatever. I’ve never worked with them and I’m unsure how compatible the current implementations are. Also, I haven’t found an existing Javascript library that compresses using them.
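For anyone else who hasn’t used typed arrays, here’s a small sketch of the buffer-plus-views model, assuming a modern browser or Node, both of which provide typed arrays, DataView, and TextEncoder.

```javascript
// One 8-byte ArrayBuffer viewed three ways; the views share the same memory.
const buffer = new ArrayBuffer(8);
const bytes = new Uint8Array(buffer);  // 8 unsigned bytes
const signed = new Int8Array(buffer);  // the same 8 bytes, read as signed
const words = new Int32Array(buffer);  // the same memory as two 32-bit ints

bytes[0] = 255;
console.log(signed[0]);    // -1: same bit pattern, signed interpretation
console.log(words.length); // 2

// Multi-byte typed-array views use platform byte order; DataView lets you choose.
const view = new DataView(buffer);
view.setInt32(4, 258, false); // false = big-endian; 258 = 0x00000102
console.log(bytes[4], bytes[5], bytes[6], bytes[7]); // 0 0 1 2

// Getting UTF-8 bytes out of a String, e.g. to feed a compressor.
const utf8 = new TextEncoder().encode("héllo");
console.log(utf8.length); // 6: "é" takes two bytes in UTF-8
```

The shared memory is what makes typed arrays attractive for compression code: a compressor can write bytes into a Uint8Array in place instead of building a new immutable String for every change.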

Related: see this discussion of byte array objects. He discards Strings early because they’re immutable, so modifying a single byte requires copying the whole String. He compares ordinary Javascript Arrays, typed arrays (ArrayBuffer), and, cleverly, canvas ImageData. He finds the APIs for the fancy array types unreasonably slow. There’s no evaluation of size efficiency or ease of use, which are the things I care about.

I think my solution is to use some very simple LZW compression or the like, combined with an encoder that uses 15 bits per character in a String. It’d be nice to wrap this up in a simple API: something that takes an Object, stringifies it with JSON, compresses it, and returns a safe String that can later be unpickled back into the Object. Not sure I need it enough to bother.
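Here’s a minimal sketch of that API, assuming a modern runtime with TextEncoder/TextDecoder. It runs textbook LZW over the UTF-8 bytes of the JSON, with one simplification that is an assumption of this sketch, not something from the roundup or the StackExchange answer: the dictionary is capped at 2^15 = 32768 entries, so every code fits in 15 bits and can be stored directly as one String character below 0x8000. `pickle`, `unpickle`, `lzwCompress`, and `lzwDecompress` are hypothetical names.

```javascript
// Hypothetical pickle/unpickle helpers sketching the API described above.
// Pipeline: JSON.stringify -> UTF-8 bytes -> LZW codes -> one char per code.
// The LZW dictionary is capped at 2^15 entries, so every code is < 0x8000:
// each code becomes exactly one non-surrogate UTF-16 char.
const MAX_CODES = 1 << 15;

function lzwCompress(bytes) {
  // Dictionary keys are byte sequences spelled as strings of char codes 0-255.
  const dict = new Map();
  for (let i = 0; i < 256; i++) dict.set(String.fromCharCode(i), i);
  let nextCode = 256;
  let w = "";
  const codes = [];
  for (const b of bytes) {
    const c = String.fromCharCode(b);
    const wc = w + c;
    if (dict.has(wc)) {
      w = wc; // keep extending the current match
    } else {
      codes.push(dict.get(w));
      if (nextCode < MAX_CODES) dict.set(wc, nextCode++); // frozen once full
      w = c;
    }
  }
  if (w !== "") codes.push(dict.get(w));
  return codes;
}

function lzwDecompress(codes) {
  if (codes.length === 0) return new Uint8Array(0);
  const dict = [];
  for (let i = 0; i < 256; i++) dict.push([i]);
  let w = dict[codes[0]];
  const out = w.slice();
  for (let k = 1; k < codes.length; k++) {
    const code = codes[k];
    let entry;
    if (code < dict.length) {
      entry = dict[code];
    } else if (code === dict.length) {
      entry = w.concat(w[0]); // code not yet known: the "cScSc" special case
    } else {
      throw new Error("invalid LZW code " + code);
    }
    for (const x of entry) out.push(x);
    // Mirror the compressor: add w + first byte of entry, until the cap.
    if (dict.length < MAX_CODES) dict.push(w.concat(entry[0]));
    w = entry;
  }
  return Uint8Array.from(out);
}

// obj must be JSON-serializable (JSON.stringify must return a string).
function pickle(obj) {
  const bytes = new TextEncoder().encode(JSON.stringify(obj));
  return lzwCompress(bytes).map((code) => String.fromCharCode(code)).join("");
}

function unpickle(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) codes.push(str.charCodeAt(i));
  return JSON.parse(new TextDecoder().decode(lzwDecompress(codes)));
}

// Usage (in a browser you might then call localStorage.setItem(key, stored)).
const stored = pickle({ name: "Nelson", tags: ["a", "b", "a", "b"], note: "blah ".repeat(100) });
console.log(unpickle(stored).tags.join(",")); // a,b,a,b
```

Capping the dictionary keeps the encoding trivial, at the cost of worse compression on large inputs once the dictionary fills. A fuller implementation would use variable-width codes packed into 15-bit characters with a bit-level encoder, or reset the dictionary when it fills.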