Usenet encoding technologies

Back in 1993 I ran a Usenet server. Usenet’s mostly dead now, but there’s still a lively trade on it in pirated media, particularly pornography. The challenge is posting a 1 gigabyte .AVI file in a medium that’s limited to small text files. Here’s some of the tech in use these days:

  • nzb, a metadata file for downloading a large file spread across multiple posts. It serves a function analogous to .torrent files in BitTorrent: it doesn’t contain any content, it just describes content. There are Usenet search engines that return NZB files; you can then take one to download the actual release off your favourite NNTP server.
    An NZB file is an XML document that describes a set of files which, when combined, make up the release. Each file is described by a Usenet group and a list of segments. Each segment is a single Usenet message, named by Message ID. (Note: you can’t create the NZB file until after all the message parts have been uploaded.) There’s a rough sketch of the format after this list.
  • rar, a compression format. It’s like zip, but tighter. Rar also does multipart well.
  • par2, an error correcting code format. Sort of the opposite of compression; transmit redundant data so a large archive can be rebuilt even if a few little pieces are missing. (There’s a toy illustration of the idea after this list.)
  • uuencode, the historical encoding of binary data for Usenet. It packs 3 bytes into 4 printable characters (a 64-character alphabet, though not the MIME Base64 one) with some very primitive file wrapper stuff. Still in use.
  • yEnc, the modern encoding of binary data for Usenet. It’s a full 8 bit encoding with escaping for a few characters (NULL, LF, CR, =). It also has a simple file wrapper and the ability to split a file amongst multiple messages. yEnc seems like MIME-lite; I’m not sure what problem it solves that MIME can’t. (The core transform is sketched after this list.)
  • sfv files are lists of CRC32 checksums for other files, used to verify your parts. (There’s a quick checker after this list.)
  • nfo files are text files (well, ANSI graphics), typically a README from the release group.
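
To make the NZB structure concrete, here’s roughly what one looks like. The element names are from the newzbin DTD as best I remember it, so treat the details as approximate:

    <?xml version="1.0" encoding="iso-8859-1"?>
    <nzb xmlns="http://www.newzbin.com/DTD/2003/nzb">
      <file poster="poster@example.com" date="1200000000" subject="movie.part1.rar (001/201)">
        <groups>
          <group>alt.binaries.example</group>
        </groups>
        <segments>
          <segment bytes="249600" number="1">some-message-id@news.example.com</segment>
          <!-- ...200 more segments... -->
        </segments>
      </file>
      <!-- ...more <file> elements for the other RARs, the PAR2s, the SFV... -->
    </nzb>

And a little Python to walk one; the namespace URI and attribute names come from that same memory, so check them against a real file:

    import xml.etree.ElementTree as ET

    NS = "{http://www.newzbin.com/DTD/2003/nzb}"

    def summarize_nzb(path):
        """Print each file in an NZB with its groups, segment count, and total bytes."""
        root = ET.parse(path).getroot()
        for f in root.iter(NS + "file"):
            groups = [g.text for g in f.iter(NS + "group")]
            segments = list(f.iter(NS + "segment"))
            total = sum(int(s.get("bytes", "0")) for s in segments)
            print(f.get("subject"), groups, len(segments), "segments,", total, "bytes")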
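
PAR2’s real math is Reed-Solomon coding over 16-bit words, which I won’t reproduce here, but the core idea of posting extra blocks that can stand in for any missing block is the same spirit as this XOR-parity toy (my own simplification, not PAR2 itself):

    def xor_parity(blocks):
        """One XOR parity block can rebuild any single missing block (all blocks equal length)."""
        parity = bytes(len(blocks[0]))
        for b in blocks:
            parity = bytes(x ^ y for x, y in zip(parity, b))
        return parity

    def rebuild(blocks, parity):
        """blocks holds exactly one None entry; XOR everything else against the parity to recover it."""
        missing = blocks.index(None)
        acc = parity
        for i, b in enumerate(blocks):
            if i != missing:
                acc = bytes(x ^ y for x, y in zip(acc, b))
        return acc

PAR2 generalizes this: post N recovery blocks and any N missing or damaged data blocks can be rebuilt.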
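
The yEnc transform itself is tiny: add 42 to each byte (mod 256) and escape the few results that would upset NNTP (NULL, LF, CR, '=') by emitting '=' plus the byte shifted another 64. A minimal sketch of just that core, ignoring the =ybegin/=ypart/=yend headers, line wrapping, and the trailing CRC:

    CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}    # NUL, LF, CR, '='

    def yenc_encode(data: bytes) -> bytes:
        out = bytearray()
        for b in data:
            c = (b + 42) % 256
            if c in CRITICAL:
                out.append(0x3D)           # escape marker '='
                c = (c + 64) % 256
            out.append(c)
        return bytes(out)

    def yenc_decode(text: bytes) -> bytes:
        out = bytearray()
        it = iter(text)
        for c in it:
            if c == 0x3D:                  # escaped: the next byte was shifted by 64
                c = (next(it) - 64) % 256
            out.append((c - 42) % 256)
        return bytes(out)

Since only those four values ever need escaping, the overhead is a percent or two rather than the ~33% you pay with a 6-bits-per-character encoding like uuencode or Base64, which I suspect is the real answer to the “why not MIME” question.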
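
SFV itself is about as simple as verification gets: each line is a filename and a CRC32 in hex, with ';' starting a comment. A quick checker, assuming that layout:

    import zlib

    def check_sfv(sfv_path):
        with open(sfv_path, encoding="latin-1") as sfv:
            for line in sfv:
                line = line.strip()
                if not line or line.startswith(";"):
                    continue                          # blank line or comment
                name, expected = line.rsplit(None, 1)
                crc = 0
                with open(name, "rb") as f:           # CRC32 the file a megabyte at a time
                    for chunk in iter(lambda: f.read(1 << 20), b""):
                        crc = zlib.crc32(chunk, crc)
                print(name, "OK" if crc == int(expected, 16) else "BAD")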

I took a close look at a single NZB for a 300MB AVI file. Here’s what it contains:

  • 15 files: an SFV, a PAR2, 6 .RAR parts, and 7 PAR2 pieces.
  • Each RAR part is 50 megs and consists of 201 segments of length 249600.
  • The encoding makes each RAR file 70 megs on the server. The last part is smaller; total RAR text size is 410MB.
  • The PAR2 chunks total another 40MB.
  • It’s roughly 450MB to post a 300MB AVI to Usenet, or 50% overhead (quick arithmetic below).
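
The overhead figure is just arithmetic on the sizes above:

    rar_text = 410     # MB of posted RAR text (6 parts at ~70MB, the last one smaller)
    par2_text = 40     # MB of posted PAR2 text
    avi = 300          # MB, the original file
    total = rar_text + par2_text       # ~450 MB posted
    print(total, total / avi - 1)      # 450, 0.5: about 50% overhead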

Here’s some Windows software I found useful:

  • GrabIt, a free NNTP client that’s good at downloading large binaries. The client is free and will download any NZB you give it, but they really want you to pay $25/year to use their NZB search engine.
  • QuickPar takes a PAR fileset and assembles the single file out of its components.
  • YEnc Power Post seems to be what people posting to Usenet use, but I can’t find a canonical link for it that I trust.

What I’m missing is a reliable, free NZB search engine. There’s a variety of for-pay options of varying levels of sleaziness. binsearch.info is promising.