I hate 2018 favicons

I just added a favicon to a new web app I’m building. Check out this 1500 bytes of boilerplate I just added to every page:

<link rel="apple-touch-icon-precomposed" sizes="57x57" href="/images/favicon/apple-touch-icon-57x57.png" />
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="/images/favicon/apple-touch-icon-114x114.png" />
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="/images/favicon/apple-touch-icon-72x72.png" />
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="/images/favicon/apple-touch-icon-144x144.png" />
<link rel="apple-touch-icon-precomposed" sizes="60x60" href="/images/favicon/apple-touch-icon-60x60.png" />
<link rel="apple-touch-icon-precomposed" sizes="120x120" href="/images/favicon/apple-touch-icon-120x120.png" />
<link rel="apple-touch-icon-precomposed" sizes="76x76" href="/images/favicon/apple-touch-icon-76x76.png" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" href="/images/favicon/apple-touch-icon-152x152.png" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-196x196.png" sizes="196x196" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-96x96.png" sizes="96x96" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-32x32.png" sizes="32x32" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-16x16.png" sizes="16x16" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-128.png" sizes="128x128" />
<meta name="msapplication-TileColor" content="#FFFFFF" />
<meta name="msapplication-TileImage" content="/images/favicon/mstile-144x144.png" />
<meta name="msapplication-square70x70logo" content="/images/favicon/mstile-70x70.png" />
<meta name="msapplication-square150x150logo" content="/images/favicon/mstile-150x150.png" />
<meta name="msapplication-wide310x150logo" content="/images/favicon/mstile-310x150.png" />
<meta name="msapplication-square310x310logo" content="/images/favicon/mstile-310x310.png" />

How awesome is that! And I have no idea if it’s correct and no practical way to test it. I’m trusting Favic-o-Matic here. I tried reading docs about what to do online but every single website says something different. And who knows; maybe Apple will innovate with a new 79×79 size next week. (To be fair, Favic-o-Matic does offer the option to have fewer sizes; 16 / 32 / 144 / 152 is the minimal set.)

The original favicon standard wasn’t so bad. Nothing in the HTML at all, just a single /favicon.ico file in your root directory. The format was weird and semi-proprietary, but it had the advantage that it could hold multiple resolutions in a single file. Simple and done.

Then Apple screwed it up by starting to fetch random weird URLs on the website for its precious iOS icons. Then webmasters complained and so this linking standard started. Apple went overboard in supporting every single possible pixel-perfect resolution. Then Microsoft decided that was neat and added their own new incompatible formats for the stupid Start menu tiles no one uses anyway. And here we are.

Really what I want is to publish a single reasonable image, maybe 256×256, and just let the desktop clients auto-scale it. Yeah, it won’t be pixel perfect, but it’s not like I’m redrawing these icons at every size anyway. Either that or modernize the old favicon.ico idea so a single file holds all the icons. A zip container would do nicely.

Porn mode vs IndexedDB

I’m fond of testing my webapps in porn mode (aka incognito mode, private browsing, etc.) It’s a very convenient way to test a webapp starting from a blank slate.

Only, IndexedDB doesn’t work in private mode in any browser but Chrome. This breaks Dexie too. In Firefox you get an error:

InvalidStateError A mutation operation was attempted on a database that did not allow mutations.

That’s too bad. It does work in Chrome; it looks like Chrome stores the database but then wipes it when the private session ends.

TensorFlow MNIST sigmoid recognizer

My baby is starting to see! I built my first custom-designed neural network in TensorFlow and I’m happy. You can see my Python notebook here.

The fun thing about this is programming neural networks as a form of experimental science. There are so many parameters to tweak, and the TensorFlow abstractions are so high level and complex, that I’m not really sure my code is right. But I can just run an experiment, measure the accuracy, and if the result is good then maybe I did something right.


After doing my TensorFlow tutorials I decided to double back and re-implement my work from Ng’s Coursera course, ex4, which had us implementing backpropagation by hand and then creating a neural network that can recognize handwritten digits from MNIST. I liked this exercise back in Ng’s course because it felt like a real task and had a hidden surprise, the visualization of the feature layer. So time to try again!

The Deep MNIST for Experts tutorial from TensorFlow does this task for you, but with a pretty complex neural network. I decided to clone Ng’s network as closely as possible. To wit: a single hidden layer of 25 nodes using a sigmoid() activation function, yielding 95.3% accuracy.

Turns out it’s not entirely easy to replicate the original experiment. Ng’s input data was 20×20 images and TensorFlow’s MNIST is 28×28. Instead of training 400 steps on the whole dataset I’m training 20,000 steps on tiny subsets of the data. I’m not regularizing my weights like we were taught; I’m using dropout instead as a way to avoid overfitting. And I’m also not positive I’m using the same exact cost and training functions. So lots of differences. But at least it’s the same class of network.
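
For reference, here’s roughly what that network looks like in TensorFlow 1.x. This is a sketch rather than my actual notebook code: the layer sizes, step count, and tiny batches are the ones described above, but the initialization, learning rate, and exact cost function are just plausible guesses.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])   # 28x28 images, flattened
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
keep_prob = tf.placeholder(tf.float32)        # dropout keep-probability

# Hidden layer: 25 sigmoid units, like Ng's ex4 network
W1 = tf.Variable(tf.truncated_normal([784, 25], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[25]))
hidden = tf.nn.dropout(tf.nn.sigmoid(tf.matmul(x, W1) + b1), keep_prob)

# Output layer: 10 logits, one per digit
W2 = tf.Variable(tf.truncated_normal([25, 10], stddev=0.1))
b2 = tf.Variable(tf.constant(0.1, shape=[10]))
logits = tf.matmul(hidden, W2) + b2

cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(20000):                        # 20,000 steps on tiny batches
    batch_x, batch_y = mnist.train.next_batch(50)
    sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.9})
print(sess.run(accuracy, feed_dict={x: mnist.test.images,
                                    y_: mnist.test.labels, keep_prob: 1.0}))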


The resulting trained accuracy is about 96% ±0.4%. It takes about a minute to run.

Now that I understand this there are so many things to try.

  • Hidden nodes: more improves accuracy; 50 hidden nodes is about 96% and 100 hidden nodes is about 97%.
  • Activation function: why stick with sigmoid, I can plug in anything! (See the sketch after this list.) I already tinkered with this inadvertently; I’m not sure whether the bias parameter belongs inside the sigmoid() or outside, but either seems to work.
  • Training optimizer. AdamOptimizer seems to converge faster but to a lower accuracy of 94.6%. For that matter I haven’t tried tuning the learning rate parameter.
  • Dropout probability. The sample code I cribbed from had this at 0.5; you really can train a network while randomly knocking out half its nodes? Wow. A setting that high seems to hurt accuracy; I’m getting my best results around 0.1. Or even 0.0; maybe this stuff isn’t needed.
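
Most of those experiments are just swapping an argument in the graph-building code, which is part of what makes this feel so pluggable. Here’s a sketch of how the knobs fit together; the function and its defaults are mine, and the training loop is the same one as above.

import tensorflow as tf

def build(hidden=25, activation=tf.nn.sigmoid, learning_rate=0.5,
          optimizer_cls=tf.train.GradientDescentOptimizer):
    # Same one-hidden-layer network as above, with the tunable bits as arguments.
    x = tf.placeholder(tf.float32, [None, 784])
    y_ = tf.placeholder(tf.float32, [None, 10])
    keep_prob = tf.placeholder(tf.float32)

    W1 = tf.Variable(tf.truncated_normal([784, hidden], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.1, shape=[hidden]))
    h = tf.nn.dropout(activation(tf.matmul(x, W1) + b1), keep_prob)

    W2 = tf.Variable(tf.truncated_normal([hidden, 10], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.1, shape=[10]))
    logits = tf.matmul(h, W2) + b2

    loss = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
    train_step = optimizer_cls(learning_rate).minimize(loss)
    return x, y_, keep_prob, logits, train_step

# e.g. more hidden nodes, relu instead of sigmoid, Adam instead of plain gradient descent:
# build(hidden=100, activation=tf.nn.relu,
#       learning_rate=1e-4, optimizer_cls=tf.train.AdamOptimizer)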

Retina dissection

There was a neat trick in Ng’s class where we visualized the hidden layer of our neural network to get some insight into how the classifier was doing its thing. Here’s an image from that exercise. Inline below is the same kind of image from my new network.


It’s qualitatively different, I think. So many of the features look like hands on a clock; identifying line segments in numbers maybe? I don’t know what to think of this. My old image looks way more random; I wonder if it was overfit in a way this new one isn’t.
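
For the record, producing that image is nothing fancy: each hidden node’s 784 incoming weights get reshaped back into a 28×28 picture. Something like this sketch, where W1 is the 784×25 hidden-layer weight matrix from the network above:

import numpy as np
import matplotlib.pyplot as plt

def show_hidden_features(w, rows=5, cols=5):
    # w is a 784 x n_hidden weight matrix, already pulled out of the
    # session as a numpy array, e.g. w = sess.run(W1)
    fig, axes = plt.subplots(rows, cols, figsize=(6, 6))
    for i, ax in enumerate(axes.flat):
        if i < w.shape[1]:
            ax.imshow(w[:, i].reshape(28, 28), cmap='gray')
        ax.axis('off')
    plt.show()

# show_hidden_features(sess.run(W1))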

One thing I learned doing this: if I allow 100 hidden nodes instead of just 25, a lot of the hidden nodes look qualitatively the same in the visualization. If they’re mostly identical, does that mean they’re redundant? Unnecessary?

I also took a crack at visualizing the hidden nodes that contributed the most to identifying each image. Here are the top 5 nodes for the numbers 0 and 1:


Again, not really sure what to make of this. Particularly since the most important node for both numbers is the same! I think I’m sorting by overall positive contribution, not absolute value. I’m not considering bias terms though.
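
Concretely, the ranking is just an argsort over the output-layer weights. A sketch, where W2 is the 25×10 output weight matrix from the network above and the bias terms are ignored, as noted:

import numpy as np

def top_nodes_for_digit(w2, digit, k=5):
    # w2 is the n_hidden x 10 output weight matrix (e.g. w2 = sess.run(W2)).
    # Rank hidden nodes by their positive contribution to this digit's output unit.
    return np.argsort(w2[:, digit])[::-1][:k]

# for d in (0, 1):
#     print(d, top_nodes_for_digit(sess.run(W2), d))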

Anyway, I feel like I know how to set up a basic neural network in TensorFlow now. Lots of stumbling around and cargo cult programming. But the ability to evaluate training accuracy is a strong external check on whether your code is working OK. What it doesn’t tell you is if it’s working great.

TensorFlow optimized builds

tl;dr: install these TensorFlow binaries for a 2-3x speedup.

Update: or not; turns out the AVX binaries are probably only 10% faster. See below.

I’m now running TensorFlow programs slow enough that I care about optimization. There are several options here for optimized binaries:

  1. Stock TensorFlow
  2. TensorFlow recompiled to use Intel CPU parallel instructions like SSE and AVX. See also the warning stock TensorFlow gives:
    tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
  3. TensorFlow with the GPU

I’m trying to get from 1 to 2; from what I’ve read it’s a 2-3x speedup. GPU is even better of course but is a lot more complicated to set up. And the Linux box I do my work on doesn’t even have a GPU (although my Windows desktop does).

I’m testing this all with a simple one-hidden-sigmoid-layer neural network and the Adam Optimizer, training it to recognize MNIST data.

I tried building TensorFlow from source and quit pretty quickly. It requires bazel to build, which in turn requires a Java runtime, and I noped out. Probably could get it working with a couple of hours’ time.

I tried Intel’s optimized TensorFlow binaries. These seem not to be built with AVX; I still get the warning. They are also slower: my little program took 210s to run instead of 120s. Reading their blog post it sounds like this is mostly Intel’s crack optimization team reordering code so it runs more efficiently on their CPUs. (Intel has an amazing group of people who do this.) Also the patches were submitted back to Google and are probably in stock TensorFlow. Not sure why it’s slower, and I’m bummed they didn’t build with AVX, but here we are.

lakshayg’s binaries. No idea who this guy is, but sure, I’ll try a random binary from anyone! Bingo! My program goes from 120s to 46s, a 2.6x speedup. Hooray! (But see below.) One slight caveat: this is 1.4.0rc1, not the latest 1.4.1, so there’s about two weeks’ worth of bug fixes missing.

TinyMind’s Tensorflow wheels are another source of precompiled Linux versions of Tensorflow. They’re built with AVX2 which unfortunately my processor doesn’t support.

Starting with 1.6, Google is going to release only AVX binaries. This breaks older CPUs; a shame they can’t release several different builds.

Update: I’ve noticed the performance isn’t stable.  With the AVX binaries my program runs sometimes in 46 seconds (yay!) and sometimes in 110 seconds (boo!). With Google’s stock build it’s sometimes 51 and sometimes 120. That suggests the AVX binaries aren’t a significant speedup for my program and I have a deeper mystery.

I spent several hours figuring this out. Turns out in the slow case, my program spends most of its time in mnist.next_batch(), I think when it runs out of data and has to reshuffle. I have no idea why it’s so variable or slow but it’s not an interesting failure given this is tutorial code. Does remind me I should learn more about how to manage test data correctly in TensorFlow.
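
Tracking that down was mostly a matter of profiling the training loop, roughly like this (a sketch; sess, train_step, mnist, and the placeholders are the ones from the MNIST network earlier in these notes):

import cProfile
import pstats

def train(steps=20000):
    # The same training loop as before, wrapped in a function so it can be profiled.
    for _ in range(steps):
        batch_x, batch_y = mnist.train.next_batch(50)
        sess.run(train_step, feed_dict={x: batch_x, y_: batch_y, keep_prob: 0.9})

cProfile.run('train()', 'train.prof')
pstats.Stats('train.prof').sort_stats('cumulative').print_stats(10)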

If I stub out the batching so it’s not a factor my program runs in about 29s with the AVX binaries, 32s with stock binaries (no AVX). So maybe a 10% improvement. That’s not very exciting.

TensorFlow day 2

Some more tinkering with TensorFlow, in particular the MNIST for ML Beginners and Deep MNIST for Experts tutorials. MNIST is neat; it’s a standard normalized dataset of handwriting samples for the numbers 0-9. A classic for machine vision testing, with well known results and training accuracies of 88 – 99.5% depending on the approach. Consensus test data like this is so valuable in a research community. I worked with this dataset back in Ng’s Machine Learning class.

First up, MNIST for ML Beginners. It has you build a basic softmax regression model to classify the numbers, then train it. Final accuracy is about 92%.

I followed this just fine; it’s pretty straightforward and not too different from the “getting started” tutorial, just on real data (MNIST) and using some slightly more sophisticated functions like softmax and cross_entropy. Some notes (a rough sketch of the model follows the list):

  • TensorFlow has datasets built in, in the tensorflow.examples package.
  • The MNIST data set has a “.train” collection of training data and a (presumably disjoint) “.test” collection for final test data. The .train set also has a method .next_batch() which lets you randomly subsample rather than training on all data every single iteration.
  • The concept of the “one-hot” representation. For labeling the digits 0-9 we have an array of 10 numbers (one per digit). Every number is 0 except for a single 1, which marks the label. There’s also the “tf.argmax()” function for quickly finding the index of the column set to 1.
  • The softmax function which takes a vector of weights and normalizes it so it becomes a vector of probabilities that sum to 1. The weighting is exponential.
  • TensorFlow has an InteractiveSession which lets you mix declaring stuff with running session code conveniently. Good for noodling in a notebook.
  • “Loss functions”, basically a measure of the error between a prediction your model makes and the expected result data. These tutorials use the cross_entropy function, an information theory calculation that involves the probabilities of each outcome as well as just measuring the error.
  • tf.train.GradientDescentOptimizer() is a simple optimizer we apply here in a straightforward way. Note this is where TensorFlow’s automated differentiation comes into play, to do the gradient descent.
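
Putting those pieces together, the whole beginner model comes down to something like this (a rough sketch from memory, TensorFlow 1.x API; don’t trust the exact hyperparameters):

# Softmax regression on MNIST, trained with gradient descent on cross-entropy.
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])   # flattened 28x28 images
y_ = tf.placeholder(tf.float32, [None, 10])   # one-hot labels
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)        # predicted probabilities

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
for _ in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

correct = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
print(sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels}))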

The second tutorial I did was Deep MNIST for Experts. This has you building a 4 layer neural network (aka “deep”) that maps 5×5 patches of the image to 32 features, then 64, then flattens it all into a single fully connected layer of 1024 features before classifying it. Final accuracy is about 99.2%. (I sketch the architecture out after my notes below.)

I had a harder time following this; it assumes a lot more machine learning knowledge than the previous tutorials. If you don’t know things like what a rectified linear unit is, what dropout does, or what the Adam Optimizer is, you’re gonna be a bit lost. It me; I’m kind of blindly copying stuff in as I go.

  • The full source has this weird thing about name_scope in the code. I think this is an extra level of testing / type checking but I’m not positive. I left it out and my code seems to have worked.
  • This code gets a bit complicated because you’re working with rank 4 tensors, ie: one giant 4 dimensional array. The first dimension is test image #, the second and third are pixels (in a 28×28 square) and the fourth is a single column for color value. It’s a standard setup for 2d image processing, I imagine.
  • The network structure is neat. Intuitively you boil down 28×28 grey pixel values into 14×14 32 dimensional values. Then you boil that down again to 7×7 64 dimensional values, and finally to a single 1024 feature array. I’m fascinated to know more about these intermediate representations. What are those 1024 features? I expect one is “looks like a vertical line” and one is “looks like a circle at the top” and the like, but who knows. (I bet someone does.)
  • The pooling from 28×28 → 14×14 → 7×7 is odd to me. It uses max_pool, which I think means it just takes the maximum value from a 2×2 window. Surprised that blunt an instrument doesn’t throw things off. For that matter what does a derivative of this function mean?
  • Dropout sounds crazy; you randomly just drop nodes from the neural network during the training. This keeps the network honest, avoids overfitting. It feels a bit like randomly harassing someone while they’re studying to keep them on their toes. The paper they linked says Dropout is an alternative to regularization. I note this code doesn’t ever regularize its input, so I guess it works?
  • They also introduce the idea of initial weights in a neural network. I remember this from Ng’s course; you want them to not all be 0, because then nothing can break the symmetry. Also they give everything a positive bias term to avoid “dead neurons”. Not sure what that means.
  • The pluggable nature of Tensor modules is apparent here. Particularly the swap to the “Adam Optimizer” over a simple gradient descent. I have no idea what this algorithm does but using it is literally one line of code change. And presumably it’s better, or so the linked paper claims.
  • It’s slow! 20,000 training iterations on an i7-2600K is taking ~20 minutes. Now I wish I had the custom compiled AVX version, or a GPU hooked up :-) At least it is running as many threads as it should (7 or 8).
  • They have you running 20,000 training iterations but the accuracy measured against the training set converges to 0.99 by around 4000 iterations. I wonder how much the network is really changing at that point. There’s a lot of random jitter in the system with the dropouts and sampling, so there’s room. The accuracy against the test set keeps improving up to about 14,000 steps.
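
For my own notes, here’s the shape of the network as I understand it, written as a sketch rather than the tutorial’s exact code (TensorFlow 1.x; variable names and initializations are mine):

# Two 5x5 conv layers (32 then 64 features), each followed by 2x2 max pooling,
# a 1024-unit fully connected layer with dropout, then a 10-way readout.
import tensorflow as tf

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

def bias(shape):
    return tf.Variable(tf.constant(0.1, shape=shape))  # positive bias against dead neurons

x = tf.placeholder(tf.float32, [None, 784])
y_ = tf.placeholder(tf.float32, [None, 10])
keep_prob = tf.placeholder(tf.float32)

x_image = tf.reshape(x, [-1, 28, 28, 1])    # rank 4: [batch, height, width, channels]

# 28x28x1 -> 14x14x32
h1 = tf.nn.relu(tf.nn.conv2d(x_image, weight([5, 5, 1, 32]),
                             strides=[1, 1, 1, 1], padding='SAME') + bias([32]))
p1 = tf.nn.max_pool(h1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# 14x14x32 -> 7x7x64
h2 = tf.nn.relu(tf.nn.conv2d(p1, weight([5, 5, 32, 64]),
                             strides=[1, 1, 1, 1], padding='SAME') + bias([64]))
p2 = tf.nn.max_pool(h2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

# 7x7x64 -> 1024 features, with dropout
flat = tf.reshape(p2, [-1, 7 * 7 * 64])
fc = tf.nn.relu(tf.matmul(flat, weight([7 * 7 * 64, 1024])) + bias([1024]))
fc_drop = tf.nn.dropout(fc, keep_prob)

# 1024 -> 10 logits, trained with Adam on cross-entropy
logits = tf.matmul(fc_drop, weight([1024, 10])) + bias([10])
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

The training loop is the same feed_dict / sess.run() pattern as the beginner model, just feeding keep_prob 0.5 during training and 1.0 when measuring accuracy.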

One thing these tutorials are missing is more visual feedback as you go along. That, and some easy way to actually use the model you’ve spent an hour building and training.

I’d like to go back and implement the actual neural network I built for MNIST in Ng’s class. IIRC it’s just 1 hidden layer: the 20×20 pixels are treated as a linear array of 400 numbers, then squashed via sigmoid functions to a hidden layer of 25 features, then squashed again to a one-hot layer of 10 numbers. It would be a good exercise to redo this in TensorFlow. The course notes describe the network in detail and suggest you expect about 95.3% accuracy after training.

TensorFlow introduction

I spent a couple of hours kicking the tires on TensorFlow, mostly working through the Getting Started docs. Which are excellent, btw. Here’s some things I learned. These are all super basic, undergrad level things. The real stuff requires more time to get to.

  • Installing TensorFlow is as easy as “pip install tensorflow”. It runs fine in Jupyter with no problems.
  • Don’t be dumb like me and try to get the GPU accelerated version working at first; that’s hard because NVidia’s packaging is such a mess. For that matter ignore the warnings about CPU optimizations for your hardware. That might make it run 2-3x faster, but you have to compile TensorFlow yourself to do that.
  • “Tensor” is a fancy word for “multi-dimensional array”. It’s numpy under the hood. TensorFlow is all about creating a flow of data through tensors.
  • “Tensor” is also a fancy word for “deferred computation”. The TensorFlow framework basically has you creating a bunch of Futures/Promises and linking them together in a model. You don’t run code imperatively, you declare function objects (of type Tensor) and then hand them to a session manager to run. This enables two important kinds of magic:
  • Magic 1: the session runner handles running your model. You don’t really care how. Maybe it’s run on the CPU, maybe the GPU, maybe it’s handed off to a distributed compute cluster. All you know is you told the session to run and you got a result.
  • Magic 2: when training a model, the derivatives are calculated via automatic differentiation. Most machine learning techniques require that you not only calculate the error between the model’s output and the desired output, but also the first partial derivatives of that partial error. You can then use the derivative for gradient descent optimization, etc. A big part of what makes machine learning algorithms mathematically difficult is analytically finding those derivative functions. You can numerically approximate the derivative but that doesn’t work very well. TensorFlow instead automatically generates derivatives by inspecting the model you created out of Tensor building blocks and functions. (See also Google Tangent, a different approach to automatic differentiation done by decompiling Python code. wacky!)
  • You can write your own training system by using the tf.train API to create a model and pass it to your optimizer of choice. Or you can get fancier and use the tf.estimator API to run a whole machine learning project for you. Those two approaches are most of what the “getting started” tutorial has you do. (A sketch of the tf.train version follows this list.)
  • The trained models become things you keep in TensorFlow; you can store them to disk, apply them to input data, etc.
  • There’s a nifty tool called TensorBoard that can visualize a TensorFlow model, all the functions it is built out of.  There’s also visualizations of the training process.
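
To make the tf.train bullet concrete, the whole getting-started exercise boils down to something like this sketch:

# Fit y = W*x + b to four toy data points using the tf.train API (TensorFlow 1.x).
# The data and learning rate are roughly what the guide uses; I'm reciting from memory.
import tensorflow as tf

W = tf.Variable([0.3], dtype=tf.float32)
b = tf.Variable([-0.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)

linear_model = W * x + b
loss = tf.reduce_sum(tf.square(linear_model - y))    # sum of squared errors
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train, {x: x_train, y: y_train})
    print(sess.run([W, b, loss], {x: x_train, y: y_train}))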

At the end of the tutorial all I’d done was train a very simple linear regression model to some toy one dimensional data. But I sort of understand how the parts fit together now. I’m impressed with how well crafted those parts are, Google has put a whole lot of effort into packaging and presenting TensorFlow so folks like us can use it. It’s impressive.

The next step in the tutorials is to train a simple handwriting recognizer. I did that from scratch in Ng’s course, will be fun to revisit it with a high level toolkit.


Duplicati remote backup notes

Some notes on Duplicati for remote Linux backups. Primary opinion: seems reasonable, needs better docs.

I need a new remote backup solution for my home Linux box now that CrashPlan is truly closing up. In the past I hacked up a remote rsnapshot option, but I wanted something more user friendly. From the Hacker News discussion it seems Duplicati is the consensus choice. The other option I should explore is rsync.net.

I was backing up about 50GB of stuff on CrashPlan. At Amazon pricing that’d be about $1.30 / month. rsync.net would be $4.00/month. I can probably do this for free now on a server I have lying around in a datacenter. The fact the blocks are encrypted makes this all much more reassuring.


Duplicati runs as a systemd service. It has a web GUI listening on port 8200 and some sort of schedule thing where it runs every few hours. It stores backups in some Duplicati-specific database in encrypted 50MB chunks. The nice thing about Duplicati is it can store those data chunks on a variety of offsite backends, including any FTP, SSH, WebDAV, or S3-like service. Also support for specific services like AWS, Dropbox, etc.

Installing was kind of awkward: I followed the instructions for this headless install since the Debian/Ubuntu package they provide apparently requires an X environment. Even so I still had to install Mono, which is an awful lot of packages. Looks like the thing is written in C#.

Configuring seemed simple. I’m starting with just backing up my source code to local disk. I have a convention of putting some files in “nobackup” directories if they are huge downloads I don’t want to back up. I added a filter for that, “-*/nobackup/”. There’s also a default set of Linux filters which seems to be about not backing up any system files. Including stuff like /etc which honestly, you probably want backed up. But it seems reasonable for backing up home directories. Half-tempted to not back up my virtual environments for Python; I’ve got a bunch of 500MB monstrosities. But then it’s a PITA to rebuild them and it’s safer to just back up everything.

I made one config mistake which was to enable throttling of bandwidth. This applies even to local disk backups. I do want to throttle for network backups eventually.


Anyway, set it all up and started it running. Seems to be doing something, judging by the 100% CPU usage of the mono-sgen process running Duplicati. The docs mention everything being compressed so I guess that’s where the CPU is going.

I tested this with about 19 gigabytes of files, 9 gig of which was excluded by the nobackup filter. The first run took 30 minutes. Duplicati said it was 9 gig to back up and 5 gig stored, which seems about right.

Second run with basically no changes took 1 minute. Backup directory expanded by about 5 MB.

A restore of a 770MB directory took less than a minute. It restored everything right, including timestamps and file permissions.

Remote backup

The local disk test went so well I went ahead and set up an ssh remote backup to a Linux server I own. I created a new user on that system, then configured Duplicati to back up to that host with a saved username / password. (There’s an option for ssh keys too.) That’s about all I had to do; it’s backing up as I speak. I did set up a network throttle at 400 KBytes/second. That seems to consume 3.46 Mbits/second; 400 KBytes/second is 3.2 Mbits/second, so there’s about 260 kbps of overhead. Probably TCP. CPU usage on the backup process is mostly about 3% when running throttled like this, with brief bursts of 100% activity. A second backup and a restore both worked fine.


I like the product! It works well and simply. It could probably replace what I use rsnapshot for as well as my remote backups.

The documentation for the project is pretty poor, with stuff spread out over a few articles, wiki pages, and forum postings (!). Par for the course for free software. Development is also kind of slow; it’s been 2+ years of work on 2.0 and it’s only sort of in beta now. OTOH it all seems to work, and is free, so I shouldn’t be complaining.

I’m a little nervous about my backups being in some unknown database format. OTOH the code is open source, absolute worst case presumably some nerd could figure out how to solve any problem.