TensorFlow MNIST sigmoid recognizer

My baby is starting to see! I built my first custom-designed neural network in TensorFlow and I’m happy.

The fun thing about this is programming neural networks as a form of experimental science. There are so many parameters to tweak, and the TensorFlow abstractions are so high-level and complex that I’m not really sure my code is right. But I can just run an experiment, measure the accuracy, and if the result is good then maybe I did something right.


After doing my TensorFlow tutorials I decided to double back and re-implement my work from Ng’s Coursera course, ex4, which had us implement backpropagation by hand and then build a neural network that recognizes handwritten digits from MNIST. I liked this exercise back in Ng’s course because it felt like a real task and had a hidden surprise: the visualization of the feature layer. So time to try again!

The Deep MNIST for Experts tutorial from TensorFlow does this task for you, but with a pretty complex neural network. I decided to clone Ng’s network as closely as possible. To wit: a single hidden layer of 25 nodes using a sigmoid() activation function, yielding 95.3% accuracy.
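The shape of that network is simple enough to write down. Here’s a framework-agnostic NumPy sketch of the forward pass — the weight names, shapes, and initialization are my own illustration, not the actual TensorFlow code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Layer sizes: 784 pixel inputs (28x28), 25 hidden sigmoid units, 10 digit classes.
W1 = rng.normal(scale=0.1, size=(784, 25))
b1 = np.zeros(25)
W2 = rng.normal(scale=0.1, size=(25, 10))
b2 = np.zeros(10)

def forward(x):
    """Forward pass for a batch of flattened 28x28 images, shape (n, 784)."""
    hidden = sigmoid(x @ W1 + b1)      # (n, 25) hidden activations
    logits = hidden @ W2 + b2          # (n, 10) class scores
    return np.argmax(logits, axis=1)   # predicted digit per image

preds = forward(rng.random((5, 784)))
```

The whole model is just those four arrays plus one nonlinearity, which is what makes it such a clean baseline to experiment on.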

Turns out it’s not entirely easy to replicate the initial experiment. Ng’s input data was 20×20 images, while TensorFlow’s MNIST comes as 28×28. Instead of training 400 steps on the whole dataset I’m training 20,000 steps on tiny subsets of the data. I’m not regularizing like we were taught; I’m using dropout instead as a way to avoid overfitting. And I’m not positive I’m using the exact same cost and training functions. So, lots of differences. But at least it’s the same class of network.
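The many-small-steps scheme looks roughly like this. A toy sketch with stand-in data and a plain softmax-regression update in place of the real network’s training step (batch size 50 is the tutorial’s usual choice; the step count is shortened here for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for MNIST: TensorFlow's images are 28x28 = 784 inputs,
# where Ng's were 20x20 = 400.
images = rng.random((1000, 784))
labels = rng.integers(0, 10, size=1000)

W = np.zeros((784, 10))
b = np.zeros(10)

def train_step(x, y, lr=0.5):
    """One softmax-regression SGD step (a stand-in for the real update)."""
    global W, b
    logits = x @ W + b
    logits -= logits.max(axis=1, keepdims=True)           # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    probs[np.arange(len(y)), y] -= 1.0                    # d(cross-entropy)/d(logits)
    W -= lr * x.T @ probs / len(y)
    b -= lr * probs.mean(axis=0)

# Many steps on tiny random subsets, rather than a few passes over everything.
for step in range(200):
    idx = rng.integers(0, len(images), size=50)
    train_step(images[idx], labels[idx])
```

Each step only sees 50 images, but over 20,000 steps the network samples the whole training set many times over — stochastic gradient descent rather than full-batch descent.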


The resulting trained accuracy is about 96% ±0.4%. It takes about a minute to run.

Now that I understand this there are so many things to try.

  • Hidden nodes: more improves accuracy; 50 hidden nodes is about 96% and 100 hidden nodes is about 97%.
  • Activation function: why stick with sigmoid when I can plug in anything? I already tinkered with this inadvertently; I’m not sure whether the bias parameter belongs inside the sigmoid() or outside. Either seems to work.
  • Training optimizer: AdamOptimizer seems to converge faster but to a lower accuracy of 94.6%. For that matter, I haven’t tried tuning the learning rate parameter.
  • Dropout probability: the sample code I cribbed from had this at 0.5; you really can train a network by randomly knocking out half its nodes? Wow. A setting that high seems to hurt accuracy; I’m getting my best results around 0.1. Or even 0.0; maybe this stuff isn’t needed.
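On the bias question from the activation bullet: the conventional placement is inside the nonlinearity, sigmoid(Wx + b), so the bias shifts where the sigmoid crosses 0.5; adding it outside shifts the output itself. A quick sketch of the two variants (my own toy values, not either course’s code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])
W = np.ones((3, 1))
b = np.array([3.0])

inside  = sigmoid(x @ W + b)   # conventional: bias shifts the activation threshold
outside = sigmoid(x @ W) + b   # shifts the output; can leave the (0, 1) range
```

Both give the next layer something trainable to work with, which may be why either “seems to work” — but only the inside version keeps the hidden activations bounded in (0, 1).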
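The dropout mechanism itself is just a random mask at training time. A minimal sketch of inverted dropout, using the drop-probability convention from the bullet above, where 0.0 means dropout is off (note that TensorFlow 1’s tf.nn.dropout takes the opposite, keep_prob, so 0.5 there also means dropping half):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, drop_prob):
    """Inverted dropout: zero each unit with probability drop_prob,
    scaling the survivors so the expected activation is unchanged."""
    if drop_prob == 0.0:
        return activations                          # dropout disabled entirely
    keep_prob = 1.0 - drop_prob
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

hidden = np.ones((4, 25))
out = dropout(hidden, 0.5)   # roughly half the units zeroed, survivors scaled to 2.0
```

At test time you apply no mask at all; the 1/keep_prob scaling during training is what makes the train- and test-time activations agree in expectation.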

Retina dissection

There was a neat trick in Ng’s class where we visualized the