My baby is starting to see! I built my first custom-designed neural network in TensorFlow and I’m happy. You can see my Python notebook here.
The fun thing about this is treating neural network programming as a form of experimental science. There are so many parameters to tweak, and the TensorFlow abstractions are so high-level and complex, that I’m not really sure my code is right. But I can just run an experiment, measure the accuracy, and if the result is good, then maybe I did something right.
After doing my TensorFlow tutorials, I decided to double back and re-implement my work from Ng’s Coursera course, ex4, which had us implementing backpropagation by hand and then creating a neural network that can recognize handwritten digits from MNIST. I liked this exercise back in Ng’s course because it felt like a real task and had a hidden surprise: the visualization of the hidden layer. So, time to try again!
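For anyone who hasn't done ex4, the heart of hand-written backpropagation is that for a sigmoid unit with a cross-entropy cost, the gradient with respect to the unit's input sum is simply `a - y`. Below is a minimal pure-Python sketch that checks this analytic gradient against a finite-difference estimate, the same "gradient checking" idea the course used. The single-neuron setup, the numbers, and all names are hypothetical illustrations, not code from the exercise.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost(w, b, x, y):
    """Cross-entropy cost of a single sigmoid neuron on one example."""
    a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

def analytic_grad(w, b, x, y):
    """Backprop by hand: dJ/dz = a - y, so dJ/dw_i = (a - y) * x_i and dJ/db = a - y."""
    a = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    delta = a - y
    return [delta * xi for xi in x], delta

def numeric_grad(w, b, x, y, eps=1e-5):
    """Central finite differences, nudging one parameter at a time."""
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grads.append((cost(w_plus, b, x, y) - cost(w_minus, b, x, y)) / (2 * eps))
    grad_b = (cost(w, b + eps, x, y) - cost(w, b - eps, x, y)) / (2 * eps)
    return grads, grad_b

# Hypothetical weights, bias, input, and label for the check.
w, b, x, y = [0.3, -0.2, 0.5], 0.1, [1.0, 2.0, -1.0], 1
gw, gb = analytic_grad(w, b, x, y)
nw, nb = numeric_grad(w, b, x, y)
max_diff = max(abs(g - n) for g, n in zip(gw + [gb], nw + [nb]))
print(max_diff)  # should be tiny if the hand-derived gradient is right
```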
The Deep MNIST for Experts tutorial from TensorFlow does this task for you, but with a pretty complex neural network. I decided to clone Ng’s network as closely as possible. To wit: a single hidden layer of 25 nodes using a sigmoid() activation function, yielding 95.3% accuracy.
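To make the shape of that network concrete, here is a minimal pure-Python sketch of the forward pass: 784 inputs (a flattened 28×28 image), 25 sigmoid hidden units, and 10 outputs. It is an illustration under assumptions, not my notebook's TensorFlow code: the random initialization, the helper names, and the softmax output layer (as in the TensorFlow tutorials; Ng's ex4 used sigmoid output units) are all my choices here. Note that the bias is added before the nonlinearity, i.e. `sigmoid(xW + b)`.

```python
import math
import random

N_INPUT, N_HIDDEN, N_OUTPUT = 28 * 28, 25, 10  # MNIST pixels, Ng-sized hidden layer, digits 0-9

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layer(n_in, n_out, rng, scale=0.1):
    """Small random weights (one row per input) and zero biases (hypothetical init)."""
    W = [[rng.uniform(-scale, scale) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return W, b

def dense(x, W, b):
    """z_j = sum_i x_i * W[i][j] + b_j; the bias goes in before any nonlinearity."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, params):
    (W1, b1), (W2, b2) = params
    hidden = [sigmoid(z) for z in dense(x, W1, b1)]  # sigmoid(xW + b)
    return hidden, softmax(dense(hidden, W2, b2))

rng = random.Random(0)
params = (init_layer(N_INPUT, N_HIDDEN, rng), init_layer(N_HIDDEN, N_OUTPUT, rng))
image = [rng.random() for _ in range(N_INPUT)]  # stand-in for one flattened 28x28 image
hidden, probs = forward(image, params)
print(len(hidden), len(probs), max(range(N_OUTPUT), key=probs.__getitem__))
```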
It turns out it’s not entirely easy to replicate the original experiment. Ng’s input data was 20×20 images, while TensorFlow’s MNIST inputs are 28×28. Instead of training 400 steps on the whole dataset, I’m training 20,000 steps on tiny subsets of the data. I’m not using the L2 weight regularization we were taught; I’m using dropout instead as a way to avoid overfitting. And I’m not positive I’m using the exact same cost and training functions. So, lots of differences. But at least it’s the same class of network.
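The "20,000 steps on tiny subsets" part is mini-batch training. Here is a minimal sketch of one common way to pick those subsets: shuffle the training indices, walk through them in fixed-size chunks, and reshuffle at the end of each pass (epoch). The batch size of 50 and the 55,000-image training split are assumptions for illustration, as is the `minibatches` helper itself; TensorFlow's own input helpers may batch differently.

```python
import random

def minibatches(n_examples, batch_size, n_steps, seed=0):
    """Yield one list of example indices per training step.

    Walks through a shuffled ordering of the data and reshuffles when a pass
    (epoch) is finished, so every example is seen once per pass.
    """
    rng = random.Random(seed)
    order = list(range(n_examples))
    rng.shuffle(order)
    pos = 0
    for _ in range(n_steps):
        if pos + batch_size > n_examples:  # start a new epoch
            rng.shuffle(order)
            pos = 0
        yield order[pos:pos + batch_size]
        pos += batch_size

# Assumed numbers: 20,000 steps of 50 examples over a 55,000-image training set.
steps = list(minibatches(n_examples=55_000, batch_size=50, n_steps=20_000))
print(len(steps), len(steps[0]), 20_000 * 50 / 55_000)  # last value: passes over the data
```

Under these assumptions, 20,000 small steps amount to roughly 18 passes over the training set, each step using a noisy gradient estimate, whereas a full-batch optimizer sees every example at every iteration.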
The resulting trained accuracy is about 96% ±0.4%. It takes about a minute to run.
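For reference, the accuracy number is just the fraction of images whose most probable predicted digit matches the label, and one way (an assumption, not necessarily how the figure above was computed) to report a ± spread is the sample standard deviation over several training runs. A small pure-Python sketch with hypothetical helper names:

```python
import statistics

def accuracy(predicted_probs, labels):
    """Fraction of examples whose highest-probability class equals the label."""
    correct = sum(
        1
        for probs, label in zip(predicted_probs, labels)
        if max(range(len(probs)), key=probs.__getitem__) == label
    )
    return correct / len(labels)

def summarize_runs(accuracies):
    """Mean and sample standard deviation of accuracies from repeated training runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# Toy check: 3 of these 4 predictions pick the right class.
probs = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
labels = [1, 0, 0, 0]
print(accuracy(probs, labels))  # 0.75
```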
Now that I understand this there are so many things to try.
- Hidden nodes: more nodes improve accuracy; 50 hidden nodes give about 96% and 100 hidden nodes about 97%.
- Activation function: why stick with sigmoid when I can plug in anything? I already tinkered with this inadvertently; I’m not sure whether the bias parameter belongs inside the sigmoid() or outside. The conventional form is sigmoid(xW + b), but either seems to work.
- Training optimizer: AdamOptimizer seems to converge faster, but to a lower accuracy of 94.6%. For that matter, I haven’t tried tuning the learning rate parameter.
- Dropout probability: the sample code I cribbed from had this at 0.5. You really can train a network while randomly knocking out half its nodes? Wow. A setting that high seems to hurt accuracy; I’m getting my best results around 0.1. Or even 0.0; maybe this stuff isn’t needed.
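When comparing dropout settings it is worth checking which convention the parameter uses: in TensorFlow 1.x, `tf.nn.dropout` took a keep probability (`keep_prob`), while TensorFlow 2's version takes a drop `rate`, so "0.5" means the same thing either way but "0.1" does not. Below is a minimal pure-Python sketch of "inverted" dropout phrased as a drop probability; the `dropout` helper is hypothetical and only meant to show the mechanism: zero each unit at random during training and scale the survivors up so the expected activation is unchanged.

```python
import random

def dropout(activations, drop_prob, rng=random):
    """Inverted dropout for training.

    Each unit is zeroed with probability drop_prob; survivors are scaled by
    1 / (1 - drop_prob) so the expected activation stays the same and no
    rescaling is needed at test time (when dropout is simply skipped).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)
print(dropout([1.0] * 10, 0.5, rng))  # roughly half 0.0, the rest 2.0
```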
There was a neat trick in Ng’s class where we visualized the hidden layer of our neural network to get some insight into how the classifier was doing its thing. Here’s an image from that exercise. Inline below is the same kind of image from my new network.
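The visualization works by treating each hidden node's incoming weights (one per pixel) as an image. Here is a minimal pure-Python sketch that reshapes one node's 784 weights into a 28×28 grid, min-max scales it to 0-255 grayscale, and writes it as a plain-text PGM file. It assumes the weight matrix has one row per input pixel and that images were flattened row by row; the helper names and the toy weights are hypothetical.

```python
def hidden_unit_image(W, j, side=28):
    """Reshape hidden unit j's incoming weights (column j of W, one row per pixel)
    into a side x side grid scaled to 0-255 grayscale."""
    column = [W[i][j] for i in range(side * side)]
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant column
    pixels = [round(255 * (w - lo) / span) for w in column]
    return [pixels[r * side:(r + 1) * side] for r in range(side)]

def to_pgm(grid):
    """Plain-text PGM (P2) image, which many image viewers can open."""
    height, width = len(grid), len(grid[0])
    lines = ["P2", f"{width} {height}", "255"]
    lines += [" ".join(str(p) for p in row) for row in grid]
    return "\n".join(lines) + "\n"

# Toy weights: 784 pixels x 2 hidden units, ramping across the image.
W = [[i / 783, -i / 783] for i in range(784)]
grid = hidden_unit_image(W, 0)
pgm_text = to_pgm(grid)  # e.g. write to "unit0.pgm"
```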
It’s qualitatively different, I think. So many of the features look like hands on a clock; maybe they’re identifying line segments in the digits? I don’t know what to make of this. My old image looks far more random; I wonder if it was overfit in a way this new one isn’t.
One thing I learned doing this: if I allow 100 hidden nodes instead of just 25, a lot of the hidden nodes look qualitatively the same in the visualization. If they’re mostly identical, does that mean they are redundant? Unnecessary?
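One rough way to put a number on "mostly identical" is the cosine similarity between hidden nodes' incoming weight vectors: pairs near 1.0 respond to nearly the same pixel pattern. This is only a heuristic sketch, with hypothetical helper names and toy weights; two nodes with similar incoming weights can still differ in their biases and outgoing weights, so high similarity alone doesn't prove a node can be removed.

```python
import math
from itertools import combinations

def unit_vectors(W):
    """Column j of W (one row per input) is hidden unit j's incoming weights."""
    return [[row[j] for row in W] for j in range(len(W[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def near_duplicates(W, threshold=0.95):
    """Pairs of hidden units whose weight vectors point in almost the same direction."""
    vecs = unit_vectors(W)
    pairs = []
    for i, j in combinations(range(len(vecs)), 2):
        sim = cosine(vecs[i], vecs[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs

# Toy weights: 4 inputs x 3 hidden units; units 0 and 1 are nearly parallel.
W = [[1.0, 2.0, 0.0], [0.0, 0.0, 1.0], [1.0, 2.1, 0.0], [0.0, 0.0, -1.0]]
print(near_duplicates(W))
```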
I also took a crack at visualizing the hidden nodes that contributed the most to identifying each digit. Here are the top 5 nodes for the digits 0 and 1.
Again, I’m not really sure what to make of this, particularly since the most important node for both digits is the same! I think I’m sorting by overall positive contribution, not absolute value. I’m not considering bias terms, though.
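The difference between sorting by signed and by absolute contribution is easy to see in a small sketch. Here a hidden node's contribution to one digit's output score is taken to be its activation times its outgoing weight to that digit, ignoring the output bias as described above. The helper name and the toy numbers are hypothetical, and this may not match exactly how the images above were ranked.

```python
def top_contributors(hidden, W2, digit, k=5, by_abs=False):
    """Rank hidden units by contribution h_j * W2[j][digit] to one output's score.

    The output bias is ignored. With by_abs=False, large negative contributions
    (nodes voting against the digit) sink to the bottom; with by_abs=True they rank high.
    """
    contribs = [(j, h * W2[j][digit]) for j, h in enumerate(hidden)]
    key = (lambda t: abs(t[1])) if by_abs else (lambda t: t[1])
    return sorted(contribs, key=key, reverse=True)[:k]

# Toy example: 3 hidden activations for one image, outgoing weights to 2 outputs.
hidden = [0.9, 0.1, 0.5]
W2 = [[1.0, 0.2], [3.0, -4.0], [-2.0, 0.5]]
print(top_contributors(hidden, W2, digit=0, k=2))               # node 0, then node 1
print(top_contributors(hidden, W2, digit=0, k=2, by_abs=True))  # node 2, then node 0
```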
Anyway, I feel like I know how to set up a basic neural network in TensorFlow now. Lots of stumbling around and cargo cult programming. But the ability to evaluate training accuracy is a strong external check on whether your code is working OK. What it doesn’t tell you is whether it’s working great.