# Machine learning: Neural networks introduction

Week four of my Coursera machine learning course was a breezy introduction to neural networks. The lecture videos were very high level but did a good job introducing the concept. The part I hadn’t understood before was how regression techniques are really best suited for linear prediction models: building Nth-order polynomial terms out of M features leads to on the order of M^N terms, and badness. I also hadn’t really understood that neural networks are just a series of logistic regressions. The input variables are mapped through a logistic model to an intermediate hidden layer (of some chosen number of features), then the hidden layer is mapped again through a second logistic model to yield output variables. However, the lecture stopped before we got to backpropagation, so for this week the method of training a neural network is still a mystery.
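A quick way to see the blow-up: the number of polynomial terms of degree up to N in M features grows combinatorially, not linearly. A small sketch (the function name is mine, just counting monomials):

```python
from math import comb

def poly_feature_count(m, n):
    """Number of monomials of degree <= n in m variables,
    including the constant term: C(m + n, n)."""
    return comb(m + n, n)

# With 400 pixel features, even quadratic terms swamp plain regression.
print(poly_feature_count(400, 1))  # 401
print(poly_feature_count(400, 2))  # 80601
```

Going to degree 2 alone multiplies the feature count by about 200, which is the motivation for finding learned intermediate features instead.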

## Logistic regression applied to OCR

The homework is a bit behind and out of sync with the lecture notes. The bulk of the work in the homework was still doing logistic regression, last week’s lecture concept. The hardest part was figuring out how to vectorize the naive loop implementation of the regularized logistic regression cost function we did last week. But I’d already vectorized it so I could just copy my solution from last week, gold star!
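For reference, the vectorized version of that cost function looks something like this in numpy (a sketch with my own variable names, not the course’s Octave code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_regularized(theta, X, y, lam):
    """Vectorized regularized logistic regression cost.
    X: (m, n) design matrix with a leading column of ones,
    y: (m,) labels in {0, 1}, lam: regularization strength."""
    m = len(y)
    h = sigmoid(X @ theta)
    # Cross-entropy over all examples in one shot, no loop.
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    # Regularize every weight except the constant term theta[0].
    J += lam * (theta[1:] @ theta[1:]) / (2 * m)
    return J
```

The whole loop over training examples collapses into one matrix-vector product, which is the point of the exercise.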

The more fun part was actually applying one of these learned models to do something useful with real data: OCR classification of handwritten numbers. The input was 5000 images, 20×20 greyscale pixel arrays, along with their classifications (“this squiggle is the number 7”). Our job was to build a multiclass classifier to do the OCR, to predict a digit. So we took the regularized logistic regression cost function we just implemented and used fmincg() to search for the best parameters to fit the data. The resulting output vector (theta) is our prediction model. Then we applied that learned model to classify input data. So I’ve now built a logistic regression OCR system for handwritten numbers! The final system predicted the input set with 95% accuracy. The final model is quite large: 4010 separate parameters, 401 weights for predicting each digit from 0–9, or one weight per pixel plus a constant term. Not exactly parsimony.
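The “multiclass” part is really just ten independent binary classifiers, one per digit, each trained on “is this digit c or not?”. The homework uses fmincg() for the search; a sketch of the same idea with plain gradient descent (numpy, my own naming and hyperparameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_one_vs_all(X, y, num_labels, lam=0.1, lr=1.0, iters=500):
    """Fit one regularized logistic regression per class.
    X: (m, n) with a leading bias column, y: (m,) integer labels.
    Returns Theta: (num_labels, n), one weight row per class."""
    m, n = X.shape
    Theta = np.zeros((num_labels, n))
    for c in range(num_labels):
        yc = (y == c).astype(float)  # one-vs-all: class c against the rest
        theta = np.zeros(n)
        for _ in range(iters):
            h = sigmoid(X @ theta)
            grad = X.T @ (h - yc) / m
            grad[1:] += lam * theta[1:] / m  # don't regularize the bias
            theta -= lr * grad
        Theta[c] = theta
    return Theta
```

For the digit data that yields the 10×401 parameter matrix described above: ten rows of 401 weights.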

One neat thing about multiclass models is they don’t just output a predicted class (“the number 7”); they also output a vector of probabilities for each possible value: “probability this image is the number 1, probability it is the number 2, …”. We crush those probabilities down to a single “this input is probably an image of the number 7”. But something to remember for later: machine learning models can return not only a prediction, but a confidence in that prediction. Or some ambiguity. I believe the math works out such that a single image might have a 90% probability of being the number 7 and an 80% probability of being the number 9 (for a particularly ambiguous squiggle), since each one-vs-all classifier is trained independently.
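That crushing-down step is just an argmax over the ten classifier outputs. A sketch with hypothetical weights (Theta here is random, purely for illustration of the shapes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned weights: one row of 401 pixel weights per digit.
rng = np.random.default_rng(0)
Theta = rng.normal(size=(10, 401))
x = rng.normal(size=401)           # one bias-prefixed image

probs = sigmoid(Theta @ x)         # ten independent probabilities
digit = int(np.argmax(probs))      # crush them down to one prediction

# Each classifier was trained separately, so these probabilities
# need not sum to 1 -- two digits can both score high.
print(digit, probs)
```

Keeping the full `probs` vector around is what lets a system report confidence or ambiguity instead of just a bare answer.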

## Neural network forward propagation

The last part of the homework was implementing a basic neural network. Or rather the application of one: the forward propagation that maps the input data through the layers and yields outputs. We were handed parameters that had already been trained, so really this was just an exercise in “can you code up forward propagation?” Still, it was useful to do that myself. In particular I had to puzzle out that the hidden layer consists of 25 nodes. So the final classifier is basically two steps: logistic regression to map 400 pixels to 25 hidden nodes, then a second logistic regression to map 25 hidden nodes to 10 probabilities. The central mystery of neural networks is what those “hidden nodes” really mean. And we have Deep Dream to thank for a lovely visualized expression of hidden states in a different kind of machine learning image processing system.
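Those two steps can be sketched in a few lines of numpy, using the shapes from the assignment (25 hidden nodes, 400 pixels, 10 digits; the function and variable names are mine, not the assignment’s Octave):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_prop(Theta1, Theta2, X):
    """Forward propagation through one hidden layer.
    Theta1: (25, 401) maps 400 pixels (+bias) to 25 hidden nodes.
    Theta2: (10, 26) maps 25 hidden nodes (+bias) to 10 digit scores.
    X: (m, 400) raw images, no bias column yet."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])   # add the bias unit
    a2 = sigmoid(a1 @ Theta1.T)            # first logistic regression
    a2 = np.hstack([np.ones((m, 1)), a2])  # bias unit for the hidden layer
    a3 = sigmoid(a2 @ Theta2.T)            # second logistic regression
    return np.argmax(a3, axis=1)           # predicted digit per image
```

Each layer really is just the logistic regression hypothesis applied again, with a bias unit glued on in between.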