Machine learning: Logistic regression

Just finished week 3 of Andrew Ng’s machine learning course on Coursera. I’m going to try to blog each week summarizing what I learned.

This week’s topic is logistic regression: predicting discrete outcomes like “success or failure” from numeric data inputs. E.g.: “our diagnostics measure these 4 numbers for a tumor. Is it cancerous or benign?” Turns out logistic regression is basically just linear regression where the output is restricted to the interval [0, 1]. Normal linear regression can output any real number, so you pass that through the sigmoid curve to bound the result to [0, 1]. That number is effectively “the probability the output is 1”; you can then threshold it at 0.5 to map it to a strictly binary “success or failure” output. There’s a bunch of math then to define the cost function and its partial derivatives, which you can then use with an optimization algorithm like gradient descent.
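The course does all of this in Octave, but the core idea fits in a few lines. Here’s a minimal sketch in Python/NumPy (the function names are mine, not the course’s):

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    """Linear combination X*theta passed through the sigmoid, then
    thresholded at 0.5 to get a binary success/failure prediction."""
    prob = sigmoid(X @ theta)      # effectively P(y = 1 | x; theta)
    return (prob >= 0.5).astype(int)

def cost(theta, X, y):
    """Logistic regression cost: -1/m * sum(y*log(h) + (1-y)*log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
```

The cost function is what you hand to the optimizer; its partial derivatives turn out to have the same form as linear regression’s, just with the sigmoid inside.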

Logistic regression only classifies data into two classes. If you want N classes you do the binary prediction N times, one-vs-all style: “is it A or not A? is it B or not B? is it C or not C?” and then pick whichever class has the highest probability.
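Continuing the NumPy sketch above, one-vs-all prediction is just “score every class with its own binary classifier, take the argmax”:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_predict(thetas, X):
    """thetas holds one learned parameter vector per class
    (shape n_classes x n_features). Score each example against every
    class's binary classifier and pick the most probable class."""
    probs = sigmoid(X @ thetas.T)   # P("is class k" vs "not class k"), per class
    return np.argmax(probs, axis=1)
```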

The predictor you get from the logistic regression is best understood in terms of a decision boundary, a drawing of the threshold that tips an input from success to failure. My final homework assignment was learning this threshold boundary to separate plusses from squares where the input data is two-dimensional. The green line is the learned threshold boundary, some sixth-order polynomial the magic optimizer found for me.
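The trick that lets a “linear” model draw a curved boundary is expanding the two raw inputs into every polynomial term up to degree six before fitting. The course provides an Octave helper for this; a rough Python equivalent (my own naming) looks like:

```python
import numpy as np

def map_feature(x1, x2, degree=6):
    """Expand two raw input columns into all polynomial terms up to
    `degree`: 1, x1, x2, x1^2, x1*x2, x2^2, ..., x2^6. The model stays
    linear in theta, but the boundary it draws in (x1, x2) space can curve."""
    cols = [np.ones_like(x1)]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            cols.append((x1 ** (i - j)) * (x2 ** j))
    return np.stack(cols, axis=1)
```

For degree 6 that’s 28 features per example, which is exactly why regularization (below) matters: that many knobs can overfit a small dataset.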

[Screenshot: the learned decision boundary, a green curve separating the plusses from the squares]

Thankfully the course moved on a bit from math into “use the black box” by introducing fminunc, a black-box minimizer provided by Matlab/Octave. All it needs is the cost function and its partial derivatives and it does who-knows-what to find the minimum. Frankly I wish the class spent more time on how to use fminunc well and less time deriving the gradient descent solutions that I’ll never use again. But gradient descent is a nice simple optimizer and it is good to know how at least one works.

This week also introduced the concept of overfitting: specializing your predictor too tightly to the training data. Regularization was the solution provided for overfitting, basically biasing the cost function to prefer small values of theta, the learning model parameters we’re optimizing. Ng doesn’t really explain why small values of theta are “better” other than presenting some intuition that a theta_i of 0 means that feature is fully ignored and a theta_i of 1000 means it’s way overemphasized. The regularization parameter lambda is picked out of a hat, much like the learning rate alpha is picked for gradient descent.
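Concretely, regularization just adds a penalty term lambda/(2m) * sum(theta_j^2) onto the cost from before. A sketch, again in NumPy rather than the course’s Octave:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_regularized(theta, X, y, lam):
    """Logistic cost plus an L2 penalty lam/(2m) * sum(theta_j^2).
    theta[0] (the bias/intercept term) is conventionally not penalized."""
    m = len(y)
    h = sigmoid(X @ theta)
    unreg = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    penalty = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
    return unreg + penalty
```

Larger lambda means the optimizer pays a bigger price for big thetas, so it trades some fit on the training data for a smoother boundary.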

Along the way I learned how to define anonymous functions in Octave, which gives you a really easy way to curry a partial function application (in this case, currying X and y into my costFunction so that fminunc only optimizes over the parameter t):

fminunc(@(t)(costFunction(t, X, y)), initial_theta, options)
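For comparison, the same pattern in Python would use scipy.optimize.minimize, which plays the same black-box-minimizer role as fminunc: hand it a cost function and a starting point and let it do who-knows-what. A rough sketch, with a made-up toy dataset for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    m = len(y)
    h = sigmoid(X @ theta)
    return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m

# Tiny (deliberately non-separable) toy dataset; first column is the intercept.
X = np.array([[1.0, -2.0], [1.0, -1.5], [1.0, -1.0],
              [1.0,  1.0], [1.0,  1.5], [1.0,  2.0]])
y = np.array([0, 1, 0, 1, 0, 1])

# Same currying trick as Octave's @(t) lambda: fix X and y so the
# optimizer only sees theta.
result = minimize(lambda t: cost(t, X, y), x0=np.zeros(2), method="BFGS")
theta = result.x
```

You can also supply the gradient via the `jac` argument, just as fminunc accepts the partial derivatives alongside the cost.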

I continue to be amazed that I type vector equations into Octave and they Just Work on first try, despite my being rusty on linear algebra not to mention matrix programming languages. I’m kind of just accepting that if the homework grader says I got it right I’m done.

Some metacomments about the course… I read something somewhere that characterized this as an advanced undergraduate class, maybe sophomore or junior level. That feels about right and explains why it seems a little too easy. But easy in the right ways for me; I really don’t want to do the math derivations. Also I discovered a lot of previous students’ homework checked in on GitHub. Cheating would be stupid; who would I cheat but myself? But it’s nice to be able to look at what other students did. It helped me verify arrayfun() was really the right way to apply a scalar function to a matrix, for instance.

Now I feel like I’ve studied enough machine learning to apply it to a problem I care about. I intend to look at League of Legends match results, to see whether I can predict a game is a win or a loss based on match performance. The simplest thing is to look at end-of-game stats like kills or gold earned. Of course by the end of the game I already know for sure if it was a win or a loss, but I’m thinking I can read the learned model parameters out to see how significant a contribution inputs like kills make to whether a team wins. Alternately I can get ahold of some mid-game stats like “gold earned after 20 minutes”; whether that results in a win or a loss is a bit more of a legitimate prediction problem.