Multivariate linear regression, gradient descent

I’m taking Andrew Ng’s online Machine Learning course on Coursera. First time doing a MOOC for real, and on the fence about the learning style, but it is nice to have an organized class with weekly assignments.

Two weeks have gone by. Together they sort of make up one learning unit: you learn how to do linear regression on datasets. What does that have to do with machine learning? Well, a linear regression model is a very simple form of predictive modelling for a dataset. “I fit this straight line to my 100 data points, then I can use that line to predict outputs for arbitrary other inputs.”
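To make that concrete, here's a tiny Octave sketch, nothing to do with the course assignments, just an illustration of fitting a line to some made-up points and then using it to predict a new value:

    % made-up data: 100 noisy points scattered around y = 2x + 1
    x = linspace(0, 10, 100)';
    y = 2*x + 1 + 0.5*randn(100, 1);

    % least-squares fit of a straight line (degree-1 polynomial)
    p = polyfit(x, y, 1);          % p(1) is the slope, p(2) the intercept

    % use the fitted line to predict the output for an arbitrary new input
    y_at_42 = polyval(p, 42)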

The course is a bit schizophrenic about whether it's math or computer programming. Ng’s lecture notes are entirely in terms of linear algebra, building up to result equations like

θ := θ − (α/m) · Xᵀ(Xθ − y)

(WTF? X is a matrix of your input feature set: m rows of n features each. y is an m-element vector of the expected outputs, one per training example. Theta is the vector of coefficients of your linear regression prediction model, one per column of X. Alpha is the “learning rate”, a number that’s picked essentially by intuition. The assignment := is shorthand for iteration; we keep iteratively improving the theta vector until it converges.)

I hate linear algebra. Always did, ever since I was 19 years old and it was my 8AM class. It was the only math class I nearly failed, then crammed super hard the last week and got an A. Then promptly forgot it all. Happily, this class is also a programming class, and the actual exercises are “implement this function in Octave / Matlab”. So I get to turn that confusing math into simple code:

[Screenshot: my vectorized gradient descent function in Octave.]
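In spirit it's something like this sketch, a reconstruction of the standard vectorized update rather than my graded code (the gradientDescentMulti name and signature are my guess at the assignment's skeleton):

    function [theta, J_history] = gradientDescentMulti(X, y, theta, alpha, num_iters)
      % Vectorized gradient descent for multivariate linear regression.
      % X is m x (n+1) with a leading column of ones, y is m x 1,
      % theta is (n+1) x 1, alpha is the learning rate.
      m = length(y);
      J_history = zeros(num_iters, 1);
      for iter = 1:num_iters
        % X*theta gives the predictions; (X*theta - y) gives the errors.
        % One simultaneous update of every coefficient:
        theta = theta - (alpha/m) * X' * (X*theta - y);
        % keep the cost around to check that it's decreasing
        J_history(iter) = (1/(2*m)) * sum((X*theta - y).^2);
      end
    end

The whole update is that one theta = ... line; everything else is bookkeeping.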

While I’m a good programmer, it’s been many years since I used a matrix programming language like Maple/Matlab/Octave/R, so getting to that function was hard-won. I ended up implementing it by following Ng’s lecture progression. He starts with a simple single-variable linear regression. I coded that using lots of loops, so all the actual arithmetic was scalar operations. Then I tediously hand-translated all those loops into vector form and generalized it to multivariable inputs. Good learning exercise, both to remind me how linear algebra works and to learn the funky vagaries of Octave/Matlab execution. (TIL: automatic broadcasting.) It was gratifying to see how much faster the code ran in vector form!
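Here's roughly what that loop-to-vector translation looked like for a single gradient step, again just a sketch on made-up data, with the broadcasting trick at the end:

    % a tiny made-up dataset: a column of ones for the intercept plus two features
    m = 5;
    X = [ones(m,1), (1:m)', (1:m)'.^2];
    y = [3; 7; 13; 21; 31];
    theta = zeros(3,1);
    alpha = 0.01;

    % one gradient step, the scalar-loop way
    grad = zeros(size(theta));
    for i = 1:m
      h = X(i,:) * theta;                       % prediction for example i
      for j = 1:length(theta)
        grad(j) = grad(j) + (h - y(i)) * X(i,j);
      end
    end
    theta_loops = theta - (alpha/m) * grad;

    % the same step, vectorized: both loops collapse into matrix operations
    theta_vec = theta - (alpha/m) * X' * (X*theta - y);

    % the broadcasting TIL: Octave stretches the 1x2 row vectors returned by
    % mean() and std() across all m rows, so normalizing the feature columns
    % is a one-liner (leave the intercept column alone; its std is zero)
    X(:,2:3) = (X(:,2:3) - mean(X(:,2:3))) ./ std(X(:,2:3));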

Of course the funny thing about doing gradient descent for linear regression is that there’s a closed-form analytic solution. No iterative hill climbing required: just evaluate the normal equation and you’re done. But it’s nice to teach the optimization approach first, because you can then apply gradient descent to all sorts of more complex functions that don’t have analytic solutions. If I end up getting to do genetic algorithms again I’m gonna be thrilled.
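For the record, the closed-form version really is a one-liner in Octave. Something like this, with the same sort of made-up data as above:

    % tiny made-up dataset: intercept column plus one feature, y = 2x + 1 exactly
    m = 5;
    X = [ones(m,1), (1:m)'];
    y = 2*(1:m)' + 1;

    % the normal equation: theta = (X'X)^-1 X'y; no learning rate, no iterating
    theta = pinv(X' * X) * X' * y     % should come out as [1; 2] for this data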

In the end I feel pretty proud of myself for completing week 2, doing all the optional extra work, and understanding it all. My long-term goal here is just to understand enough about machine learning algorithms that I can stop worrying about how they’re implemented and just bash about with someone else’s software libraries applied to my data. But it’s helpful to understand what’s going on under the hood.