TensorFlow introduction

I spent a couple of hours kicking the tires on TensorFlow, mostly working through the Getting Started docs. Which are excellent, btw. Here are some things I learned. These are all super basic, undergrad-level things; the real stuff requires more time to get to.

  • Installing TensorFlow is as easy as “pip install tensorflow”. It runs fine in Jupyter with no problems.
  • Don’t be dumb like me and try to get the GPU-accelerated version working first; that’s hard because NVIDIA’s packaging is such a mess. For that matter, ignore the warnings about CPU optimizations for your hardware. They might make it run 2-3x faster, but you have to compile TensorFlow yourself to get them.
  • “Tensor” is a fancy word for “multi-dimensional array”. It’s NumPy under the hood. TensorFlow is all about creating a flow of data through tensors.
  • “Tensor” is also a fancy word for “deferred computation”. The TensorFlow framework basically has you create a bunch of Futures/Promises and link them together into a model. You don’t run code imperatively; you declare function objects (of type Tensor) and then hand them to a session manager to run (a minimal sketch of this style follows the list). This enables two important kinds of magic:
  • Magic 1: the session runner handles running your model. You don’t really care how. Maybe it’s run on the CPU, maybe the GPU, maybe it’s handed off to a distributed compute cluster. All you know is you told the session to run and you got a result.
  • Magic 2: when training a model, the derivatives are calculated via automatic differentiation. Most machine learning techniques require that you calculate not only the error between the model’s output and the desired output, but also the first partial derivatives of that error. You can then use the derivatives for gradient descent optimization, etc. A big part of what makes machine learning algorithms mathematically difficult is analytically finding those derivative functions. You can numerically approximate the derivatives, but that doesn’t work very well. TensorFlow instead automatically generates derivatives by inspecting the model you created out of Tensor building blocks and functions; there’s a tiny tf.gradients example after this list. (See also Google Tangent, a different approach to automatic differentiation done by transforming Python source code. Wacky!)
  • You can write your own training loop by using the tf.train API to build a model and pass it to your optimizer of choice. Or you can get fancier and let the tf.estimator API run a whole machine learning project for you. Those two approaches are most of what the “getting started” tutorial has you do.
  • Trained models become first-class things you keep in TensorFlow; you can save them to disk, load them back, apply them to new input data, etc.
  • There’s a nifty tool called TensorBoard that can visualize a TensorFlow model and all the functions it’s built out of; a short example of exporting a graph for it is below. There are also visualizations of the training process.
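To make the deferred-computation style concrete, here’s a minimal sketch (my own toy example, TensorFlow 1.x API, not from the tutorial):

```python
import tensorflow as tf

# Building the graph computes nothing; these are just Tensor nodes.
a = tf.constant(3.0)
b = tf.constant(4.0)
total = a + b

# Only the session actually evaluates the graph, on whatever
# device it decides to use.
with tf.Session() as sess:
    print(sess.run(total))  # 7.0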
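And the automatic differentiation magic in miniature, another toy example of mine using tf.gradients:

```python
import tensorflow as tf

x = tf.Variable(3.0)
y = x * x  # declare y = x^2 as graph nodes; still no computation

# TensorFlow inspects the graph and builds dy/dx = 2x for us.
grad = tf.gradients(y, [x])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))  # [6.0]
```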
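Getting a graph into TensorBoard takes only a couple of lines; a sketch, with a log directory name that’s just my placeholder:

```python
import tensorflow as tf

x = tf.constant(1.0, name="x")
y = tf.constant(2.0, name="y")
z = tf.add(x, y, name="z")

# Dump the graph definition where TensorBoard can find it.
writer = tf.summary.FileWriter("/tmp/tf_intro", tf.get_default_graph())
writer.close()
```

Then run “tensorboard --logdir /tmp/tf_intro” and point a browser at it.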

At the end of the tutorial all I’d done was train a very simple linear regression model on some toy one-dimensional data. But I sort of understand how the parts fit together now, and I’m impressed with how well crafted those parts are. Google has put a whole lot of effort into packaging and presenting TensorFlow so folks like us can use it.
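Here’s roughly the shape of that final exercise, a sketch in the tf.train style reconstructed from memory rather than the tutorial’s exact code:

```python
import tensorflow as tf

# The model y = W*x + b, with trainable parameters W and b.
W = tf.Variable([0.3], dtype=tf.float32)
b = tf.Variable([-0.3], dtype=tf.float32)
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
linear_model = W * x + b

# Squared-error loss; the optimizer derives its own gradients.
loss = tf.reduce_sum(tf.square(linear_model - y))
train = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Toy one-dimensional data that fits y = -x + 1.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [0.0, -1.0, -2.0, -3.0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        sess.run(train, {x: x_train, y: y_train})
    print(sess.run([W, b]))  # converges toward W = -1, b = 1
```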

The next step in the tutorials is to train a simple handwriting recognizer. I did that from scratch in Ng’s course; it’ll be fun to revisit it with a high-level toolkit.