Snowfall hillshades

Garrett Dash Nelson has a lovely visualization of snowfall for 2017-2018; take a look! His code is very simple. A Python script to download data from NOAA, then some bits of shell scripts using GDAL to reproject, hill shade, and convert to an animated GIF.

I wanted to see what this treatment would look like over years so I did my own little thing based on his. Here’s the image processing script I used

set -eux
for y in 2014 2015 2016 2017 2018; do
  gdalwarp -t_srs "EPSG:5069" -srcnodata "-99999" sfav2*to_$y*.tif $d-warped.tif
  gdaldem hillshade -z 1500 $d-warped.tif $d-hillshade.tif
  gdal_translate -of JPEG $d-hillshade.tif $d-hillshade.jpg

And the images below. Starting at 2013-2014, and the final image is just a partial year of 2017-today. It’s not a great result TBH; the years end up looking pretty much the same and not that different from an altitude graph. In California you can’t see the low snowfall years of 2014-2015 and 2015-2016 vs the high snowfall 2016-2017, for instance.






What do sites used IndexedDB for?

Some notes on a survey of what all I find in IndexedDB storage in Firefox. That’s something like cookies, a way for a website to store data on your computer, only in this case it’s a fancy NoSQL database. I’m using it myself and have found Firefox’s Storage Inspector unreliable and limited, so I built my own tool to look through my profile directory and tell me what it finds.

Example report from the tool:            dm_typeahead               20160920      98304
  conversations title
  metadata -
  users name_lowercase, screen_name ONE_SIGNAL_SDK_DB                 1      49152
  Ids -
  NotificationOpened -
  Options -    test                              1      49152

That tells me

  • Twitter has a database called “dm_typeahead” with three tables; conversations, metadata, and users. Users has two columns, name_lowercase and screen_name. It has a version string of “20160920” and was 98304 bytes big when last vacuumed.
  • Smithsonian is using some SDK to create some tables about notifications, but they contain no columns at all.
  • The Guardian created a database named “test” with no tables at all.

So what’d I find?

  • A bunch of empty “test” databases,  presumably testing the browser can do IndexedDB at all. This may be for detecting if the browser is in private mode.
  • A bunch of sites use One Signal, which I guess manages those horrible HTML5 website notifications that spam pop ups. I’m religious about never allowing that, which is probably why I have no data.
  • Several sites using Augur, a web tracking system.
  • Things called LPSecureStorage and fibet; trackers?
  • seems to be storing PC emulator state
  • Amazon is caching a lot of data about books, maybe for Kindle Cloud?
  • Twitter has extensive app-specific usage
  • WordPress’ Calypso
  •, a pretty spammy site, has a database named J7bhwj9e with some user tracking stats.
  • has a bunch of databases named next:ads-v1 and next:image-v1 and the like
  • has something called workbox.

The tool I built is in this gist.


I hate 2018 favicons

I just added a favicon to a new web app I’m building. Check out this 1500 bytes of boilerplate I just added to every page:

<link rel="apple-touch-icon-precomposed" sizes="57x57" href="/images/favicon/apple-touch-icon-57x57.png" />
<link rel="apple-touch-icon-precomposed" sizes="114x114" href="/images/favicon/apple-touch-icon-114x114.png" />
<link rel="apple-touch-icon-precomposed" sizes="72x72" href="/images/favicon/apple-touch-icon-72x72.png" />
<link rel="apple-touch-icon-precomposed" sizes="144x144" href="/images/favicon/apple-touch-icon-144x144.png" />
<link rel="apple-touch-icon-precomposed" sizes="60x60" href="/images/favicon/apple-touch-icon-60x60.png" />
<link rel="apple-touch-icon-precomposed" sizes="120x120" href="/images/favicon/apple-touch-icon-120x120.png" />
<link rel="apple-touch-icon-precomposed" sizes="76x76" href="/images/favicon/apple-touch-icon-76x76.png" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" href="/images/favicon/apple-touch-icon-152x152.png" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-196x196.png" sizes="196x196" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-96x96.png" sizes="96x96" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-32x32.png" sizes="32x32" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-16x16.png" sizes="16x16" />
<link rel="icon" type="image/png" href="/images/favicon/favicon-128.png" sizes="128x128" />
<meta name="msapplication-TileColor" content="#FFFFFF" />
<meta name="msapplication-TileImage" content="/images/favicon/mstile-144x144.png" />
<meta name="msapplication-square70x70logo" content="/images/favicon/mstile-70x70.png" />
<meta name="msapplication-square150x150logo" content="/images/favicon/mstile-150x150.png" />
<meta name="msapplication-wide310x150logo" content="/images/favicon/mstile-310x150.png" />
<meta name="msapplication-square310x310logo" content="/images/favicon/mstile-310x310.png" />

How awesome is that! And I have no idea if it’s correct and no practical way to test it. I’m trusting Favic-o-Matic here. I tried reading docs about what to do online but every single website says something different. And who knows; maybe Apple will innovate with a new 79×79 size next week. (To be fair, Favic-o-Matic does offer the option to have fewer sizes; 16 / 32 / 144 / 152 is the minimal set.)

The original favicon standard wasn’t so bad. Nothing in the HTML at all, and a single /favicon.ico file in your root directory. That format was weird and semi-proprietary but it had the advantage it could hold multiple resolutions in a single file. Simple and done.

Then Apple screwed it up by starting to fetch random weird URLs on the website for its precious iOS icons. Then webmasters complained and so this linking standard started. Apple went overboard in supporting every single possible pixel-perfect resolution. Then Microsoft decided that was neat and added their own new incompatible formats for the stupid Start menu tiles no one uses anyway. And here we are.

Really want I want is to publish a single reasonable image, maybe 256×256, and just let the desktop clients auto-scale them. Yeah it won’t be pixel perfect but it’s not like I’m redrawing these icons at every size anyway. Either that or modernize the old favicon.ico idea so a single file has all the icons. A zip container would do nicely.

Porn mode vs IndexedDB

I’m fond of testing my webapps in porn mode (aka incognito mode, private browsing, etc.) It’s a very convenient way to test a webapp starting from a blank slate.

Only, IndexedDB doesn’t work in private mode in any browser but Chrome. This breaks Dexie too. In Firefox you get an error

InvalidStateError A mutation operation was attempted on a database that did not allow mutations.

That’s too bad. It does work in Chrome; it acts like they store the database but then wipe it when the private session ends.

TensorFlow MNIST sigmoid recognizer

My baby is starting to see! I built my first custom-designed neural network in TensorFlow and I’m happy. You can see my Python notebook here.

The fun thing about this is programming neural networks as a form of experimental science. There’s so many parameters to tweak, and the TensorFlow abstractions are so high level and complex. I’m not really sure my code is right. But I can just run an experiment, measure the accuracy, and if the result is good then maybe I did something right.


After doing my TensorFlow tutorials I decided to double back and re-implement my work from Ng’s Coursera course, ex4, which had us implementing backpropagation by hand and then creating a neural network that can recognize handwritten digits from MNIST. I liked this exercise back in Ng’s course because it felt like a real task and had a hidden surprise, the visualization of the feature layer. So time to try again!

The Deep MNIST for Experts tutorial from TensorFlow does this task for you, but with a pretty complex neural network. I decided to clone Ng’s network as closely as possible. To wit: a single hidden layer of 25 nodes using a sigmoid() activation function, yielding 95.3% accuracy.

Turns out that’s not entirely easy to replicate the initial experiment. Ng’s input data was 20×20 images and TensorFlow has 28×28 inputs. Instead of training 400 steps on the whole dataset I’m training 20,000 steps on tiny subsets of the data. I’m not regularizing my input like we were taught, I’m using dropout instead as a way to avoid overfitting. And I’m also not positive I’m using the same exact cost and training functions. So lots of differences. But at least it’s the same class of network.


The resulting trained accuracy is about 96% ±0.4%. It takes about a minute to run.

Now that I understand this there are so many things to try.

  • Hidden nodes: more improves accuracy; 50 hidden nodes is about 96% and 100 hidden nodes is about 97%.
  • Activation function: why stick with sigmoid, I can plug in anything! I already tinkered with this inadvertently; I’m not sure if the bias parameter belongs in the sigmoid() or outside, either seems to work
  • Training optimizer. AdamOptimizer seems to converge faster but to a lower accuracy of 94.6%. For that matter I haven’t tried tuning the learning rate parameter.
  • Dropout probability. The sample code I cribbed from had this at 0.5; you really can train a network with randomly knocking out half its nodes? Wow. A setting that high seems to be hurt accuracy; I’m getting my best results around 0.1. Or even 0.0; maybe this stuff isn’t needed.

Retina dissection

There was a neat trick in Ng’s class where we visualized the hidden layer of our neural network to get some insight into how the classifier was doing its thing. Here’s an image from that exercise. Inline below is the same kind of image from my new network.


It’s qualitatively different, I think. So many of the features look like hands on a clock; identifying line segments in numbers maybe? I don’t know what to think of this. My old image looks way more random, I wonder if it was overfit in a way this new one isn’t.

One thing I learned doing this; if I allow 100 hidden nodes instead of just 25, a lot of the hidden nodes look qualitatively the same in the visualization. If they’re mostly identical does that mean they are redundant? Unnecessary?

I also took a crack at visualizing the hidden nodes that contributed them most to identifying each image. Here’s the top 5 nodes for the numbers 0 and 1


Again, not really sure what to make of this. Particularly since the most important node for both numbers is the same! I think I’m sorting by overall positive contribution, not absolute value. I’m not considering bias terms though.

Anyway, I feel like I know how to set up a basic neural network in TensorFlow now. Lots of stumbling around and cargo cult programming. But the ability to evaluate training accuracy is a strong external check on whether your code is working OK. What it doesn’t tell you is if it’s working great.

TensorFlow optimized builds

tl;dr: install these TensorFlow binaries for a 2-3x speedup.

Update: or not; turns out the AVX binaries probably only are 10% faster. See below.

I’m now running TensorFlow programs slow enough that I care about optimization. There’s several options here for optimized binaries:

  1. Stock TensorFlow
  2. TensorFlow recompiled to use Intel CPU parallel instructions like SSE and AVX. See also the warning stock TensorFlow gives:
    tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
  3. TensorFlow with the GPU

I’m trying to get from 1 to 2; from what I’ve read it’s a 2-3x speedup. GPU is even better of course but is a lot more complicated to set up. And the Linux box I do my work on doesn’t even have a GPU (although my Windows desktop does).

I’m testing this all with a simple hidden sigmoid layer neural network and Adam’s Optimizer, training to recognize MNIST data.

I tried building TensorFlow from source and quit pretty quickly. It requires bazel to build, which in turn requires a Java runtime, and I noped out. Probably could get it working with a couple of hours’ time.

I tried Intel’s optimized TensorFlow binaries. These seem not to be build with AVX; I still get the warning. They are also slower, my little program took 210s to run instead of 120s. Reading their blog post it sounds like this is mostly Intel’s crack optimization team reordering code so it runs more efficiently on their CPUs. (Intel has an amazing group of people who do this.) Also the patches were submitted back to Google and are probably in stock TensorFlow. Not sure why it’s slower, and I’m bummed they didn’t build with AVX, but here we are.

lakshayg’s binaries. No idea who this guy is but sure, I’ll try a random binary from anyone! Bingo! My program goes from 120s to 46s, or a 2.6x speedup. Hooray! (But see below). One slight caveat; this is 1.4.0rc1, not the latest 1.4.1. There’s about two weeks worth of bug fixes missing.

TinyMind’s Tensorflow wheels are another source of precompiled Linux versions of Tensorflow. They’re built with AVX2 which unfortunately my processor doesn’t support.

Starting with 1.6 Google is going to release AVX binaries only. This breaks older CPUs, shame they can’t release several different binaries.

Update: I’ve noticed the performance isn’t stable.  With the AVX binaries my program runs sometimes in 46 seconds (yay!) and sometimes in 110 seconds (boo!). With Google’s stock build it’s sometimes 51 and sometimes 120. That suggests the AVX binaries aren’t a significant speedup for my program and I have a deeper mystery.

I spent several hours figuring this out. Turns out in the slow case, my program spends most of its time in mnist.next_batch(), I think when it runs out of data and has to reshuffle. I have no idea why it’s so variable or slow but it’s not an interesting failure given this is tutorial code. Does remind me I should learn more about how to manage test data correctly in TensorFlow.

If I stub out the batching so it’s not a factor my program runs in about 29s with the AVX binaries, 32s with stock binaries (no AVX). So maybe a 10% improvement. That’s not very exciting.

TensorFlow day 2

Some more tinkering with TensorFlow, in particular the MNIST for ML Beginners and Deep MNIST for Experts tutorials. MNIST is neat; it’s a standard normalized dataset of handwriting samples for the numbers 0-9. A classic for machine vision testing, with well known results and training accuracies of 88 – 99.5% depending on the approach. Consensus test data like this is so valuable in a research community. I worked with this dataset back in Ng’s Machine Learning class.

First up, MNIST for ML Beginners. It has you build a basic linear regression model to classify the numbers, then train it. Final accuracy is about 92%.

I followed this just fine, it’s pretty straightforward and not too different from the “getting started” tutorial. Just on real data (MNIST) and using some slightly more sophisticated functions like softmax and cross_entropy. Some notes:

  • TensorFlow has datasets built in, in the tensorflow.examples package.
  • The MNIST data set has a “.train” collection of training data and a (presumably disjoint) “.test” collection for final test data. The .train set also has a method .next_batch() which lets you randomly subsample rather than training on all data every single iteration.
  • The concept of “hot ones” representation. For labeling the digits 0-9 we have an array of 10 numbers (one per digit). Every number is 0 except for a single 1, which marks the label. There’s also the “tf.argmax()” function for quickly finding index of the column set to 1.
  • The softmax function which takes a vector of weights and normalizes it so it becomes a vector of probabilities that sum to 1. The weighting is exponential.
  • TensorFlow has an InteractiveSession which lets you mix declaring stuff with running session code conveniently. Good for noodling in a notebook.
  • “Loss functions”, basically a measure of the error between a prediction your model makes and the expected result data. These tutorials use the cross_entropy function, an information theory calculation that involves the probabilities of each outcome as well as just measuring the error.
  • tf.train.GradientDescentOptimizer() is a simple optimizer we apply here in a straightforward way. Note this is where TensorFlow’s automated differentiation comes into play, to do the gradient descent.

The second tutorial I did was Deep MNIST for Experts. This has you building a 4 layer neural network (aka “deep”) that maps 5×5 patches of the image to 32, then 64 features, then convolves it all to a single flat 1024 features before classifying it. Final accuracy is about 99.2%.

I had a harder time following this, it assumes a lot more machine learning knowledge than the previous tutorials. If you don’t know things like what a rectified linear neural network, what dropout does, or what the Adam Optimizer is you’re gonna be a bit lost. It me; I’m kind of blindly copying stuff in as I go.

  • The full source has this weird thing about name_scope in the code. I think this is an extra level of testing / type checking but I’m not positive. I left it out and my code seems to have worked.
  • This code gets a bit complicated because you’re working with rank 4 tensors, ie: one giant 4 dimensional array. The first dimension is test image #, the second and third are pixels (in a 28×28 square) and the fourth is a single column for color value. It’s a standard setup for 2d image processing, I imagine.
  • The network structure is neat. Intuitively you boil down 28×28 grey pixel values into 14×14 32 dimensional values. Then you boil that down again to 7×7 64 dimensional values, and finally to a single 1024 feature array. I’m fascinated to know more about these intermediate representations. What are those 1024 features? I expect one is “looks like a vertical line” and one is “looks like a circle at the top” and the like, but who knows. (I bet someone does.)
  • The pooling from 28×28 → 14×14 → 7×7 is odd to me. It uses max_pool, which I think means it just takes the maximum value from a 2×2 window. Surprised that blunt an instrument doesn’t throw things off. For that matter what does a derivative of this function mean?
  • Dropout sounds crazy; you randomly just drop nodes from the neural network during the training. This keeps the network honest, avoids overfitting. It feels a bit like randomly harassing someone while they’re studying to keep them on their toes. The paper they linked says Dropout is an alternative to regularization. I note this code doesn’t ever regularize its input, so I guess it works?
  • They also introduce the idea of initial weights in a neural network. I remember this from Ng’s course; you want them to not all be 0, because then nothing can break the symmetry. Also they give everything a positive bias term to avoid “dead neurons”. Not sure what that means.
  • The pluggable nature of Tensor modules is apparent here. Particularly the swap to the “Adam Optimizer” over a simple gradient descent. I have no idea what this algorithm does but using it is literally one line of code change. And presumably it’s better, or so the linked paper claims.
  • It’s slow! 20,000 training iterations on a i7-2600K is taking ~20 minutes. Now I wish I had the custom compiled AVX version, or a GPU hooked up :-) It is running as many threads as it should at least (7 or 8).
  • They have you running 20,000 training iterations but the accuracy measured against the training set converges to 0.99 by around 4000 iterations. I wonder how much the network is really changing at that point. There’s a lot of random jitter in the system with the dropouts and sampling, so there’s room. The accuracy against the test set keeps improving up to about 14,000 steps.

One thing these tutorials are missing is more visual feedback as you go along. That, and some easy way to actually use the model you’ve spent an hour building and training.

I’d like to go back and implement the actual neural network I built for MNIST for Ng’s class. IIRC it’s just 1 hidden layer. the 20×20 pixels are treated as a linear array of 400 numbers, then squashed via sigmoid functions to a hidden layer of 25 features, then squashed again to a hot ones layer of 10 numbers. It would be a good exercise to redo this in TensorFlow. The course notes describe the network in detail and suggest you expect about a 95.3% accuracy after training.