Optimizing zipdecode

Are there any good tools for profiling rich D3 web apps, to measure how much time a change to the JavaScript, CSS, or SVG design actually saves? The best I’ve come up with so far is inserting calls to Date.now() in my code and using console.log() to print the elapsed times. It works, but it’s primitive.

Chrome’s Profile tool doesn’t help me much. Well, the JavaScript profiler does, actually: it gives me a detailed analysis of why something is taking so long. But that’s a forensics tool; what I need is a benchmark wrapper around what I’m doing, something like Python’s timeit. And I don’t want to measure just JavaScript code; I’m more worried about CSS selection time, DOM manipulation time, and browser render time.
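Something like Python’s timeit is easy to approximate by hand. Here is a minimal sketch (timeit is a hypothetical helper name, not a browser API) that runs a function several times and reports the min, median, and max in milliseconds. It uses Date.now() to match the approach above; performance.now() is the higher-resolution alternative where available.

```javascript
// timeit: a minimal benchmark wrapper in the spirit of Python's timeit.
// Hypothetical helper, not a browser API. Calls fn() `runs` times and
// records wall-clock milliseconds measured with Date.now().
// Only synchronous work is captured: JavaScript, plus any style/layout the
// browser is forced to do before fn() returns. Painting happens later and
// is not included.
function timeit(fn, runs = 11) {
  const times = [];
  for (let i = 0; i < runs; i++) {
    const start = Date.now();
    fn();
    times.push(Date.now() - start);
  }
  // Sort numerically so min, median and max can be read off directly.
  times.sort((a, b) => a - b);
  return {
    runs,
    min: times[0],
    median: times[Math.floor(runs / 2)], // upper median if runs is even
    max: times[runs - 1],
  };
}
```

Usage would look like `timeit(() => updateZip("94"), 5)`, where updateZip is a hypothetical stand-in for the code that restyles the dots. To pull style recalculation and layout into the timed region, one common trick is to read a layout property such as `document.body.offsetHeight` at the end of fn, which forces the browser to flush pending style and layout synchronously; paint time still won’t show up.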

Anyway, my Zipdecode thing is too slow. I’d like to optimize it, in particular so that the interaction when the user types a zip code is faster (and, ideally, animatable). My original code is intentionally naive: select all 40,000 dots and set their “fill” CSS style to a specific color. It’s bad:

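        // Naive update: restyle all 40,000 dots on every keystroke.
        // l is the length of the typed prefix selectedZip.
        .style("fill", function(d, i) {
            if (l > 0 && d.zip.substr(0, l) == selectedZip) {
                return selectedColor;
            } else {
                return unselectedColor;
            }
        });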

This takes 140ms to do the update (and 650ms for the initial render).

Short-circuiting the d.zip.substr() test reveals that very little time is spent there. Replacing the call to .style() with a call to .each() that doesn’t touch the style drops the time down to 6ms; nearly all my time is spent modifying the style, not creating the selection and iterating through it.

So the problem is mutating the style with .style(); no big surprise, but now I can measure it! Let’s see how I can change the code to do better.

The obvious thing to try is setting the CSS class instead of a style property. The drawback with classes is that there’s no easy way to interpolate the color; more on that in another blog post. But it works great, dropping the update time from 140ms to 25-30ms, and it feels much faster. I tried this two ways:

        .attr("class", function(d) { return zipSelected(d.zip) ? "selected" : "unselected" })

        .classed("selected", function(d) { return zipSelected(d.zip); })
        .classed("unselected", function(d) { return ! zipSelected(d.zip); })

The first way, using .attr(), takes 25-30ms; the second way, calling .classed() twice, takes about 40ms, probably all overhead from that extra pass over the selection. I don’t think there’s anything particularly wrong with using .attr() to set classes (as long as the dots carry no other classes, since .attr() replaces the whole class attribute), and since it’s simpler and faster I’ll stick with it.
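The snippets above call zipSelected(d.zip) without showing it. A minimal sketch of what it could look like, assuming it applies the same prefix test as the naive .style() version (makeZipSelected and zipClass are hypothetical names):

```javascript
// Hypothetical helpers: the post calls zipSelected(d.zip) but doesn't show it.
// Assumes the same prefix test as the naive version: nothing is selected
// until the user has typed at least one digit.
function makeZipSelected(selectedZip) {
  return function zipSelected(zip) {
    return selectedZip.length > 0 && zip.startsWith(selectedZip);
  };
}

// The value handed to .attr("class", ...): a single class string per dot,
// so each element gets exactly one attribute write per update.
function zipClass(zipSelected, zip) {
  return zipSelected(zip) ? "selected" : "unselected";
}
```

With .attr() the predicate runs once per dot; the two .classed() calls walk the selection twice and evaluate it twice per dot, which fits the timing difference measured above.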

A completely different optimization is to be smarter about the selection instead of selecting all 40,000 dots every time. The code is inefficient in one important way: if I change the selection from “94” to “941”, only 30 dots really need to be updated, but I go ahead and modify all 40,000. But I don’t much care about fixing that, because the case I’m most worried about is going from “” to “9”, where a lot of dots need to be touched (about 1/10th of all zip codes, roughly 4,000 dots).
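A sketch of the “only touch what changed” idea, assuming the dot data is available as a plain array of zip strings (isSelected and changedIndices are hypothetical names). It still scans every record, but scanning is the cheap part; per the measurements above, the cost is in the style or class writes, so the goal is to write only to the dots returned here.

```javascript
// Same prefix test as the naive version: an empty prefix selects nothing.
function isSelected(zip, prefix) {
  return prefix.length > 0 && zip.startsWith(prefix);
}

// Indices of the dots whose selected/unselected state flips when the typed
// prefix changes from oldPrefix to newPrefix. Only these need a DOM write.
function changedIndices(zips, oldPrefix, newPrefix) {
  const changed = [];
  for (let i = 0; i < zips.length; i++) {
    if (isSelected(zips[i], oldPrefix) !== isSelected(zips[i], newPrefix)) {
      changed.push(i);
    }
  }
  return changed;
}
```

In D3 the same test could be applied with selection.filter(), e.g. `dots.filter(d => isSelected(d.zip, oldPrefix) !== isSelected(d.zip, newPrefix))`, before setting the class, so only the flipped dots are written to.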

But apparently I’m wrong. A version of this pull request from Ziggy Jonsson drops my interaction time from 140ms to 20ms, and even less if the user is selecting a very narrow range of zips like “9414”. The key improvement is that he only modifies the dots that are affected. (He also avoids some calls to d3.select(), but I think that’s a smaller effect.) It works well, but for my tutorial purposes I don’t like how it maintains a giant cached list of all possible zip selections.
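To make the caching trade-off concrete, here is a sketch of a prefix cache in the spirit of that pull request. It is not its actual code; buildPrefixIndex and changedFromIndex are hypothetical names. Every prefix of every zip maps to the indices of the dots it selects, so an update never scans all 40,000 records.

```javascript
// Map every prefix (length 1 up to the full zip) to the indices of the dots
// whose zip starts with it. Built once, up front.
function buildPrefixIndex(zips) {
  const index = new Map();
  zips.forEach((zip, i) => {
    for (let len = 1; len <= zip.length; len++) {
      const prefix = zip.slice(0, len);
      if (!index.has(prefix)) index.set(prefix, []);
      index.get(prefix).push(i);
    }
  });
  return index;
}

// Dots whose state flips = symmetric difference of old and new matches.
// An empty prefix selects nothing, matching the naive version.
function changedFromIndex(index, oldPrefix, newPrefix) {
  const lookup = (p) => (p.length > 0 ? index.get(p) || [] : []);
  const before = new Set(lookup(oldPrefix));
  const after = new Set(lookup(newPrefix));
  const changed = [];
  for (const i of before) if (!after.has(i)) changed.push(i);
  for (const i of after) if (!before.has(i)) changed.push(i);
  return changed.sort((a, b) => a - b);
}
```

The cost is memory: each five-digit zip is stored under five prefixes, so 40,000 dots mean about 200,000 index entries. That is the “giant cached list” trade-off in miniature.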

Another way to achieve the same effect would be to use the hierarchy of the DOM to model the hierarchy of zip codes. That is, instead of all the dots being top-level <rect> elements, have a <g id="9"> that contains all the 9xxxx zip codes (and in turn a <g id="94"> inside that, etc.). This would let me select precisely the dots that need updating, and might make the slower .style() method acceptable. But it significantly complicates the code, and I don’t feel like doing that now that I’ve got things running pretty well. If I do, it might be worth skipping the full hierarchy and just breaking the zip code dots into the 10 top-level categories.
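The simpler “10 top-level categories” variant mostly comes down to preparing the data. A sketch, assuming each record has a string zip field as in the post’s d.zip (groupByFirstDigit is a hypothetical name); the D3 rendering step is only outlined in comments.

```javascript
// Bucket the dot records by the first digit of their zip code. Each bucket
// could then be bound to its own <g>, so an update for prefix "9" only needs
// to look at the dots inside one group.
function groupByFirstDigit(records) {
  const groups = Array.from({ length: 10 }, (_, digit) => ({
    digit: String(digit),
    dots: [],
  }));
  for (const r of records) {
    const digit = Number(r.zip[0]);
    // Skip malformed zips (NaN fails both comparisons).
    if (digit >= 0 && digit <= 9) groups[digit].dots.push(r);
  }
  return groups;
}

// With D3 this would feed a nested data join, roughly:
//   svg.selectAll("g").data(groups).enter().append("g")
//       .attr("id", g => "zip-" + g.digit)   // hypothetical id scheme
//     .selectAll("rect").data(g => g.dots).enter().append("rect");
// A "zip-" prefix is used because a CSS selector like "#9" is invalid:
// CSS identifiers can't start with a digit.
```

The id caveat applies to the hierarchy described above too: an id like "9" is allowed in the markup, but selecting it with `d3.select("#9")` fails, so a prefixed id (or an attribute selector) is easier to work with.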