Centroid street addresses considered harmful

Had a precise example of a mapping error in Sydney, Australia, for my Airbnb located at 3/239 Victoria St in Darlinghurst, NSW.


Great location in the middle of a fun central neighborhood. The front door is on the east side of the building, on Victoria St itself. But that map pin there is in the middle of the building, the centroid of the rectangle. And that means a lot of map software guesses the address is just a bit closer to the west side of the building, to that little Hayden Ln. Which is a back alley you can’t really drive in and isn’t accessible from the apartment.

This caused real problems with Uber. Their routing software would sometimes try to send the driver up to the back alley. Drivers are smart enough not to do that, but following the purple line put them on Liverpool St where they can’t turn left onto Victoria St to drive to the actual door. This detail mattered because one of us was injured and couldn’t walk the half block down to Liverpool St. Also the routing was unstable; sometimes it directed drivers to the east side on Victoria St and then would flip mid-drive to the west side on Hayden Ln. That changes the whole route by half a kilometer because Victoria St is one way.

We have the official address point data from the Australian government on OpenAddresses. They geocode the address as 151.2212399,-33.8779683 which is just a hair further southwest than the Google Maps pin. That’s the official correct location for the address but it does not describe the ground truth. Bing Maps has more or less the same wrong point as Google, so does Apple. OpenStreetMap doesn’t have the street number at all and guesses the wrong segment of the road.

The underlying problem here is the database has the polygon for the building but not the exact point of the front door. So it guesses a point by filling in the centroid of the polygon. Which is kinda close but not close enough. A better heuristic may be “center of the polyline that faces the matching street”. That’s also going to be wrong sometimes, but less often.
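Here’s a minimal sketch of that heuristic in Python. It assumes planar coordinates (not raw lon/lat), a building outline given as a list of corners, and the matching street given as a polyline. The function names and the toy rectangle are made up for illustration; this is not how any particular geocoder does it.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b that is closest to p."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return (ax, ay)
    # Parameter of the projection along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def distance_to_polyline(p, line):
    """Shortest distance from p to any segment of the polyline."""
    return min(
        math.dist(p, closest_point_on_segment(p, a, b))
        for a, b in zip(line, line[1:])
    )

def street_facing_point(building, street):
    """Midpoint of the building edge that sits closest to the street."""
    edges = zip(building, building[1:] + building[:1])
    midpoints = [((ax + bx) / 2, (ay + by) / 2) for (ax, ay), (bx, by) in edges]
    return min(midpoints, key=lambda m: distance_to_polyline(m, street))

# Toy layout: a 10 x 20 building with the matching street running along its east side.
building = [(0, 0), (10, 0), (10, 20), (0, 20)]
matching_street = [(15, -50), (15, 50)]

centroid = (5.0, 10.0)  # what the map pin uses today
door_guess = street_facing_point(building, matching_street)
```

In the toy layout the guess lands on the middle of the east wall instead of the middle of the building. It will still be wrong for corner buildings or doors that aren’t centered on the facade.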

One extra wrinkle: our proper address was “3/239 Victoria St Darlinghurst”. It’s not enough to say Sydney, you have to name the actual suburb Darlinghurst, because there are multiple Victoria Streets. And the 3/ is an essential part of the address, naming the apartment unit number. Various official forms won’t just accept “239 Victoria St”.

PS: Gelato Messina is excellent.

Update: a friend at Mapbox tells me they do something similar to my heuristic. It’s implemented in this library and is in the process of being added to their geocoder. He describes the algorithm as “drawing a bisecting line from the address point to the closest point on the street line”. I imagine the details are more complicated.
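The “closest point on the street line” part of that description can be sketched in a few lines of Python. This is only my reading of the quote, not Mapbox’s code: the street names and coordinates are a toy planar layout, and `snap_to_street` is a hypothetical helper. The point it makes is that the street has to be the one named in the address; snapping to whichever street is nearest reproduces the back alley problem.

```python
import math

def project_onto_polyline(p, line):
    """Return (distance, point) for the closest point on the polyline to p."""
    best = None
    px, py = p
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        dx, dy = bx - ax, by - ay
        length_sq = dx * dx + dy * dy
        t = 0.0 if length_sq == 0 else ((px - ax) * dx + (py - ay) * dy) / length_sq
        t = max(0.0, min(1.0, t))  # stay within the segment
        q = (ax + t * dx, ay + t * dy)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, q)
    return best

def snap_to_street(pin, streets, name=None):
    """Snap pin to the nearest street, or to the named street if given."""
    candidates = streets if name is None else {name: streets[name]}
    results = [(n, *project_onto_polyline(pin, line)) for n, line in candidates.items()]
    return min(results, key=lambda r: r[1])  # (name, distance, point)

# Toy layout: the centroid pin is slightly closer to the alley than to the real street.
streets = {
    "Hayden Ln": [(-3, -50), (-3, 50)],
    "Victoria St": [(15, -50), (15, 50)],
}
pin = (5, 10)

nearest = snap_to_street(pin, streets)                  # picks the alley
matched = snap_to_street(pin, streets, "Victoria St")   # picks the addressed street
```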

Hawaiian Homelands revisited

At Kiwi Foo I met someone who was interested in the Hawaiian Homelands, so I revised the map I made and got it working again. Refreshed the data too, the state published a new file from the 2015 census that includes population information. The new slippy map is here. I also made a rough GeoJSON view using geojson.io; just the converted state data, no ahupuaʻa or interpretive mapping.

From a technical point of view the main thing I had to do was convert the map from using Mapzen (RIP) to vanilla Leaflet. Not too hard since Mapzen’s Javascript was based on Leaflet, but I’m lacking a good replacement for their geocoder search. I replaced their map tiles with Carto’s Positron, which I found on this good list of free map tiles. I probably should have just done the whole thing again in Mapbox GL JS but that was more work than just porting from Mapzen to Leaflet. Also Mapbox isn’t a free service.

I didn’t do more on this project before because I don’t understand the history and importance of the Hawaiian Homelands enough to really do it right. But the guy I met has a lot more knowledge and knows some folks, we may work together to do more. A good static map for Wikipedia seems valuable, something simple like the state’s preview. I’d love to get more specific data about every individual parcel, there’s only 75 and they must have interesting histories. Maybe turn that research into a magazine article or something.


Windows newlines vs Unix bash

I was having the weirdest problem debugging a shell script. “bash -x” was showing stuff like this:

++ mktemp
+ T=$'/tmp/tmp.FJZp7VfwsA\r'
+ ogr2ogr -f geojson $'/tmp/tmp.FJZp7VfwsA\r' $'../hhl15/hhl15.shp\r'

I was using mktemp to create a file. Why was bash showing it as $'filename\r'?

Turns out, derp, SublimeText on my Windows box created the file. With Windows newlines by default. Which bash will treat as important whitespace and not strip. So if you ever see a $'\r' in your bash -x output, that’s your damage.
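A quick way to check a script for this, sketched in Python. The script text below is a hypothetical reconstruction of mine from the bash -x trace above, and the helper names are made up. Tools like dos2unix do the same job.

```python
def has_windows_newlines(data: bytes) -> bool:
    """True if the file content contains any CRLF line endings."""
    return b"\r\n" in data

def to_unix_newlines(data: bytes) -> bytes:
    """Convert CRLF line endings to plain LF."""
    return data.replace(b"\r\n", b"\n")

# Hypothetical script contents as saved by a Windows editor.
script = b"T=$(mktemp)\r\nogr2ogr -f geojson $T ../hhl15/hhl15.shp\r\n"

fixed = to_unix_newlines(script)
print(has_windows_newlines(script), has_windows_newlines(fixed))
```

To use it on a real file, read and write in binary mode (`open(path, "rb")`) so Python doesn’t translate the newlines for you.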


Notes on Javascript cryptocurrency mining

The new hotness is web sites using visitors’ CPU to mine cryptocurrency. Typically without consent, often via malware, although sometimes as a disclosed alternative to ads. Fuck everything about that.

Still I was curious how this works. Mining most cryptocurrencies requires speciality hardware to be worth the bother. GPUs, at least, and in the case of Bitcoin, ASICs. So I’ve been wondering how a Javascript miner would work, surely it’s 100-1000x slower than a native GPU program? Are they using WebGL?

The most popular solution for Javascript mining now is CoinHive. And they mine the Monero currency. Why? Explicitly for performance reasons. The Monero hash (Cryptonight) is something designed for CPU computing and doesn’t really run better on GPUs. So it’s a reasonable thing to do in Javascript.

BTW I found this CoinHive demo useful for playing around with a Javascript miner in my own browser. Yup, it takes 100% of all 8 of my CPU threads very easily.

(Related: the NoCoin extension is a way to protect your browser from this crap. Ad blockers will typically block them too (uBlock Origin does), but I want to see the icon specifically about whether some website is running a miner.)

Python 3 benchmarks

Discussion on Reddit led me to this official Python benchmark site. If you fiddle the settings you can get a comparison of how various Python 3 distributions compare to Python 2 on a bunch of benchmarks.

[Screenshot: Python Speed Center comparison, 2018-02-12]

Bars below 1.0 mean Python 3 is faster than Python 2. The different colors are different point releases of Python 3. The broad picture here is that Python 3 is generally about as fast as Python 2 or a little better. The big slowdown in the middle is Python startup; that’s 2-3x slower now. No other obvious pattern to me.

I’d had in my head that Python 3 was generally about 20% slower. Partly because it does Unicode properly now, partly because some of the switch from collections to iterables in the core tuple and list types added slowness. But that opinion is not borne out by this data.

PS: this screenshot brought to you by Firefox’s awesome screenshot tool. Not sure if it’s new to everyone or just me, but it makes saving an image of a DOM chunk of a page very easy.

Why does CJK software have such ugly English text?

There’s a distinct style of typesetting in Japanese software, particularly videogames, where the English text looks terrible. Like they use the same two fonts (one serif, one sans) from 1982 and they’re typeset wrong. Even in new software, like the brand new Monster Hunter World game. Chinese and Korean software often has the same problem. Why does CJK software do such a bad job with English text?


I found some sources online and they describe several kinds of problems:

  1. Font availability. Your Japanese (or Chinese, or Korean) computer won’t have many fonts that support both your language and Roman characters. So you use the ones that are there. They look fine in your language so you don’t care much if they look awful in Roman. MS Mincho or SimSun for example. It’s a bit like how so much stuff is done in Arial or Microsoft’s Times New Roman. They aren’t great, but they are present.
  2. Typesetting ascenders and descenders. The way Roman characters have a middle height and then go above that (say the letter d) or below that (p) is a distinctive aspect of Roman font design. CJK characters don’t do that, they have a totally different shape. Descenders in particular often get squeezed in Japanese fonts for Roman characters.
  3. Mismatched aesthetics. Roman fonts have Serif and Sans-Serif fonts. Japanese has Mincho and Gothic. But while Mincho fonts often make the Roman characters have serifs, there’s no real commonality in design there at all.
  4. Halfwidth Roman characters. Old computers used fixed width character displays. Typography pretty much always looks awful this way. But on top of that, in a CJK writing system most characters use a full-width cell, which is too wide for Roman letters, so you squeeze two half-width characters into it instead.
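The halfwidth/fullwidth split from item 4 is still baked into Unicode, and Python’s standard `unicodedata` module exposes it. A small sketch; the `display_cells` helper is mine and deliberately simplistic (it ignores combining characters and ambiguous-width characters):

```python
import unicodedata

def display_cells(text):
    """Count fixed-width cells: wide and fullwidth characters take two, others one."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )

halfwidth = "Tokyo"
# Fullwidth Roman letters live at a fixed offset from ASCII in Unicode.
fullwidth = "".join(chr(ord(ch) + 0xFEE0) for ch in halfwidth)
kanji = "東京"

for sample in (halfwidth, fullwidth, kanji):
    print(sample, display_cells(sample))

# NFKC normalization folds the fullwidth letters back to plain ASCII.
folded = unicodedata.normalize("NFKC", fullwidth)
```

The fullwidth version of “Tokyo” takes twice the cells of the ASCII one, which is why fixed-cell displays squeeze Roman letters into half cells instead.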

None of these issues prevent a Japanese or Chinese or Korean company from producing excellent English typesetting. But if you’re used to seeing badly typeset Roman characters all the time in your daily computer work, it won’t stand out to you so badly when someone is finally localizing your product to America or Europe and they start translating the menus in the fastest, cheapest way. At least that’s my theory.

Dexie is good software

I’m really glad I chose to use Dexie as my interface for IndexedDB storage in browsers.

Mostly it’s just really professionally packaged. The docs are great. The basic API is very simple to use. Where things get complicated they make the right choices. Like bulkAdd() will abort if you’re inside a transaction and try to add on top of some duplicate keys unless you explicitly override that. But outside of a transaction it’ll just do its best to add data that doesn’t conflict and log a warning.

It also has nice support for schema migration. I haven’t stressed this too hard, but adding new columns works nicely and transparently for users. It has simple support for writing custom migration functions, too.

Dexie supports some fairly complex interactions with the database. All I’ve had to do is simple things and I appreciate that simple things are simple. But it looks good for doing complicated things, too.