Hawaiian Homelands revisited

At Kiwi Foo I met someone who was interested in the Hawaiian Homelands, so I revised the map I made and got it working again. I refreshed the data too; the state published a new file from the 2015 census that includes population information. The new slippy map is here. I also made a rough GeoJSON view using geojson.io; it’s just the converted state data, no ahupuaʻa or interpretive mapping.

From a technical point of view the main thing I had to do was convert the map from Mapzen (RIP) to vanilla Leaflet. Not too hard, since Mapzen’s Javascript was based on Leaflet, but I’m lacking a good replacement for their geocoder search. I replaced their map tiles with Carto’s Positron. Here’s a good list of free map tiles; that’s where I found it. I probably should have just redone the whole thing in Mapbox GL JS, but that was more work than porting from Mapzen to Leaflet. Also, Mapbox isn’t a free service.
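For reference, the Leaflet-plus-Positron setup is only a few lines. A minimal sketch, assuming a <div id="map"> on the page; the tile URL and attribution are Carto’s published ones, and the map center is just a rough eyeball of the islands:

// assumes Leaflet's JS and CSS are already loaded
var map = L.map('map').setView([20.6, -157.3], 7);  // roughly centered on Hawaiʻi
L.tileLayer('https://{s}.basemaps.cartocdn.com/light_all/{z}/{x}/{y}{r}.png', {
  attribution: '&copy; OpenStreetMap contributors &copy; CARTO',
  maxZoom: 19
}).addTo(map);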

I didn’t do more on this project before because I don’t understand the history and importance of the Hawaiian Homelands well enough to really do it right. But the guy I met has a lot more knowledge and knows some folks; we may work together to do more. A good static map for Wikipedia seems valuable, something simple like the state’s preview. I’d love to get more specific data about every individual parcel; there are only 75, and they must have interesting histories. Maybe turn that research into a magazine article or something.


Windows newlines vs Unix bash

I was having the weirdest problem debugging a shell script. “bash -x” was showing stuff like this:

++ mktemp
+ T=$'/tmp/tmp.FJZp7VfwsA\r'
+ ogr2ogr -f geojson $'/tmp/tmp.FJZp7VfwsA\r' $'../hhl15/hhl15.shp\r'

I was using mktemp to create a file. Why was bash showing it as $'filename\r'?

Turns out, derp, SublimeText on my Windows box created the file. With Windows newlines by default. Bash treats that trailing \r as part of the text, not whitespace to strip. So if you ever see a $'\r' in your bash -x output, that's your damage.
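If you ever need to check for this, the shell makes it easy; myscript.sh here is a stand-in for whatever script is misbehaving:

grep -c $'\r' myscript.sh      # count the lines with a carriage return
sed -i 's/\r$//' myscript.sh   # strip them in place (dos2unix also works)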


Notes on Javascript cryptocurrency mining

The new hotness is web sites using visitors’ CPU to mine cryptocurrency. Typically without consent, often via malware, although sometimes as a disclosed alternative to ads. Fuck everything about that.

Still, I was curious how this works. Mining most cryptocurrencies requires specialty hardware to be worth the bother: GPUs at least, and in the case of Bitcoin, ASICs. So I’ve been wondering how a Javascript miner could work; surely it’s 100-1000x slower than a native GPU program? Are they using WebGL?

The most popular solution for Javascript mining right now is CoinHive, and they mine the Monero currency. Why? Explicitly for performance reasons. The Monero hash (Cryptonight) was designed for CPU computing and doesn’t really run better on GPUs, so it’s a reasonable thing to do in Javascript.

BTW I found this CoinHive demo useful for playing around with a Javascript miner in my own browser. Yup, it takes 100% of all 8 of my CPU threads very easily.
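For the record, the embed itself is tiny. This is roughly the shape of CoinHive’s API as I remember it from their docs; the site key is a placeholder and coinhive.min.js is assumed to already be loaded:

var miner = new CoinHive.Anonymous('SITE_KEY_PLACEHOLDER');
miner.setThrottle(0.5);  // optional politeness knob: leave about half the CPU alone
miner.start();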

(Related: the NoCoin extension is a way to protect your browser from this crap. Ad blockers will typically block miners too (uBlock Origin does), but I want an icon that tells me specifically whether some website is running a miner.)

Python 3 benchmarks

Discussion on Reddit led me to this official Python benchmark site. If you fiddle the settings you can get a comparison of how various Python 3 releases compare to Python 2 on a bunch of benchmarks.

[Screenshot: Python Speed Center comparison, 2018-02-12]

Bars below 1.0 mean Python 3 is faster than Python 2. The different colors are different point releases of Python 3. The broad picture here is that Python 3 is generally about as fast as Python 2 or a little better. The big outlier in the middle is Python startup; that’s 2-3x slower now. No other obvious pattern to me.

I’d had in my head that Python 3 was generally about 20% slower. Partly because it does Unicode properly now, partly because the switch from concrete lists to iterators in core builtins added some slowness. But that opinion is not borne out by this data.

PS: this screenshot brought to you by Firefox’s awesome screenshot tool. Not sure if it’s new to everyone or just me, but it makes saving an image of a DOM chunk of a page very easy.

Why does CJK software have such ugly English text?

There’s a distinct style of typesetting in Japanese software, particularly videogames, where the English text looks terrible. Like they use the same two fonts (one serif, one sans) from 1982 and they’re typeset wrong. Even in new software, like the brand new Monster Hunter World game. Chinese and Korean software often has the same problem. Why does CJK software do such a bad job with English text?


I found some sources online and they describe several kinds of problems:

  1. Font availability. Your Japanese (or Chinese, or Korean) computer won’t have many fonts that support both your language and Roman characters. So you use the ones that are there. They look fine in your language so you don’t care much if they look awful in Roman. MS Mincho or SimSun for example. It’s a bit like how so much stuff is done in Arial or Microsoft’s Times New Roman. They aren’t great, but they are present.
  2. Typesetting ascenders and descenders. The way Roman letters share an x-height and then ascend above it (say the letter d) or descend below it (p) is a distinctive aspect of Latin font design. CJK characters don’t do that; they have a totally different shape. Descenders in particular often get squeezed in the Roman characters of Japanese fonts.
  3. Mismatched aesthetics. Roman type divides into serif and sans-serif designs; Japanese has Mincho and Gothic. But while Mincho fonts often give their Roman characters serifs, there’s no real commonality of design there at all.
  4. Halfwidth Roman characters. Old computers used fixed-width character displays, and typography pretty much always looks awful that way. On top of that, in a CJK writing system most characters use a full-width cell, which is too wide for Roman letters, so you squeeze in two half-width characters instead.

None of these issues prevent a Japanese or Chinese or Korean company from producing excellent English typesetting. But if you’re used to seeing badly typeset Roman characters all the time in your daily computer work, it won’t jump out at you when someone finally localizes your product for America or Europe and translates the menus in the fastest, cheapest way. At least that’s my theory.


Dexie is good software

I’m really glad I chose to use Dexie as my interface for IndexedDb storage in browsers.

Mostly it’s just really professionally packaged. The docs are great. The basic API is very simple to use. Where things get complicated they make the right choices. For example, bulkAdd() will abort if you’re inside a transaction and try to add on top of duplicate keys, unless you explicitly override that. But outside of a transaction it’ll just do its best to add the data that doesn’t conflict and log a warning.
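A sketch of what that looks like in practice; the database name, schema, and records are made up for the example:

import Dexie from 'dexie';

const db = new Dexie('example');
db.version(1).stores({ places: 'id, name' });  // 'id' is the primary key

const records = [{ id: 1, name: 'Waimea' }, { id: 2, name: 'Hilo' }];

// Outside a transaction: adds what it can, reports the keys that conflicted.
db.places.bulkAdd(records).catch(Dexie.BulkError, (e) => {
  console.warn(`${e.failures.length} records had conflicting keys`);
});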

It also has nice support for schema migration. I haven’t stressed this too hard, but adding new columns works nicely and transparently for users. It has simple support for writing custom migration functions, too.
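Migrations look something like this, continuing the made-up schema above; the upgrade function backfills a value for the new column:

db.version(2).stores({ places: 'id, name, visited' })  // new indexed column
  .upgrade((trans) =>
    trans.table('places').toCollection().modify((place) => {
      place.visited = false;  // backfill existing rows
    })
  );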

Dexie supports some fairly complex interactions with the database. All I’ve had to do is simple things and I appreciate that simple things are simple. But it looks good for doing complicated things, too.


Deleting iCloud database records is very slow

Wanderings uses iCloud’s private databases as a backing store for data. I wanted to delete some data from them and it is terribly, terribly slow. Roughly 40ms / record.

We’re using this Javascript code to delete records:

CloudKit.getDefaultContainer().privateCloudDatabase.deleteRecords(data)

If the data is 100 records it takes 4000-5000ms to complete. 10 records takes about 400-500ms. So much for batch deletion.

Exacerbating this, we had a bug where I was trying to delete 60,000 records at once. That hangs for a very long time and then returns an error that the operation is too big. There was a second bug where we were executing several requests in parallel. That does seem to work, but it isn’t a good idea: Chrome crashes outright (“Oh, Snap”) if 26 requests are running in parallel. Firefox didn’t crash, but other bad things happened.

So now I’m deleting 60,000 records, 100 at a time every 5 seconds. That’s 600 batches at 5+ seconds each, so most of an hour.
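The workaround is simple enough. A minimal sketch, assuming recordsToDelete holds the 60,000 records; the batch size and pause come straight from the numbers above, and the helper names are mine:

const BATCH_SIZE = 100;
const PAUSE_MS = 5000;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function deleteSlowly(records) {
  const db = CloudKit.getDefaultContainer().privateCloudDatabase;
  for (let i = 0; i < records.length; i += BATCH_SIZE) {
    await db.deleteRecords(records.slice(i, i + BATCH_SIZE));  // ~4-5 seconds per batch
    await sleep(PAUSE_MS);  // one at a time; parallel requests crash Chrome
  }
}

deleteSlowly(recordsToDelete);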