Some IndexedDB notes

I’m using IndexedDB for permanent storage for a webapp. It’s pretty nice; a NoSQL transactional database in your browser! And all built with asynchronous calls. The API is notoriously confusing so I’m using Dexie as my front end. It’s very friendly and I like the way it uses Promises so directly. It also has a smart story about schema versioning. (See also LocalForage.) Anyway, some notes:
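
For flavor, here’s a rough sketch of what the Dexie usage looks like. The table and field names are made up for illustration, not my real schema:

var db = new Dexie("appDatabase");
db.version(1).stores({
    // "++id" declares an auto-incremented primary key; timestamp gets a secondary index
    readings: "++id, timestamp"
});
db.readings.add({ timestamp: Date.now(), lat: 37.77, lon: -122.42 })
    .then(function(id) { console.log("stored record with key", id); })
    .catch(function(err) { console.error("store failed", err); });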

How much disk your data uses is a mystery. I have a database with 1873 records that consist of an integer primary key, an integer timestamp, and two doubles. Let’s call it 64 bits each, so that’s 32 bytes a record. OTOH on disk it’s taking 464k, or 254 bytes a record. There’s no Javascript API for asking how much space you’re using or what your quota is.

Storage limits are also confusing. Firefox has no hard upper bound but will use up to 50% of your disk’s free space and then start evicting data on an LRU basis. Chrome also has no fixed limit, but the effective cap may depend on the size of the user’s hard drive. Older references describe a strict 5MB limit, but the limit seems to be much bigger now. For a desktop user it seems likely to be gigabytes.

Firefox implements IndexedDB as sqlite files, but the sqlite schema on disk isn’t anything like my application’s schema. There are tables named “database”, “index_data”, and “object_store”, so I’m guessing they’re using sqlite more as a structured file format than as an actual relational database.


gzthermal compression visualization

Neat little tool: gzthermal-web, which visualizes the efficiency of compression from gzip. Each byte is colored depending on how well it compresses; dark blue for 1 bit up to red for 9 bits. It’s a wrapper for the gzthermal tool, as discussed on Hacker News and in this sort of odd article. (Undocumented feature; append &w=1 for a wider image.)

Here’s my weblog visualized with it, sorry for the giant long image. It looks better if you view the image alone shrunk-to-fit.

Structure is clear. The bright yellow parts are inlined PNG images which won’t gzip very well since they’re already compressed. The dark blue chunk at the front is my calendar, a bunch of stupidly repeated HTML. Blog posts are the motley blue/green patches separated by the dark blue of the HTML boilerplate between blog entries.

I’m not exactly sure how I’d act on this information. But it looks neat!

gzthermal.now.sh

Javascript Promises wot I learned

I program in 2006-era Javascript. But recently I started doing some more modern stuff. Arrow functions, those are great! And easy. Promises! Those are great. And confusing. Here are some notes on what I’ve learned about Promises.

Warning: I’m a n00b here, some of this info could be wrong.

  • The best single guide to Promises I’ve read is Google’s. The folksy tone at the start is annoying but keep reading. MDN also has a basic guide that’s useful. You’ll want the MDN reference too.
  • The key thing no one explains is that you never have to schedule a Promise yourself: the function you pass to the constructor runs right away, and the .then() callbacks get queued on the browser’s event loop and run when it has time. You do not have to schedule a promise or create your own event loop, like you do in Python async programming. It’s all automagic.
  • The exciting thing about Promises compared to other async frameworks is the .then() function, the ability to chain a second promise onto a first one. “First do this thing asynchronously, then when it is completed start doing this second thing. And make this chunk of two things itself a Promise so it runs asynchronously”.
  • You can also combine Promises with all() and race() but I don’t see it being used that way very often.
  • The other neat thing about Promises is that they have an API for error handling: reject and .catch(). I’ve never used this for real (I’m an optimist programmer), so I can’t say more about it, but it seems important. (There’s a small sketch of it after the example code below.)
  • A Promise is a Javascript function that runs like any other function, but will yield control and allow other things to run when it’s blocking on network IO. (At least I think that’s right, I’m not 100% confident.)
  • When I’m sloppy I think of a Promise as basically a Thread: an independent unit of execution that runs in parallel with everything else. That’s not really true; Promises aren’t Threads, and in particular there’s no preemptive multitasking. There is cooperative multitasking though, so as soon as your Promise blocks waiting for network or whatever, something else will happen. Specifically, the browser’s event loop keeps running and the page stays responsive, which is the key reason to be using async programming at all.
  • It’s OK to create Promises that never invoke their resolve or reject functions. That means the Promise won’t ever have a return value to pass on to the next Promise in the then chain; maybe you only wanted the side effects. It’s probably not a good idea though: the promise will stay in the pending state forever, and nothing chained after it with then() will ever run.
  • It’s also OK to mix Promises and events for different styles of asynchronous programming. Promise chains let you program in a more or less declarative way: do A, then B, then C. But it can be tricky to arrange the control flow of a whole-page program to make sense this way. Events are nice because the publisher of the event doesn’t need to know anything about who might be consuming it somewhere else. It’s fine to have a Promise post an event that something else listens to and acts on asynchronously.

This code is laughably simple, but I found it useful when learning about Promises:


var state = 1;

console.log("starting");
console.log(state);                              // 1

var p1 = new Promise((resolve, reject) => {
    console.log("p1 created");                   // the executor runs synchronously, before "made p1"
    setTimeout(function() {
        console.log("p1 finished");
        state = 2;
        resolve();                               // settles the promise, so the then() callback can run
    }, 1000);
});

console.log("made p1");
console.log(state);                              // still 1; the timeout hasn't fired yet
p1.then(function() { console.log(state); });     // logs 2, about a second later
console.log("made p1.then");

Ubiquiti EdgeMAX EdgeOS router notes

I’m setting up a new Ubiquiti router, an EdgeRouter X SFP using firmware v1.9.7+hotfix.4. Some notes on doing this:

Out of the box the router only works on ethernet port 0 (eth0). It does not run DHCP. You have to manually configure a computer to talk to it on 192.168.1.1.

The router will redirect you to HTTPS when you connect. However they of course don’t have an SSL certificate for 192.168.1.1 so your browser will refuse to connect. You can override this in Chrome and Firefox by looking for “Advanced” mode on the error screen.

The first thing to do with a new Ubiquiti router is upgrade the firmware. The stock firmware is old and missing important features. (My memory is that upgrading UniFi access points is also necessary; the shipping version had a crippling bug.)

Once you flash it, the first thing to do is run the “Basic Setup” Wizard. (Note: this is not available in the shipping firmware.) This will set the router up to do What You Expect from a consumer router: NAT routing for eth0 (the WAN port) and DHCP for the rest. For a simple home network that is probably all you need to do.

1.9.7 and other versions have a bug where if you manually configure some static IP addresses along with DHCP, then dhcpd won’t start when you reboot the router. The fix is to remove all the static entries. I was able to add one static entry back later via the DHCP control panel, and rebooting still works. Not sure if I can manually add an entry or not. This is all probably a bug in Vyatta, the configuration system EdgeOS is built on.

DNS names are not served for DHCP clients or static hosts. That’s a thing the dnsmasq used by many routers does for you. I can’t figure out how hard this would be to enable; the obvious wizard reportedly only adds your entries to /etc/hosts on the router itself, which is not propagated by its DNS server. I didn’t try very hard.

UPnP is not enabled by default. You can add it via a feature wizard; internal is “switch0” and external is “eth0” (or whatever your WAN port is).

For my own notes, a network setup idiosyncratic to me, I added a static interface route to 192.168.1.0/24 going out eth0. Hoping to reach my invisible wireless ethernet bridge boxes on 192.168.1.110 and .111. No luck, but I think that may be because they’re configured to think the network is 192.168.0.0/23 and so can’t talk to my 192.168.3.* addresses. Oops.


Strava’s new heatmap, some observations

Strava just published a new map of all their data. 3 trillion GPS points turned into a heatmap. It’s lovely.

[Image: Strava’s global heatmap]

They wrote a fantastic writeup of how they made the map, all with custom software. It’s not really a “heat map” in the sense I use the term, in that it lacks the heat diffusion that gives a true heat map its blobbiness. It’s more of a spatial histogram. That’s just a quibble, but it helps me keep this visualization straight from the new Mapbox GL JS heatmaps I’m playing with.

The smartest thing they do is adaptive normalization of the colormap to spatially local data. They describe it in detail with the CDFs they calculate. Basically bright yellow indicates the highest number for a local region near where you’re looking now, not an absolute number for the highest value in the whole world. This allows seldom-visited areas to still have compelling and readable visualizations. You can see this in effect if you zoom in to Alcatraz, which has relatively few visits; the brightness changes radically from z=15 to z=16. A trick worth stealing.
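
Here’s a toy sketch of that local normalization idea (not Strava’s actual code): brightness comes from a value’s rank within its own tile rather than from a single global scale.

// Toy sketch: map each pixel's count to its percentile within the tile, so full
// brightness means "hottest around here", not "hottest anywhere in the world"
function normalizeTile(counts) {              // counts: per-pixel visit counts for one tile
    var sorted = counts.slice().sort(function(a, b) { return a - b; });
    return counts.map(function(c) {
        // empirical CDF: fraction of pixels in this tile with a count <= c
        var rank = sorted.findIndex(function(v) { return v > c; });
        if (rank === -1) { rank = sorted.length; }
        return rank / sorted.length;          // 0..1 brightness to feed the colormap
    });
}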

I was also struck by how all the tracks look nicely snapped to roads. Mobile phone locations are never that accurate, and they deliberately fuzz all points by 2 meters, so why are the roads so sharply defined? I think they simply have enough points that the average that comes through visually really is the actual road. Neat! You can see this in a hyper-zoom on the Golden Gate Bridge, where you can see faint traces of tracks off the bridge, but the main bridge path is highlighted. Note also the little bump; that’s where you have to walk about 2m to the outside to avoid the tower. (I believe the two-tone line is because this is walking data; pedestrians tend to walk on the east side.)

[Image: hyper-zoom of the heatmap on the Golden Gate Bridge]


Google Cloud Postgres connections

I’m using Google Cloud’s Postgres implementation. I thought I had it all working except my AppEngine server couldn’t connect to it in production. For dev work I’d just authorized my home IP address and connected directly to it from my dev box. But that didn’t work in production.

Turns out there are complex docs for connecting to Google Cloud SQL. Here’s what I had to do when following those instructions. (This was made significantly more difficult by intermittent outages either at Google or in my network connection; lots of failures and timeouts.)

  1. Download cloud_sql_proxy to my dev box
  2. Get some application default credentials on my dev box
    gcloud auth application-default login
    This does some OAuth thing with my Google account. In retrospect I have no idea how I was able to gcloud deploy, etc before doing this.
  3. Authorize my home IP address to connect to the SQL instance via the cloud console.
  4. Run cloud_sql_proxy and have it listen on 127.0.0.1:6432 (port 5432 is busy with my own Postgres instance).
    ./cloud_sql_proxy -instances=projectname:us-central1:dbhost=tcp:6432
    Note the instance connection name has three parts separated by colons. “gcloud sql instances list” does not show you names in this format, but the console does in the “Instance details” page for that specific SQL server.
  5. Test I can connect to the database via local proxy
    psql postgresql://postgres:password@127.0.0.1:6432/postgres
    The proxy will log a line about the connection.
  6. Add the beta_settings / cloud_sql_instance stuff to my app.yaml. I have no idea what this does or if it’s necessary, but the docs said I should.
  7. In my dev environment, configure my code to connect to its database via 127.0.0.1:6432, so it uses the proxy. It’s not clear the proxy is necessary; I can connect directly to the database machine via IP address. I guess it’s nice in that the proxy is configured with just the instance connection string, not an IP address. (There’s a code sketch of the dev and prod connection setups after this list.)
  8. In prod configure my code to connect to
    postgresql://postgres:{password}@/{dbName}?host=/cloudsql/projectname:us-central1:dbhost
    I have no idea what this ?host= argument does or why it works. Is this a standard thing or did Google fiddle with its Postgres drivers somehow?
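
To make steps 7 and 8 concrete, here’s roughly what the two connection setups look like from Node with the pg library. The library choice and the environment variable names are my own assumptions for illustration, not necessarily what this app actually uses.

var pg = require("pg");

// Dev: talk to cloud_sql_proxy listening on localhost:6432 over TCP
var devConfig = {
    host: "127.0.0.1",
    port: 6432,
    user: "postgres",
    password: process.env.DB_PASSWORD,    // assumed env var, for illustration
    database: "postgres"
};

// Prod: "host" is a Unix domain socket directory under /cloudsql, not a hostname
var prodConfig = {
    host: "/cloudsql/projectname:us-central1:dbhost",
    user: "postgres",
    password: process.env.DB_PASSWORD,
    database: "postgres"                   // stand-in for the real database name
};

var pool = new pg.Pool(process.env.NODE_ENV === "production" ? prodConfig : devConfig);
pool.query("SELECT now()").then(function(res) { console.log(res.rows[0]); });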

That’s more or less the process described in Google’s docs, but it took me an hour of trial and error to make it work.

Update: I asked Stack Overflow about how cloud SQL connections worked and got a very quick answer from a Google Cloud engineer named Vadim. “It’s a UNIX socket backed by Cloud SQL Proxy”.

In detail, /cloudsql/projectname:us-central1:dbhost is a Unix domain socket, not a TCP socket. There’s an actual file (or directory?) named /cloudsql/projectname:us-central1:dbhost in the Unix machine’s filesystem. The ability to connect to Unix domain sockets with a special host= parameter is implemented in libpq, the Postgres connection library. The Cloud SQL Proxy code talks about how it can proxy from a Unix domain socket.

It’s a nice optimization; it’s silly to use a TCP/IP stack for passing data between two processes on the same computer. I assume Linux has a highly optimized localhost driver; even so, Unix domain sockets should be better. This discussion suggests 1.5-2x the throughput.