Javascript Promises wot I learned

I program in 2006-era Javascript. But recently I started doing some more modern stuff. Arrow functions, those are great! And easy. Promises! Those are great. And confusing. Here are some notes on what I’ve learned about Promises.

Warning: I’m a n00b here, some of this info could be wrong.

  • The best single guide to Promises I’ve read is Google’s. The folksy tone at the start is annoying but keep reading. MDN also has a basic guide that’s useful. You’ll want the MDN reference too.
  • The key thing no one explains is that you don’t have to schedule a Promise or create your own event loop, like you do in Python async programming. The function you pass to the Promise constructor runs right away, and the callbacks you chain on with .then() are queued on the browser’s event loop automatically and run when it has time. It’s all automagic.
  • The exciting thing about Promises compared to other async frameworks is the .then() function, the ability to chain a second promise onto a first one. “First do this thing asynchronously, then when it is completed start doing this second thing. And make this chunk of two things itself a Promise so it runs asynchronously”.
  • You can also combine Promises with all() and race() but I don’t see it being used that way very often.
  • The other neat thing about Promises is that they have an API for error handling (reject and .catch()). I’ve never used it for real (I’m an optimist programmer), so I can’t say more about it, but it seems important.
  • The code inside a Promise runs like any other Javascript function; what makes it asynchronous is that when it kicks off something slow like network IO, control returns to the rest of the program and the completion callback runs later. (At least I think that’s right, I’m not 100% confident.)
  • When I’m sloppy I think of a Promise as basically a Thread: an independent unit of execution that runs in parallel with everything else. That’s not really true; Promises aren’t Threads, in particular there’s no preemptive multitasking. There is cooperative multitasking though, so as soon as your Promise blocks waiting for network or whatever, something else can run. Specifically the browser’s event loop keeps running and the page stays responsive, which is the key reason to be using async programming at all.
  • It’s OK to create Promises that never invoke their resolve or reject functions. That means the Promise won’t ever have a return value to pass on to the next Promise in the then chain, but that’s OK. Maybe you wanted the side effects or something.
  • It’s also OK to mix Promises and events for different styles of asynchronous programming. Promise chains let you program in a more or less declarative way: do A, then B, then C. But it can be tricky to arrange the control flow of a whole-page program to make sense this way. Events are nice because the publisher of an event doesn’t need to know anything about who might be consuming it somewhere else. It’s fine to have a Promise post an event that something else listens to and acts on asynchronously.

This code is laughably simple, but I found it useful when learning about Promises:

 

// These run immediately, in order.
var state = 1;
console.log("starting");
console.log(state);                              // 1

// The executor function runs synchronously, as soon as the Promise is created.
var p1 = new Promise((resolve, reject) => {
    console.log("p1 created");
    setTimeout(function() {
        // Runs about a second later, once the timer fires.
        console.log("p1 finished");
        state = 2;
        resolve();
    }, 1000);
});

console.log("made p1");
console.log(state);                              // still 1
p1.then(function() { console.log(state); });     // logs 2, but only after p1 resolves
console.log("made p1.then");                     // logs before the .then callback runs

Ubiquiti EdgeMAX EdgeOS router notes

I’m setting up a new Ubiquiti router, an EdgeRouter X SFP using firmware v1.9.7+hotfix.4. Some notes on doing this:

Out of the box the router only works on ethernet port 0 (eth0). It does not run DHCP. You have to manually configure a computer to talk to it on 192.168.1.1.

The router will redirect you to HTTPS when you connect. However they of course don’t have an SSL certificate for 192.168.1.1 so your browser will refuse to connect. You can override this in Chrome and Firefox by looking for “Advanced” mode on the error screen.

The first thing to do with a new Ubiquiti router is upgrade the firmware. The stock firmware is old and missing important features. (My memory is that upgrading UniFi access points is also necessary; the shipping version had a crippling bug.)

Once you flash it, the first thing to do is run the “Basic Setup” Wizard. (Note, this is not available in the shipping firmware.) This will set the router up to do What You Expect from a consumer router: NAT routing on eth0 (the WAN port) and DHCP for the rest. For a simple home network that is probably all you need to do.

1.9.7 and other versions have a bug where if you manually configure some static IP addresses along with DHCP, dhcpd won’t start when you reboot the router. The fix is to remove all the static entries. I was able to add one static entry later via the DHCP control panel and rebooting still works; not sure if I can manually add an entry or not. This is all probably a bug in Vyatta, the configuration system EdgeOS is built on.

DNS is not enabled for DHCP entries or static hosts. That’s something dnsmasq, which many routers use, does for you. I can’t figure out how hard this would be to enable. The obvious wizard reportedly only adds your entries to /etc/hosts on the router itself, which is not propagated by its DNS server. I didn’t try very hard.

UPnP is not enabled by default. You can add it via a feature wizard; internal is “switch0” and external is “eth0” (or whatever your WAN port is).

For my own notes (a network setup idiosyncratic to me): I added a static interface route to 192.168.1.0/24 going out eth0, hoping to reach my invisible wireless ethernet bridge boxes on 192.168.1.110 and .111. No luck, but I think that may be because they’re configured to think the network is 192.168.0.0/23 and so can’t talk to my 192.168.3.* addresses. Oops.

 

Strava’s new heatmap, some observations

Strava just published a new map of all their data. 3 trillion GPS points turned into a heatmap. It’s lovely.

[Image: screenshot of the Strava global heatmap]

They wrote a fantastic writeup of how they made the map. All custom software. It’s not really a “heat map” in the sense I use the term, in that it lacks the heat diffusion that gives a true heat map its blobbiness. It’s more of a spatial histogram. That’s just a quibble, but it helps me keep this visualization straight from the new Mapbox GL JS heatmaps I’m playing with.

The smartest thing they do is adaptive normalization of the colormap to spatially local data. They describe it in detail with the CDFs they calculate. Basically bright yellow indicates the highest number for a local region near where you’re looking now, not an absolute number for the highest value in the whole world. This allows seldom-visited areas to still have compelling and readable visualizations. You can see this in effect if you zoom in to Alcatraz which has relatively few visits; the brightness changes radically from z=15 to z=16. A trick worth stealing.
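Roughly the idea, as I understand it (a toy sketch of local CDF normalization, not Strava’s actual code; the window size here is arbitrary): color each pixel by where its count falls in the CDF of counts in its own neighborhood, so “bright yellow” means “hot compared to nearby pixels”, not “hot compared to the whole world”.

import numpy as np

def local_cdf_normalize(counts, window=256):
    """Map raw per-pixel counts to [0, 1] using the CDF of the nonzero
    counts in each local window, so brightness is relative to the neighborhood."""
    out = np.zeros(counts.shape, dtype=float)
    height, width = counts.shape
    for y in range(0, height, window):
        for x in range(0, width, window):
            tile = counts[y:y+window, x:x+window]
            nonzero = np.sort(tile[tile > 0])
            if nonzero.size == 0:
                continue                      # nothing to draw in this neighborhood
            # each pixel's brightness is its rank among its neighbors
            ranks = np.searchsorted(nonzero, tile, side="right")
            out[y:y+window, x:x+window] = ranks / nonzero.size
    return out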

I was also struck by how all the tracks look nicely snapped to roads. Mobile phone locations are never that accurate, and they deliberately fuzz all points by 2 meters, so why are the roads so sharply defined? I think they simply have enough points that the average that comes through visually really is the actual road. Neat! You can see this in a hyper-zoom on the Golden Gate Bridge: there are faint traces of tracks off the bridge, but the main bridge path is sharply highlighted. Note also the little bump; that’s where you have to walk about 2m to the outside to avoid the tower. (I believe the two-tone line is because this is walking data; pedestrians tend to walk the east side.)

[Image: hyper-zoom of the heatmap at the Golden Gate Bridge towers]

 

Google Cloud Postgres connections

I’m using Google Cloud’s Postgres implementation. I thought I had it all working except my AppEngine server couldn’t connect to it in production. For dev work I’d just authorized my home IP address and connected directly to it from my dev box. But that didn’t work in production.

Turns out there are complex docs for connecting to Google Cloud SQL. Here’s what I had to do when following those instructions. (This was made significantly more difficult by intermittent outages either at Google or in my network connection; lots of failures and timeouts.)

  1. Download cloud_sql_proxy to my dev box
  2. Get some application default credentials on my dev box
    gcloud auth application-default login
    This does some OAuth thing with my Google account. In retrospect I have no idea how I was able to gcloud deploy, etc before doing this.
  3. Authorize my home IP address to connect to the SQL instance via the cloud console.
  4. Run cloud_sql_proxy and have it listen on 127.0.0.1:6432 (port 5432 is busy with my own Postgres instance).
    ./cloud_sql_proxy -instances=projectname:us-central1:dbhost=tcp:6432
    Note the instance connection name has three parts separated by colons. “gcloud sql instances list” does not show you names in this format, but the console does in the “Instance details” page for that specific SQL server.
  5. Test I can connect to the database via local proxy
    psql postgresql://postgres:password@127.0.0.1:6432/postgres
    The proxy will log a line about the connection.
  6. Add the beta_settings / cloud_sql_instance stuff to my app.yaml. I have no idea what this does or if it’s necessary, but the docs said I should.
  7. In my dev environment, configure my code to connect to its database via 127.0.0.1:6432, so it uses the proxy (see the sketch just after this list). It’s not clear the proxy is necessary; I can connect directly to the database machine via IP address. I guess it’s nice in that the proxy is configured just with the instance connection string, not an IP address.
  8. In prod configure my code to connect to
    postgresql://postgres:{password}@/{dbName}?host=/cloudsql/projectname:us-central1:dbhost
    I have no idea what this ?host= argument does or why it works. Is this a standard thing or did Google fiddle with its Postgres drivers somehow?
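Something like this works for switching between the two connection strings. A minimal sketch with placeholder names; the IN_PROD switch, project, instance, and password are all made up:

import os
import psycopg2

# Placeholder values; substitute your own instance connection name and password.
INSTANCE = "projectname:us-central1:dbhost"
PASSWORD = os.environ.get("DB_PASSWORD", "password")

if os.environ.get("IN_PROD"):
    # Production: the ?host= form from Google's docs, pointing at /cloudsql/<instance>.
    dsn = f"postgresql://postgres:{PASSWORD}@/postgres?host=/cloudsql/{INSTANCE}"
else:
    # Dev: go through cloud_sql_proxy listening locally on 127.0.0.1:6432.
    dsn = f"postgresql://postgres:{PASSWORD}@127.0.0.1:6432/postgres"

conn = psycopg2.connect(dsn)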

That’s more or less the process described in Google’s docs, but it took me an hour of trial and error to make it work.

Update: I asked Stack Overflow about how cloud SQL connections worked and got a very quick answer from a Google Cloud engineer named Vadim. “It’s a UNIX socket backed by Cloud SQL Proxy”.

In detail, /cloudsql/projectname:us-central1:dbhost is a Unix domain socket, not a TCP socket. There’s an actual file (or directory?) named /cloudsql/projectname:us-central1:dbhost in the Unix machine’s filesystem. The ability to connect to Unix domain sockets with a special host= parameter is implemented in libpq, the Postgres connection library. The Cloud SQL Proxy code talks about how it can proxy from a Unix domain socket.

It’s a nice optimization; it’s silly to use a TCP/IP stack for passing data between two processes on the same computer. I assume Linux has a highly optimized localhost driver, but even so, Unix domain sockets should be better. This discussion suggests 1.5-2x the throughput.

Simpler database access in Python

I continue to look for things that make working with databases simpler in Python. I keep doing projects where I end up dropping down to raw SQL, with awkward syntax for querying rows and bespoke one-off code for updating data in the database. The Python is messy but I like being able to use SQL in its full glory.

I’ve tried using SQLAlchemy a couple of times and have a hard time loving it. ORMs are a bad idea in general and SQLAlchemy is very complicated and magic. For a schema with 20+ tables and a lot of complex interactions it makes sense, but for a small project where you just want something more sane than low level DBAPI…

For my current project I’m trying to use plain psycopg2 and its NamedTupleCursor. It’s not awesome, but at least it gives you named access to the rows from the DB. So it makes reading from the database a little nicer, but does nothing for writing. It’s Postgres only.
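A quick example of what that looks like (the users table and query here are hypothetical):

import psycopg2
from psycopg2.extras import NamedTupleCursor

conn = psycopg2.connect("dbname=mydb")          # whatever DSN you use
cur = conn.cursor(cursor_factory=NamedTupleCursor)
cur.execute("SELECT id, name FROM users WHERE name = %s", ("nelson",))
for row in cur:
    print(row.id, row.name)                     # named access instead of row[0], row[1]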

Along these lines is the records library. It also gives sane access to query results, either as dicts or named tuples. Result sets also have nice export functions like “write to CSV” or “make into a Pandas DataFrame”. It doesn’t really do much for writing data, just querying, so it’s simple and small.
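A sketch of what using records looks like, going by its README (untested, same hypothetical users table):

import records

db = records.Database("postgres://localhost/mydb")
rows = db.query("SELECT id, name FROM users WHERE name = :name", name="nelson")
for row in rows:
    print(row.id, row.name)
print(rows.export("csv"))                       # the export helpers come from tablib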

dataset is an effort to do even more. I love their tag line: “Because managing databases in Python should be as simple as reading and writing JSON files.” Amen! There’s a lot more going on in dataset than records. It’s more like an ORM-lite and contains support for creating schema, automatic upserts, complex queries without Python, … It doesn’t have the export functions records has, but it does have a lot of thought for simple queries and updates. I’d have to actually use it for something to see if it’s sensible or if ORM-lite is an even worse idea than a full ORM.
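And a sketch of the dataset style (also untested; it shows the ORM-lite flavor of inserts and upserts without writing SQL):

import dataset

db = dataset.connect("postgresql://localhost/mydb")
users = db["users"]                             # the table is created lazily if it doesn't exist
users.insert(dict(name="nelson", email="nelson@example.com"))
users.upsert(dict(name="nelson", email="new@example.com"), ["name"])   # keyed on name
for row in users.find(name="nelson"):
    print(row["email"])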

Both records and dataset depend on SQLAlchemy, which turned me off at first. But they’re both mostly using it as a way to support multiple database types. In theory the Python DBAPI standard should be enough, but in practice it sucks enough that I understand why they’d build on SQLAlchemy instead. With records you can mostly ignore that SQLAlchemy is there; dataset exposes it more in the query system.

I wonder if some other programming environment has a much nicer database interface. The Microsoft world has LINQ, but that seems to have all the problems of other ORMs. I wonder if the R folks have a really nice way to work with databases, or maybe even Pandas.

I really like Python 3.6 format strings

Now that I have access to Python 3.6 I’m really enjoying the new PEP 498 format strings. Python has tried a bunch of different ways to format strings, from good ol’ %s notation to Template strings to str.format. They all seemed awkward to me because the thing you were substituting in was way far away from the string itself. Now you just embed a little python code right in the string to be formatted, maybe with a bit of presentation like how many digits to display. Done!

It’s pretty powerful under the hood. Check out this example:

>>> from datetime import datetime
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"  # using date format specifiers
'January 27, 2017'

The format string is using datetime-specific codes like %B and %Y. Behind the scenes it’s calling the format() built-in, which in turn defers to the __format__() method on the object. In practice I suspect you can mostly not worry about this too much, other than using basic floating point precision stuff.
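Your own classes can hook into this by defining __format__(). A toy example; the Angle class and its “rad” format spec are made up:

>>> class Angle:
...     def __init__(self, degrees):
...         self.degrees = degrees
...     def __format__(self, spec):
...         if spec == "rad":
...             return f"{self.degrees * 3.14159 / 180:.3f} rad"
...         return f"{self.degrees} degrees"
...
>>> f"{Angle(90)} or {Angle(90):rad}"
'90 degrees or 1.571 rad'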

I used to do this kind of thing a lot:

print ("{date} {asin} {title}".format(**locals()))

That’s gross for several reasons. Now I can just do

print(f"{date} {asin} {title}")

and be done with it.

GeoAlchemy and SQLAlchemy thoughts

I’m doing a new geo-project, a GPS tracker sort of thing. My database is very simple, just a Users table and a Locations table holding points. But I figured I’d front-load the work and use SQLAlchemy and Alembic and PostGIS and everything right now, not pay the pain of migrating later when I realize I need the fancier stuff.

Except SQLAlchemy doesn’t support PostGIS. GeoAlchemy2 does though. After an hour with it I can’t decide if it’s good or bad. It seems very simple.

In GeoAlchemy2 all geodata is represented in Python by just a few types: Geometry, Geography, Raster. You can’t do much with those things other than get the WKB or WKT for them. They’re not Points or Polygons or anything; there’s no Python code for working with the shapes. You can pass them back to PostGIS to call functions like ST_Area and have PostGIS do a calculation, though that requires a database round trip. Alternately you can convert them to Shapely shapes and use that excellent GIS library to do the work in Python.
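A sketch of what that bridging looks like for my case (untested; the Location model is a guess at what I’ll end up with):

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from geoalchemy2 import Geometry
from geoalchemy2.shape import to_shape, from_shape
from shapely.geometry import Point

Base = declarative_base()

class Location(Base):
    __tablename__ = "locations"
    id = Column(Integer, primary_key=True)
    point = Column(Geometry("POINT", srid=4326))

# Writing: convert a Shapely Point into something GeoAlchemy2 can store.
loc = Location(point=from_shape(Point(-122.4, 37.8), srid=4326))

# Reading: the column value is an opaque WKB element; convert to Shapely to work with it.
p = to_shape(loc.point)
print(p.x, p.y)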

I guess that’s a reasonable compromise; GeoAlchemy2 is really just bridging to SQL, they’re not going to implement their own GEOS equivalent. It also has spatial functions declared in a SQLAlchemy-friendly way so you can use SQLAlchemy syntax to do your work. That’s kinda neat.

Still I’m looking at a lot of third party Python code: SQLAlchemy, Shapely, GeoAlchemy, Alembic. All to manage a database with two tables and nothing more complex than a list of points. I may blow it all off and go to raw SQL. OTOH I may end up regretting that choice in a few months if I want to do more complex things.