Apple UI geniuses at work

I accidentally clicked something Keychain related in my menu bar and this popped up

[Screenshot: Screen Shot 2016-05-24 at 4.28.06 PM.png]

Notice the lack of a red close button? Cancel causes the window to respawn in the middle of my screen. Over and over again. Did I mention the window floats on top of everything else?

I particularly like that apparently the only way to get this to go away is to type my password to it. That’s good training for users; “type your password even if you don’t know what it’s for because it’s the only way to make the dialog go away”.

Update: oh god, it’s reproducing. A second one just popped up.

[Screenshot: Screen Shot 2016-05-24 at 4.34.16 PM.png]

Update 2: apparently I’d locked my login keychain by accident. That’s cool, rebooting unlocked it without me ever having to type my password.

Django learnings: models

I’m soldiering on with trying to apply Django to my lolslackbot project, and I thought I’d take a stab at letting Django use my existing database. My specific goal is to set up a read-only set of views on the primary social tables I have. People, Groups, and Destinations all name entities in my system; I also have GroupMembership and Subscriptions tables, which are many-to-many relations between the three primary tables. So far so good, at least for the primary tables; I’m still working on the relations.

Turns out starting with a legacy database isn’t too hard; you can use the inspectdb command to build skeletons of Django model classes from an existing database. Then hand-edit the resulting code. As a bonus this is an excuse for me to start learning more about Django models. Some random things I learned:

  • Django model classes define both a database schema and the UI validation behavior in HTML forms.
  • Every Django model class must have a primary key. By default Django adds an auto-incrementing field named “id”; you can use a different field instead by setting primary_key=True on it.
  • A field can have the option “null” set to true or false; this is whether empty values are stored as NULL in the database. (Why would you ever not?!) There’s also a “blank” option which has nothing to do with the database, but is whether the field is optional in forms, including autogenerated ones like the admin interface.
  • You have to hand-add each model class to admin.py to get it to show up in the admin interface.
  • Django coding style is lowercase_with_underscores for field names, CamelCase for class names. ¿Por qué no los dos?

That’s the easy stuff. On to hard stuff.

I’m not sure how to translate my existing many-to-many relation tables to however Django implements relationships. I thought maybe the extra fields support (using the through keyword argument) might do it, but that seems like overkill and possibly awkward. I think I need to just adapt the ManyToManyField to my existing schema.

I’m a little scared to let Django start writing data to my database.

inspectdb sets up things with managed = False by default. That seems wise; it prevents Django’s schema management stuff from messing with tables someone else defined already. But some day I’m going to want Django to take all this over for me. Can I later change managed to True and have it make sense of the existing tables?

I haven’t even begun to think about testing in the Django world. I know there’s a lot of support for tests, maybe that’s my next learning.

lolslackbot postmortem

Had a significant outage for my lolslackbot project yesterday. A few different things went wrong and I’m still confused about what the underlying problem is.

The behavior

The problem manifested as me seeing the same message being delivered every time the program runs, every 3 minutes. That’s bad; I’m spamming my users. At the same time I was seeing errors in my logs from trying to deliver messages via Slack. No useful message mind you, but at least a hint.

I was busy last night when I spotted the error so I just shut the whole system down until morning. Then in the morning I tried a quick fix and ran the script, but that went badly, so I had to look closer. I finally got it fixed after two hours of work.

The delivery bug

The first thing I did this morning was add more logging and reproduce the problem. I discovered the error was that one of the Slack channel IDs no longer existed, which caused an exception in the Slack messaging module, which then broke things. The underlying problem was a design flaw in my error handling: I was trying to deliver all Slack messages at once and only then updating the database to indicate those messages had been processed. The result is that if there were 3 messages to be delivered at once and the 2nd one caused an error, the 1st one would get delivered but not marked processed, and so would get delivered again on the next run.

So I fixed it by refactoring the logic that marks messages processed. I still deliver all the Slack messages at once but now individually flag whether each one worked or not. I also mark a message processed whether there was an error in delivery or not. The underlying problem is basically a distributed transaction. I’d rather err on the side of occasionally losing a message than sending the same message many times.
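Here’s a minimal sketch of that “mark each message on its own” idea. It isn’t my actual code: the messages table, its columns, and the send callable are all made up for illustration, and a fake sender stands in for the Slack module.

```python
import sqlite3


def deliver_pending(db, send):
    """Try each unprocessed message once; mark it processed either way."""
    rows = db.execute(
        "SELECT id, channel, body FROM messages WHERE processed = 0 ORDER BY id"
    ).fetchall()
    for msg_id, channel, body in rows:
        try:
            send(channel, body)  # hypothetical delivery function
            status = "sent"
        except Exception as e:
            # Prefer losing one message to re-sending it on every run.
            status = "failed: %s" % e
        db.execute(
            "UPDATE messages SET processed = 1, status = ? WHERE id = ?",
            (status, msg_id),
        )
        db.commit()  # record the outcome before touching the next message


def demo():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, channel TEXT, "
        "body TEXT, processed INTEGER DEFAULT 0, status TEXT)"
    )
    db.executemany(
        "INSERT INTO messages (channel, body) VALUES (?, ?)",
        [("C1", "first"), ("GONE", "second"), ("C3", "third")],
    )
    sent = []

    def fake_send(channel, body):
        # Simulates a Slack channel ID that no longer exists.
        if channel == "GONE":
            raise RuntimeError("channel_not_found")
        sent.append(body)

    deliver_pending(db, fake_send)
    deliver_pending(db, fake_send)  # second run finds nothing left to send
    statuses = db.execute("SELECT status FROM messages ORDER BY id").fetchall()
    return sent, statuses
```

The bad channel costs one lost message, and the other two go out exactly once even though the job runs twice.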

Rate limiting problem / match commit semantics

A second problem making all this diagnosis difficult was that my system was downloading match objects but they weren’t ending up in the database. I finally figured out that my script that downloads all missing matches was crashing before it finished. And I was only calling commit on the database when the script finished, so all the work was getting lost. Derp. I fixed it to commit after every single match object is downloaded. I also put in some better error handling.
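A rough sketch of the commit-per-match pattern, assuming a made-up matches table and a hypothetical fetch callable in place of the real Riot client. It leans on the fact that a sqlite3 connection used as a context manager commits when the block exits cleanly and rolls back if it raises.

```python
import sqlite3


def download_missing(db, match_ids, fetch):
    """Store each match as soon as it arrives; return how many were saved."""
    saved = 0
    for match_id in match_ids:
        data = fetch(match_id)  # hypothetical downloader; may raise
        with db:  # commits when the block exits cleanly, rolls back on error
            db.execute(
                "INSERT INTO matches (id, data) VALUES (?, ?)", (match_id, data)
            )
        saved += 1
    return saved


def demo():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE matches (id INTEGER PRIMARY KEY, data TEXT)")

    def flaky_fetch(match_id):
        if match_id == 3:
            raise ConnectionError("simulated crash mid-run")
        return "match %d" % match_id

    try:
        download_missing(db, [1, 2, 3, 4], flaky_fetch)
    except ConnectionError:
        db.rollback()  # stands in for the process dying with work uncommitted
    return [row[0] for row in db.execute("SELECT id FROM matches ORDER BY id")]
```

With a single commit at the end, the simulated crash would leave the table empty; here the first two matches survive it.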

So what’s causing the errors downloading matches? I’m not really sure, but I think it’s Riot’s rate limiter. I have some very high rate limit that I shouldn’t be getting near, but I’m still getting 429 responses for my meagre stream of requests, being told to wait. And this problem has been going on for days. I had chalked it up to a networking problem with their servers, but it turns out it’s my client waiting politely like it’s been asked to. So why am I being throttled?

I don’t know. The thing that triggers it seems to be a few odd matches that are returning 404 errors indicating the match doesn’t exist. (Even though it should, since I saw a reference to it from another API call.) Perhaps they have extra rate limiting for clients that make repeated requests that generate 404s? Part of the problem here is that I treat a 404 as “no meaningful response, try again later”, so I’ve accumulated 10-15 of those over time. I should clear them out, and change the code to stop retrying a match once it gets a 404.
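The change I have in mind looks something like this sketch. Everything here is an assumption for illustration: http_get is a hypothetical callable returning (status, headers, body), and I’m assuming the server says how long to wait in a standard Retry-After header.

```python
import time


def fetch_match(match_id, http_get, max_attempts=3, sleep=time.sleep):
    """Return (state, body) where state is 'ok', 'missing' or 'retry_later'."""
    for _ in range(max_attempts):
        status, headers, body = http_get(match_id)
        if status == 200:
            return "ok", body
        if status == 404:
            # Treat as permanent: record it and stop asking for this match.
            return "missing", None
        if status == 429:
            # Rate limited: wait as long as we were asked to, then try again.
            sleep(float(headers.get("Retry-After", 1)))
            continue
        break  # anything else: give up for now
    return "retry_later", None
```

The caller would store the “missing” state in the database so those matches drop out of the download queue, while “retry_later” ones stay in it.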

Lessons learned

Man, debug logs are a big help. Fortunately, at the same time I was having this problem I’d just committed new code to write debug logs more usefully to a file. Couldn’t have figured out what was going on without it.

A broad thing I learned here is to be smarter about error recovery logic when working with third party services. I think when interacting with Riot or Slack or whatever, I want to do one small bit of remote API work and then immediately commit that work to the database before trying the next remote thing. And handle errors from remote services robustly, continuing even if one call fails.

Unfortunately some of my code is now squelching exceptions, logging them and continuing instead of crashing the program. This is necessary to make my code more robust to errors, but is scary. Anyway I found I was having a hard time logging exceptions properly, here’s the way I’ve settled on:

import logging

try:
    someFunction(data)
except Exception as e:
    # exc_info=True makes logging append the full stack trace
    logging.error('Something went wrong %s' % data, exc_info=True)

The key thing here is the “exc_info=True”; this gets Python to include a stack trace. Before I was trying to actually log the exception object e itself, but that only gets you the message, not the stack. My use of % is an anti-pattern, I’m really supposed to use a comma and let logging do the substitution, but for some reason I find that error prone. And the worst thing about errors in a logging function like this is unless you are superhuman you often don’t have test coverage for the exception cases, so this line of code only ever executes in production when something else already went wrong and it’s very confusing.
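For comparison, here’s a small self-contained sketch of the “proper” version: logger.exception() is the standard library’s shorthand for logging at ERROR level with exc_info=True, and passing data as a separate argument lets logging do the substitution. The function names and the StringIO handler are just scaffolding so the output can be inspected.

```python
import io
import logging


def some_function(data):
    # Stand-in for whatever really fails in production.
    raise ValueError("bad input")


def run(data, logger):
    try:
        some_function(data)
    except Exception:
        # Same as logger.error(..., exc_info=True); the message is only
        # formatted with data if the record actually gets emitted.
        logger.exception("Something went wrong %s", data)


def demo():
    stream = io.StringIO()
    logger = logging.getLogger("exc_demo")
    logger.addHandler(logging.StreamHandler(stream))
    logger.propagate = False  # keep the demo output out of the root logger
    run({"id": 7}, logger)
    return stream.getvalue()
```

Running the error path once in a test like this is also a cheap way to get coverage on a line that otherwise only executes when production is already on fire.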

Django part 1 what I learned

I sat down and wrote my first Django app (in a very long time) yesterday using this tutorial, parts 1-3. Here’s some stuff I thought and learned:

  • Django has amazingly good documentation. That tutorial is fantastically written, clear and concise and elaborating where necessary. And it builds an actually interesting app, not just a dumb toy.
  • The polish is very good. Everything worked just as described with no weirdness. I particularly like that sqlite was the default database, that’s the right place to start for learning.
  • The distinction between a “project” and an “application” seems like needless complexity. The theory is that a single Django project instance might contain a mix of several independent applications. I wonder if that’s done in practice? I suppose so, right away: the default project has the admin, auth, contenttypes, sessions, messages, and staticfiles applications. I wonder if many Django projects contain multiple third party / homegrown applications?
  • If you use the wizard to make an application and a project you end up with 12 files and 185 lines of text. I’m not a fan of codegen, it usually betrays an unpleasant complexity, but it’s not too bad in this case. The only big file is settings.py and that’s basically a declarative setting of what’s included in the project. That’s gotta go somewhere.
  • The good thing about Django is it comes with its own ORM and HTML template system. The bad thing is you end up using their ORM and HTML templates. I don’t have enough experience with them to have an opinion other than some initial crankypants reactions.
  • The data model / migration / database agnostic stuff is pretty slick. I wonder if it works in practice.
  • I hate choosing between CamelCase and underscore_separation and forcedlowercase. The latter is particularly alarming; my database table names seem to be coerced downcase. Perhaps that’s a compatibility thing?
  • The auto-reloading of code is nice.
  • It appears that Django has no favored CSS UI framework. I imagine it’s easy to drop in Bootstrap or whatever, but part of me was hoping for some default styles that would make it easy to make things that look OK.

I ended up having 7 files open in my editor

  • project/settings.py to register my app
  • project/urls.py to register my app’s urls
  • app/admin.py to register my models
  • app/models.py to define my data model
  • app/urls.py to define my webapp URLs
  • app/views.py to define my webapp views in response to URLs
  • app/templates/app/index.html as an HTML template for my view

Those first three files feel like make-work but the other four are actual real code I care about. That’s pretty good!

Overall I’m pretty pleased so far. I already know the pitfalls of going down this road, the way an ORM can betray you on performance in heavy traffic situations. But that’s not a road I’m worried about right now, and if I do ever get there I’m guessing there are ways to build special hacks or bridges around the ORM overhead.

 

MacOS third party graphics drivers

I did something I never thought I’d do and installed third party graphics drivers for my Mac, which has an NVIDIA GeForce GTX 660M graphics card. I was motivated to do that because the new game Stellaris was crashing frequently. The cargo cult advice for another game from the same publisher was that the third party drivers seem to have helped and I’ll be damned, they do.

NVIDIA couldn’t make it more confusing what these drivers are for, saying “This driver update is for Mac Pro 5,1 (2010), Mac Pro 4,1 (2009) and Mac Pro 3,1 (2008) users.” I don’t have some ancient Mac Pro with a video card, I have a modern iMac with a stock Apple GPU. But the fine print indicates that really these drivers are for pretty much any Mac with an NVIDIA GPU. Also it’s called the “Web Driver”, which is an insane name for a kernel module. Note that there’s no index for all NVIDIA releases; you have to go to this reseller page to find all the versions. And don’t be confused by the CUDA drivers; those are for high end number crunching, not graphics and gaming.

The installer seems well behaved. Worked fine on reboot for me, and the drivers have only seemed to be an improvement so far. Also it includes a control panel where you can switch back to the default MacOS drivers or uninstall the NVIDIA stuff entirely.

No idea what’s going on with Apple and MacOS that the stock El Capitan drivers aren’t good enough. Or with Paradox, the game publisher, that their games are sold for Macs without testing on standard machine configs. The whole point of paying the Apple premium is this stuff is supposed to work out of the box.

 

sqlite: Unix epoch times

I store all my timestamps in the Unix epoch, as Thompson and Ritchie decreed. sqlite doesn’t really have a date type, instead it has date functions that can work with text or numbers. Unfortunately the numbers are defined with respect to two different epochs.

sqlite’s preferred date format is “Julian day”. This is a floating point number, the number of days since noon on November 24, 4714 B.C., with hours/minutes/seconds reported in the fractional part. I.e. right now it’s 2457524.189894016. A single second is roughly .00001 in this time scale. sqlite3 reals are 64 bit with a 52 bit fraction, so we get about 2^52 seconds, or roughly 140M years of dates, where individual seconds have distinguishable timestamps. No practical maximum :-P
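A quick way to sanity check that precision claim is to ask Python how far apart neighboring 64 bit floats are at a given Julian day. This is just a sketch using math.ulp (Python 3.9 or later); the helper name is mine.

```python
import math

SECONDS_PER_DAY = 86400


def resolution_seconds(julian_day):
    """Smallest time step, in seconds, a 64 bit float can represent here."""
    return math.ulp(julian_day) * SECONDS_PER_DAY


# At the Julian day quoted above the step is about 40 microseconds.
today = resolution_seconds(2457524.189894016)

# Roughly 140M years' worth of days out, the step is still under a second.
far_future = resolution_seconds(140e6 * 365.25)
```

So today’s Julian day numbers resolve far finer than a second, and they keep resolving individual seconds across the whole 140M year range.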

Unix’s preferred date format is an integer, seconds since Jan 1 1970. (Javascript uses the same epoch but counts milliseconds.) Right now it’s 1463330612. Unix times are traditionally signed 32 bit integers and therefore run out in 2038, although we’re all hoping we’ll see an extension to 64 bits before the apocalypse. sqlite3 already seems to have extended support and notes it will work on Unix dates from 0000 to 5352. The sqlite functions also seem to be OK with real number Unix epoch timestamps, with the fraction naming subseconds.
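The two scales differ only by a fixed offset and a factor of 86400 seconds per day, so converting by hand is easy. A small sketch (the function names are mine; 2440587.5 is the Julian day of midnight UTC on Jan 1 1970):

```python
UNIX_EPOCH_JD = 2440587.5  # Julian day of 1970-01-01 00:00:00 UTC
SECONDS_PER_DAY = 86400


def unix_to_julian(ts):
    """Convert Unix seconds to a Julian day number."""
    return ts / SECONDS_PER_DAY + UNIX_EPOCH_JD


def julian_to_unix(jd):
    """Convert a Julian day number to Unix seconds (as a float)."""
    return (jd - UNIX_EPOCH_JD) * SECONDS_PER_DAY
```

For example, unix_to_julian(1463330612) comes out at about 2457524.196898, which matches what sqlite reports for the same timestamp.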

The thing that’s confusing is the sqlite functions like date and strftime operate on Julian day numbers by default. If you want calculations in the Unix epoch instead, you have to pass the flag ‘unixepoch’ to the function.

Some examples:

-- Julian days
select julianday('now');
2457524.19912943
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 2457524.19912943);
2016-05-15 16:46:44.783  Julian:2457524.199129433  Unix:1463330804

-- Unix epoch
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 1463330612, 'unixepoch');
2016-05-15 16:43:32.000  Julian:2457524.196898148  Unix:1463330612
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 1463330612.34, 'unixepoch');
2016-05-15 16:43:32.340  Julian:2457524.196902083  Unix:1463330612
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 0, 'unixepoch');
1970-01-01 00:00:00.000  Julian:2440587.5  Unix:0

-- Max sqlite Unix epoch
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 106750000000, 'unixepoch');
5352-10-09 09:46:40.000  Julian:3676119.907407407  Unix:106750000000
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 106760000000, 'unixepoch');
-141-03-01 13:07:12.700  Julian:1205032.046674768  Unix:-106751991168

-- Gotcha with 'now' and Unix epoch
select strftime('%Y-%m-%d %H:%M:%f  Julian:%J  Unix:%s', 'now', 'unixepoch');
1970-01-29 10:38:44.197  Julian:2440615.943567095  Unix:2457524

That last example is confusing; the time stamp ‘now’ is always the Julian day, even in the context of Unix epoch conversions. So it makes no sense to mix ‘now’ and ‘unixepoch’. I’m sure that’s the source of surprising bugs.
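For what it’s worth, here’s how I’d avoid that trap from Python’s sqlite3 module. A minimal sketch, using only the datetime and strftime SQL functions shown above; the q helper is just for brevity.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")


def q(sql, *args):
    """Run a one-value query and return that value."""
    return db.execute(sql, args).fetchone()[0]


# A stored Unix timestamp needs the 'unixepoch' modifier to be read correctly.
formatted = q("SELECT datetime(?, 'unixepoch')", 1463330612)

# For the current time as Unix seconds, format 'now' with %s and no modifier.
now_unix = int(q("SELECT strftime('%s', 'now')"))
```

Here formatted is the string 2016-05-15 16:43:32, and now_unix should agree with Python’s own time.time() to within a second or so. The rule of thumb: ‘unixepoch’ goes with numbers you stored yourself, never with ‘now’.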

I’m not clear why the year 5352, integer 106750000000, is about the limit of sqlite’s Unix date implementation. That number is about 2^36.635, so it’s not some obvious integer rounding problem. Probably doing calendar arithmetic in floating point somewhere?