Putting lolslackbot on the back burner

I’ve decided to stop working on lolslackbot, my social project for League of Legends players who use Slack or Discord. I wrote it originally for a few friends, then slowly expanded it to a few hundred users. But I’ve never put the work in to make it a consumer product and now am not motivated to do it.

The main feature missing is any sort of web interface so people could sign up for themselves. I’ve been maintaining it by hand with database update scripts, doing ~30 minutes of one-off work every few weeks instead of one focussed month-long engineering project. This blog is full of bold plans to port the whole thing to Django and get going on a web interface, but I never did it. Too much product work I don’t really know how to do well, designing interactive web UI. Hell, I don’t even have a proper name for the project.

Also some deeper technical problems. The Django port seems doable but requires database schema changes, specifically in how many-to-many relations work. And I got part of my core schema wrong, an assumption that an individual only belongs to one group. Fixing that would require redoing pretty much all the tests and half the business logic. Also at some point I’d have to migrate from sqlite to Postgres and that doesn’t sound like fun at all. In retrospect it’s too bad I didn’t start with Postgres+Django, but that seemed complicated at the beginning when I was thinking of this as just a cron job.

My real reason for lack of enthusiasm is the market. I like games and I like the idea of making game playing more social. But League of Legends is a hard community to build humble tools for. Most of the energy there goes to highly polished and well marketed sites like LolKing and I’m just not that ambitious. There’s not much money in it (Riot’s API requires you don’t charge for services) and not a lot of love either. Me and my gaming buddies are on a bit of a LoL break too, which makes it harder to stay personally motivated. I’m also bummed that Riot hasn’t done anything more with Clubs, their social feature; my hope was to springboard off of that to build out the bot.

I did get some data from the last user population of Learning Fives, a cohort of ~80 people playing games together for a few weeks. 50% said they found it useful, 20% said it wasn’t, and 30% didn’t know what it was (despite seeing it in their channel). Not sure what conclusion to draw from that.

Anyway it’s a weight off my mind to just say I’m not going to do further work on this, at least for now. Truthfully my mind is on political work right now, I’d really like to do some sort of progressive activism combining data processing and GIS. (I’m following Mike’s work on redistricting closely.) To the extent I do anything for games it’s about time to revisit Logs of Lag, which 2.5 years later is still running just fine and uniquely useful. But I have some bugs to fix and maybe some improvements to make.

 

Moving forward with lolslackbot and Django

I’m very encouraged with how my little Django experiment has worked out. I’m ready to start using it with lolslackbot. A big breakthrough was realizing that I can start using Django as soon as I trust it not to corrupt my database. Keep the existing Python cron job in old code, not using Django, and then have a separate webapp that also uses the same database. The cron job and the webapp will be reading the same tables but in general not writing the same tables. It seems like a good transition plan.

Only now I really have to face the sqlite question. I’m confident that ultimately I need to be using PostgreSQL. I’m envisioning thousands of web users updating the database; there’s no way that works with sqlite’s database-level locking. Even with low traffic the cron job and the webapp will be stepping on each other. I need to switch. But when?

Do I switch databases first, porting my cron job code over to a Postgres-backed system? That seems really painful and not much fun. Lots of backend work with no visible features.

Or do I use Django first, having it talk to the sqlite database? The lock contention won’t be a real problem if I’m the only webapp user. It lets me build fun / useful features sooner. And it gives me some more experience with data modelling. I’m pretty sure I’m going to want to refactor the schema along with the Postgres update, and it’d be better to do that after doing some of the webapp work. The risk with Django-first is I do a lot of sqlite-specific work that is ultimately wasted. But Django’s ORM insulates you from the underlying database pretty well, so maybe that doesn’t matter?

Also curious how the Django ORM is going to work for me. I use a couple of non-standard SQL things now. Mostly “insert or replace”, which is sqlite’s upsert-like extension. Does the ORM expose those? Also very curious how testing will work. sqlite is so simple when managing test environments. But I know the Django folks have thought this through.
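For reference, here is a minimal sketch of what “insert or replace” does in plain sqlite, using Python's built-in sqlite3 module. The people table and the upsert_person helper are made up for illustration and are not the real lolslackbot schema. The closest portable thing in the Django ORM is probably the queryset method update_or_create(), which does a lookup followed by an update or insert rather than emitting sqlite's extension.

```python
import sqlite3


def upsert_person(conn, person_id, name):
    """Insert a row, or replace the existing row with the same primary key.

    Note that INSERT OR REPLACE deletes the conflicting row and inserts a
    new one, so any column not listed here is reset to its default.
    """
    conn.execute(
        "INSERT OR REPLACE INTO people (id, name) VALUES (?, ?)",
        (person_id, name),
    )


# Hypothetical table, only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")

upsert_person(conn, 1, "first name")
upsert_person(conn, 1, "second name")  # same id: replaces instead of failing
rows = conn.execute("SELECT id, name FROM people").fetchall()
```

After both calls the table holds a single row for id 1 carrying the second name.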

Django learnings: models

I’m soldiering on with trying to apply Django to my lolslackbot project, and I thought I’d take a stab at letting Django try to use my existing database. My specific goal is to set up a read-only set of views on the primary social tables I have. People, Groups, and Destinations all name entities in my system; I also have GroupMembership and Subscriptions tables which are many-to-many relations between the three primary tables. So far so good, at least for the primary tables; still working on the relations.

Turns out starting with a legacy database isn’t too hard; you can use the inspectdb command to build skeletons of Django model classes from an existing database. Then hand-edit the resulting code. As a bonus this is an excuse for me to start learning more about Django models. Some random things I learned:

  • Django model classes define both a database schema and the UI validation behavior in HTML forms.
  • Every Django model class must have a primary key. The Python field is named “id”, not sure renaming it is possible or a good idea.
  • A model field can have the option “null” set to true or false; this is whether empty values are stored as nulls in the database. (Why would you ever not?!) There’s also a “blank” option which has nothing to do with the database, but is whether the field is optional in autogenerated forms like the admin interface.
  • You have to hand-add each model class to admin.py to get it to show up in the admin interface.
  • Django coding style is lowercase_with_underscores for field names, CamelCase for class names. ¿Porque no los dos?

That’s the easy stuff. On to hard stuff.

I’m not sure how to translate my existing many-to-many relations tables to however Django implements relationships. I thought maybe the extra fields support (using the through keyword argument) might do it, but that seems like overkill and possibly awkward. I think I need to just adapt the ManyToManyField to my existing schema.

I’m a little scared to let Django start writing data to my database.

inspectdb sets up things with managed = False by default. That seems wise; it prevents Django’s schema management stuff from messing with tables someone else defined already. But some day I’m going to want Django to take all this over for me, can I later change managed to True and make sense out of it?

I haven’t even begun to think about testing in the Django world. I know there’s a lot of support for tests, maybe that’s my next learning.

lolslackbot postmortem

Had a significant outage for my lolslackbot project yesterday. A few different things went wrong and I’m still confused about what the problem is.

The behavior

The problem manifested as me seeing the same message being delivered every time the program runs, every 3 minutes. That’s bad; I’m spamming my users. At the same time I was seeing errors in my logs from trying to deliver messages via Slack. No useful message mind you, but at least a hint.

I was busy last night when I spotted the error so I just shut the whole system down until morning. Then in the morning I tried a quick fix and ran the script but that went badly, so I had to look closer. I finally got it fixed after two hours of work.

The delivery bug

This morning the first thing I did was add more logging and reproduce the problem. I discovered the error was that one of the Slack channel IDs no longer existed, which caused an exception in the Slack messaging module, which then broke things. The underlying problem was a design flaw in my error handling; I was trying to deliver all Slack messages at once and only then updating the database to indicate those messages had been processed. The result is that if there were 3 messages to be delivered at once and the 2nd one caused an error, the 1st one would get delivered but not marked processed and so would get delivered again.

So I fixed it by refactoring the logic that marks messages processed. I still deliver all the Slack messages at once but now individually flag whether each one worked or not. I also mark a message processed whether there was an error in delivery or not. The underlying problem is basically a distributed transaction. I’d rather err on the side of occasionally losing a message than sending the same message many times.
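A rough sketch of that "mark it processed either way" logic, not the actual lolslackbot code. The messages table, its columns, and the send callable are all assumptions made up for the example. This variant also commits after every single message, which is a simplification of what is described above.

```python
import logging
import sqlite3


def deliver_pending(conn, send):
    """Try each unprocessed message once and mark it processed either way.

    `send` is any callable taking (channel, body) that raises on failure.
    Returns {message_id: True/False} recording which deliveries worked.
    """
    rows = conn.execute(
        "SELECT id, channel, body FROM messages WHERE processed = 0 ORDER BY id"
    ).fetchall()
    results = {}
    for msg_id, channel, body in rows:
        try:
            send(channel, body)
            results[msg_id] = True
        except Exception:
            # Log and keep going: one bad channel must not block the rest.
            logging.error("Delivery failed for message %s", msg_id, exc_info=True)
            results[msg_id] = False
        # Deliberately mark failures as processed too: losing a message
        # is preferred over sending the same one every three minutes.
        conn.execute("UPDATE messages SET processed = 1 WHERE id = ?", (msg_id,))
        conn.commit()
    return results
```

With three queued messages where the second targets a deleted channel, the first and third go out once, the second is logged and dropped, and the next run finds nothing left to send.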

Rate limiting problem / match commit semantics

A second problem making all this diagnosis difficult was that my system was downloading match objects but they weren’t ending up in the database. I finally figured out my script that downloads all missing matches was crashing before it finished. And I only was calling commit on the database when the script finished, so all the work was getting lost. Derp. I fixed it to now commit after every single match object is downloaded. Also put in some better error handling.

So what’s causing the errors downloading matches? I’m not really sure, but I think it’s Riot’s rate limiter. I have some very high rate limit that I shouldn’t be getting near, but I’m still getting 429 responses for my meagre stream of requests, being told to wait. And this problem has been going on for days. I had chalked it up to a networking problem with their servers, but it turns out it’s my client waiting politely like it’s been asked to. So why am I being throttled?

I don’t know. The thing that triggers it seems to be a few odd matches that are returning 404 errors indicating the match doesn’t exist. (Even though it should, since I saw a reference to it from another API call.) Perhaps they have extra rate limiting for clients that make repeated requests that generate 404s? Part of the problem here is that I treat a 404 as “no meaningful response, try again later”, so I’ve accumulated 10-15 of those over time. I should clear them out, and change the code to stop trying if it gets the 404.
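One way the "stop trying on a 404" change could look, as a sketch only. The Action names, the 60 second fallback, and the assumption that a 429 response carries a standard Retry-After header with a number of seconds are all mine, not something confirmed about Riot's API here.

```python
import enum

DEFAULT_WAIT_SECONDS = 60  # hypothetical fallback when no usable header is present


class Action(enum.Enum):
    STORE = "store"              # got the match, save it
    GIVE_UP = "give_up"          # permanent failure, never ask again
    RETRY_LATER = "retry_later"  # transient failure, try on a later run


def classify_response(status, headers=None):
    """Map an HTTP status to (action, seconds_to_wait)."""
    headers = headers or {}
    if status == 200:
        return Action.STORE, 0
    if status == 404:
        # Treat "match does not exist" as final instead of retrying forever.
        return Action.GIVE_UP, 0
    if status == 429:
        retry_after = headers.get("Retry-After", "")
        wait = int(retry_after) if retry_after.isdigit() else DEFAULT_WAIT_SECONDS
        return Action.RETRY_LATER, wait
    # Anything else (5xx and friends) is assumed to be transient.
    return Action.RETRY_LATER, DEFAULT_WAIT_SECONDS
```

Keeping the decision in one small pure function makes the 404 and 429 cases easy to unit test without touching the network.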

Lessons learned

Man, debug logs are a big help. Fortunately the same time I was having this problem I’d just committed new code to write debug logs more usefully to a file. Couldn’t have figured out what was going on without it.

A broad thing I learned here is to be smarter about error recovery logic when working with third party services. I think when interacting with Riot or Slack or whatever, I want to do one small bit of remote API work and then immediately commit that work to the database before trying the next remote thing. And handle errors from remote services robustly, continuing even if they fail.

Unfortunately some of my code is now squelching exceptions, logging them and continuing instead of crashing the program. This is necessary to make my code more robust to errors, but is scary. Anyway I found I was having a hard time logging exceptions properly, here’s the way I’ve settled on:

try:
    someFunction(data)
except Exception as e:
    logging.error('Something went wrong %s' % data, exc_info=True)

The key thing here is the “exc_info=True”; this gets Python to include a stack trace. Before I was trying to actually log the exception object e itself, but that only gets you the message, not the stack. My use of % is an anti-pattern, I’m really supposed to use a comma and let logging do the substitution, but for some reason I find that error prone. And the worst thing about errors in a logging function like this is unless you are superhuman you often don’t have test coverage for the exception cases, so this line of code only ever executes in production when something else already went wrong and it’s very confusing.
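For comparison, a sketch of the comma form, plus one cheap way to exercise the exception path in a test. logger.exception() is the standard shortcut for logging at ERROR level with the stack trace attached. The run_safely and demo helpers and the logger name are made up for this example.

```python
import io
import logging


def run_safely(func, data, logger):
    """Call func(data); on any exception, log it with a stack trace and continue."""
    try:
        return func(data)
    except Exception:
        # The comma form: logging substitutes %s itself, and only if the
        # record is actually emitted.
        logger.exception("Something went wrong %s", data)
        return None


def demo():
    """Force the error path once and return what was logged."""
    stream = io.StringIO()
    logger = logging.getLogger("lolslackbot.demo")  # hypothetical logger name
    handler = logging.StreamHandler(stream)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    try:
        run_safely(lambda d: 1 / d, 0, logger)  # raises ZeroDivisionError
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()
```

The text returned by demo() contains the formatted message followed by the full traceback, so a test can assert on both and the logging line no longer runs for the first time in production.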

Stack traces from running Python programs

My lolslackbot program occasionally hangs forever and I’d like to know what failed when I kill it. Starting with Python 3.3 it’s relatively easy using the faulthandler built-in library. You have to set it up ahead of time (it’s not on by default), but once you do you can send the process a SIGABRT and it will display a slightly spartan stack trace and abort the program.

You can get a stack trace without killing the Python process by registering another signal, like faulthandler.register(signal.SIGUSR1). That signal is not entirely unobtrusive; it interrupts time.sleep() for instance. But the program does seem to keep running after printing the stack trace. All the signals registered by default seem to also kill your program; USR1 won’t. (Unless I’m confused.)
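A small sketch of the SIGUSR1 setup, Unix only since faulthandler.register() is not available on Windows. The helper names are invented, and the demo signals its own process and writes to a temporary file just so the result can be inspected; in a real program you would register once at startup and send the signal from a shell with kill.

```python
import faulthandler
import os
import signal
import tempfile


def dump_stack_on_sigusr1(log_file):
    """Print every thread's stack to log_file on SIGUSR1, without exiting."""
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)


def demo():
    """Register the handler, signal ourselves, and return the dumped text."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        dump_stack_on_sigusr1(log_file)
        try:
            os.kill(os.getpid(), signal.SIGUSR1)
            # faulthandler writes straight to the file descriptor, so
            # rewind and read back what it produced.
            log_file.seek(0)
            return log_file.read()
        finally:
            # The file must stay open while the handler is registered.
            faulthandler.unregister(signal.SIGUSR1)
```

The program keeps running after the dump, which is the point of choosing a signal other than the fatal ones.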

It’s all implemented in C to enable printing useful stack traces even if the Python VM itself is broken. Also you can enable it just by setting an environment variable, before any Python code is run, which I imagine is useful for debugging problems at startup.

There’s also an interesting faulthandler.dump_traceback_later() function which seems to basically be a watchdog. It sets a timeout with a separate thread that results in a stack trace dump and, optionally, your program exiting. It calls _exit() which is the hard exit, which has pluses and minuses.
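A sketch of using it as a watchdog around one unit of work. The 0.2 second timeout and the sleep standing in for a hung call are placeholders. With exit=False the program survives the dump; pass exit=True to get the hard exit described above.

```python
import faulthandler
import tempfile
import time


def demo():
    """Arm a watchdog, overrun it on purpose, and return the dumped text."""
    with tempfile.TemporaryFile(mode="w+") as log_file:
        # If the work below is still running after 0.2s, dump all stacks.
        faulthandler.dump_traceback_later(0.2, exit=False, file=log_file)
        try:
            time.sleep(1.0)  # stand-in for a call that hangs
        finally:
            # Always disarm the watchdog once the work finishes.
            faulthandler.cancel_dump_traceback_later()
        log_file.seek(0)
        return log_file.read()
```

The dump starts with a "Timeout" line and then lists each thread's stack, which should be enough to see where a hung run was stuck.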

It’s sure a lot easier than the only other way I knew to do this, which was to attach gdb and inspect the interpreter’s state. But I wish it were enabled by default like Java’s old built-in behavior on SIGQUIT (Ctrl-\). Maybe they were afraid it was too radical a backwards-incompatible change.

180 new lolslackbot users

I just set up lolslackbot for Learning Fives Session 2. Some 180 new users. Kind of an exciting day for me; before this I only had 30 or so users.

My test infrastructure continues to pay off in spades. All sorts of firsts today. First time running with users in a new region (Latin America). First time I got a game where all 10 people were in the cohort I was tracking. Etc. And AFAICT it all worked as intended. Really no problems running it other than some scutwork and some unintended consequences.

The scut work is that I still have no UI for administering users in my system. I spent half of today writing 150 lines of Python + tests to set up the new users. I have to create entries in five tables: People, Groups, and Channels are all primary data entities; I also have “GroupMembership” and “Subscriptions” which are basically tables joining IDs from other tables. And all this crap is injected with command line tools. I do have scripts to at least populate them from CSV and JSON datafiles; it’s not raw SQL, but it’s close.

I really need a proper web GUI for editing these data entities, regrettably in Django, and I’m still dragging my feet on taking that leap. I have an idea for a new product I can do in Django though, a quick spike for something related to the existing code.

The unintended consequence is that I neglected to account for how unusual this first run would be after the import. Basically I was catching up, getting info on past games those 180 new users played. It took 15 minutes to download 10 match records for each of my 180 new users. And then that generated 177 messages which it immediately delivered, in a few cases spamming 10+ messages to a Slack channel. Those messages were all correct in some sense, but in retrospect it would have been better to suppress the past and only start delivering new messages instead. Oops.
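One possible shape for that "suppress the past" rule, as a sketch. It assumes each pending message records when its game ended (a game_end field I made up) and that the import time is used as the cutoff; the real message records may look nothing like this.

```python
def split_backlog(messages, cutoff):
    """Split pending messages into (suppress, deliver) around a cutoff time.

    Messages about games that ended before `cutoff` (for example, the moment
    the new users were imported) are suppressed; the rest are delivered.
    """
    suppress = [m for m in messages if m["game_end"] < cutoff]
    deliver = [m for m in messages if m["game_end"] >= cutoff]
    return suppress, deliver
```

The suppressed messages would still be marked processed in the database so they never go out, just without ever being sent to a channel.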

2016 Python webapps

I want to build a webapp for my lolslackbot project. What do I use?

Django is the clear consensus choice for large, grownup projects. I used it once a couple years back and thought it was fairly good, if not exciting or lovely. But I have in the back of my head that the Django ORM doesn’t play nice with homegrown schemas, that it really wants to control the schema itself. Also I have the impression that Django is kind of big and clunky, although maybe that’s unfair.

The cool kids all use Flask. I like the idea of a microframework, something simple. Flask doesn’t inspire a lot of confidence though. The last release is 0.10.1, released nearly three years ago. And while Python 3 is supported its use is discouraged. That may have made sense in 2013 but it’s a backward opinion in 2016 (IMHO). (The author reiterated that opinion as much as 8 months ago.) OTOH my friends who use Flask say not to worry about it, that it works fine with Python 3 and it’s simple.

Only no one just uses Flask. They combine it with Jinja2 templates, and WTForms, and SQLAlchemy for an ORM. Add in logins, sessions, and some CSS frameworks and you’re looking at a lot of software. Is it better to glue together your own custom assemblage of small components or should you just use a full framework like Django?

Full Stack Python has some useful advice.

My friend Brad G. recommended starting with cookiecutter-flask as a set of components + structure and customize from there. Man, it’s been years since I used a Visual Studio wizard! It looks like a good way to get up and running though.

Successfully installed Blinker-1.4 Flask-0.10.1 Flask-Assets-0.11 Flask-Bcrypt-0.7.1 Flask-Cache-0.13.1 Flask-DebugToolbar-0.10.0 Flask-Login-0.3.2 Flask-Migrate-1.8.0 Flask-SQLAlchemy-2.1 Flask-Script-2.0.5 Flask-WTF-0.12 Mako-1.0.4 SQLAlchemy-1.0.12 WTForms-2.1 WebOb-1.6.0 WebTest-2.0.20 Werkzeug-0.11.4 alembic-0.8.6 bcrypt-2.0.0 beautifulsoup4-4.4.1 cffi-1.5.2 cssmin-0.2.0 factory-boy-2.6.1 fake-factory-0.5.7 flake8-2.5.4 flake8-blind-except-0.1.0 flake8-debugger-1.4.0 flake8-docstrings-0.2.5 flake8-isort-1.2 flake8-quotes-0.2.4 gunicorn-19.4.5 isort-4.2.2 itsdangerous-0.24 jsmin-2.2.1 mccabe-0.4.0 pep257-0.7.0 pep8-1.7.0 pep8-naming-0.3.3 psycopg2-2.6.1 py-1.4.31 pycparser-2.14 pyflakes-1.0.0 pytest-2.9.0 python-editor-1.0 setuptools-20.2.2 testfixtures-4.9.1 waitress-0.8.10 webassets-0.11.1 wheel-0.29.0

Thinking for real about a webapp is making me once again consider whether SQLite is a good choice going forward. The lack of fine grained locks is going to make multiple writers unacceptable. I sure wish it had per-table locking, that would probably be good enough for now.
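To make the locking concern concrete, here is a small sketch showing that a second connection cannot write to a different table while the first connection holds a write transaction. The two tables a and b are throwaway names, and timeout=0 makes the second writer fail immediately instead of waiting, which is not how you would configure a real app.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Return (blocked, rows_in_b) for two writers touching different tables."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None: transactions are controlled explicitly below.
        first = sqlite3.connect(path, timeout=0, isolation_level=None)
        second = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            first.execute("CREATE TABLE a (x INTEGER)")
            first.execute("CREATE TABLE b (x INTEGER)")

            first.execute("BEGIN IMMEDIATE")  # take the database write lock
            first.execute("INSERT INTO a VALUES (1)")
            try:
                # Different table, same database file: still locked out.
                second.execute("INSERT INTO b VALUES (2)")
                blocked = False
            except sqlite3.OperationalError as exc:
                blocked = "locked" in str(exc)

            first.execute("COMMIT")  # release the lock
            second.execute("INSERT INTO b VALUES (2)")  # now it goes through
            count = second.execute("SELECT COUNT(*) FROM b").fetchone()[0]
            return blocked, count
        finally:
            first.close()
            second.close()
```

With a nonzero timeout the second writer would wait for the lock instead of failing, which is tolerable for a cron job plus one admin user but not for many concurrent web users.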