Python Pyramid framework

Interesting conversation on Reddit about scalable webapp frameworks, with lots of endorsements for Pyramid, the successor to the old Pylons project. After a brief review it looks like it's similar to Flask. The Reddit discussion says it scales better, that it's thread safe, and that there's some hope of running it async.

I like what Pyramid says about its philosophy:

Megaframeworks make decisions for you. But if you don’t fit their viewpoint, you end up fighting their decisions. Microframeworks force no decisions, making it easy to start. But as your application grows, you’re on your own.

In both cases, the focus is on the start: either too much or too little. Either way, finishing and staying finished is hard. You need a finishing-focused framework with an architectural design that scales down to getting started, then up as your application grows.

It seems to be mostly focused on URL dispatch, with a structure for web views and sessions. And support for logging and testing! It also has some connectivity to various Python template libraries. There's no database code at all, but there are extensive docs on using SQLAlchemy. Also the note "Pyramid has a strong orientation towards support for transactions".

Small Wikipedia text for inlining

Reddit Enhancement Suite has a neat feature: if someone links to a Wikipedia article, you can click an expando button and a div pops up showing just a small bit of the article right on top of the page, a sort of quick preview. Here's the code for it.

It works by using the Wikipedia API, specifically the "parse" action (so named because it parses an article's wikitext source and returns the rendered contents, wrapped in JSON). This URL is a sample query for the JSON contents of the Round-robin DNS page. There are a zillion options for configuring what it returns; section=0 seems like a useful one to only get the first section of an article, typically the summary. And here's how to get only the first paragraph.
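Here's a rough sketch of building that kind of query from Python with only the standard library. The parameters (action=parse, page, prop=text, section=0, format=json, formatversion) are standard MediaWiki API parameters; the User-Agent string is a placeholder you'd replace, since Wikimedia asks API clients to identify themselves.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def parse_url(title, section=0):
    """URL for the rendered HTML of one section (0 = the lead/summary)."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "text",
        "section": section,
        "format": "json",
        "formatversion": 2,
    }
    return f"{API}?{urlencode(params)}"


def article_html(response):
    """Pull the HTML out of a parsed API response."""
    text = response["parse"]["text"]
    # formatversion=1 wraps the HTML as {"*": "..."}; formatversion=2 gives a plain string
    return text["*"] if isinstance(text, dict) else text


def fetch_summary_html(title):
    # Placeholder User-Agent; put your own contact info here
    req = Request(parse_url(title), headers={"User-Agent": "summary-sketch/0.1"})
    with urlopen(req) as resp:
        return article_html(json.load(resp))
```

Calling fetch_summary_html("Round-robin DNS") should return the lead section as an HTML fragment, which is roughly what the RES preview shows.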

The Wikipedia API has a sandbox for designing queries.

BTW I'm having a hard time with Chrome Developer Tools lately. I can't find that Reddit page + RES loading this JSON anywhere. Maybe because it's happening in a Chrome extension, out of sight in a background page or something? Dunno.

Weird cable modem routing

Here’s a strange mtr report:

                                       Packets               Pings
 Host                                Loss%   Snt   Last   Avg  Best  Wrst StDev
 1.          0.0%   234    0.3   0.3   0.2   0.6   0.0
 2.                       0.4%   234   21.9  15.9   9.6  42.3   4.1
 3.                       0.9%   234   24.4  16.5   9.6  71.3   5.6
 4.                   0.0%   234   14.7  16.1  10.2  33.0   4.1
 5.  0.0%   234   23.3  17.2   9.8  42.5   4.6
 6.  0.0%   234   23.5  17.7  10.7  57.0   5.7
 7.                     0.0%   234   13.9  17.6  10.5  37.9   4.4
 8.                   0.0%   234   17.1  17.6   7.8  44.2   4.3
 9.                     0.0%   234   15.0  17.2  11.4  36.6   4.5
10.    0.0%   233   12.5  17.3  11.1  47.6   4.8

It’s hard to read that at first, but what’s going on is the path is changing at hop 2. Sometimes my packets go directly to, sometimes they go through first. And the packets take 4 or 5 hops before they get out of the local ISP and in to, suggesting sometimes they visit those other two addresses more than once. (I wish traceroute would break this out for me, show which fraction of packets take which route.)
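The per-route breakdown I wish traceroute had is easy to tally if you have raw observations. This sketch assumes a hypothetical input: a list of (hop number, responding address) pairs, for example gathered by running traceroute many times and recording which address answered at each TTL. The addresses below are made-up documentation addresses.

```python
from collections import Counter, defaultdict


def route_fractions(samples):
    """samples: iterable of (hop_number, responding_address) pairs.

    Returns {hop: {address: fraction of probes at that hop}}, hops in order.
    """
    per_hop = defaultdict(Counter)
    for hop, addr in samples:
        per_hop[hop][addr] += 1

    result = {}
    for hop in sorted(per_hop):
        counts = per_hop[hop]
        total = sum(counts.values())
        # most_common() lists the dominant route first
        result[hop] = {addr: n / total for addr, n in counts.most_common()}
    return result


samples = [(2, "192.0.2.1"), (2, "192.0.2.1"), (2, "198.51.100.7"), (3, "203.0.113.5")]
for hop, routes in route_fractions(samples).items():
    print(hop, routes)
```

A hop with more than one address in its breakdown is where the path forks.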

That's a capture from home in San Francisco; my ISP is Wave Broadband (aka Astound). They provide good clean 100Mbps without any of Comcast's bullshit and have mostly been reliable. I had some problems yesterday (massive packet loss, lag) which got me looking. The problems went away when I rebooted my cable modem (which itself is WTF), but this weird routing remains.

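I can't tell if that reported 0.9% packet loss is real, or an artifact of mtr's TTL trickery or reporting. A straight up ping to showed 0 loss in 1500 packets, so it may not be real. The later hops all showing 0.0% loss points the same way: routers often rate-limit the ICMP "time exceeded" replies mtr depends on, so loss that shows up at one hop but not at the hops after it usually isn't real loss.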

I have some vague theory that what’s going on is there’s local congestion and if stuff gets buffered, becomes visible. But I have very little evidence for that.

League of Legends rank distribution over time

Just finished a small longitudinal data project. Not real happy with how it turned out, but figured I should document it anyway. Frustrated it doesn’t look better, also that it didn’t get more attention.

Here's the deliverable: a Reddit post. It's a visualization of rank inflation in League of Legends over the course of the last year, Season 6. I collected the data by hand out of curiosity. The resulting visualization more or less confirms what people who pay attention already knew: there is inflation. Only 15 upvotes, which is crappy engagement for something I spent several hours on.


I tinkered with the visualization for a while. The colors are meaningful: they represent Bronze / Silver / Gold / Platinum to folks who play the game (there's also White for Diamond+). There are 5 divisions within each tier; those are the black lines. And then some ugly caption text, because I wanted to condense the whole thing down to a single shareable image.

I produced it using IPython, Pandas, and matplotlib. I really like Pandas' abstraction of a DataFrame; it's a very powerful tool for working with a matrix of numbers. Even so, I have a lot of data massaging code to get from my original spreadsheet to a clean list of numbers: deleting extra rows, transposing rows/columns, converting "93%" to 0.93, etc etc.
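The massaging steps listed there can be chained in a few lines of Pandas. This is a sketch, not my actual code: it assumes a hypothetical spreadsheet layout with one row per rank, one column per date, and percentages stored as strings like "93%".

```python
import pandas as pd


def clean_rank_table(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop empty rows, transpose, and turn '93%' strings into 0.93 floats."""
    df = raw.dropna(how="all")  # spreadsheet exports often have blank spacer rows
    df = df.T                   # now one row per date, one column per rank
    # Strip the trailing '%' and rescale each column to a 0..1 fraction
    return df.apply(lambda col: col.str.rstrip("%").astype(float) / 100)


raw = pd.DataFrame(
    {"Jan": ["93%", "60%", None], "Feb": ["95%", "62%", None]},
    index=["Bronze", "Silver", "blank"],
)
print(clean_rank_table(raw))
```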

I really don't like matplotlib. I'll paste my graph generation code below; it's super ugly. There are like seven ways to set options: flags passed to the Pandas plot() function, functions called on the global matplotlib object, functions called on the global pyplot object, functions called on the axis objects, two different ways to modify RC parameters, etc etc. Just a mess. Maybe this is my ignorance, though, and there's a way to simplify / rationalize it all.

# Imports (tdf and df are the cleaned DataFrames built earlier, not shown)
import matplotlib
import matplotlib.pyplot as plt

# Plot
plt.style.use('seaborn-v0_8-whitegrid')  # named 'seaborn-whitegrid' before matplotlib 3.6
matplotlib.rc('axes', grid=True)
matplotlib.rc('grid', color='w')
matplotlib.rcParams['font.family'] = ['Liberation Sans']

# Filled area for the tiers
ax = tdf.plot(
    kind='area', stacked=False,
    figsize=(10, 6),
    color=('#ffffff', '#87fffd', '#ffdb57', '#dce2f2', '#b2a07e'))

# Lines for every division
df.plot(ax=ax, legend=False, color='k', linewidth=0.5)

# Configure the chart a bit more
plt.title('Season 6 distribution of ranks in NA')
plt.text(0, -35, '''There is a general trend of rank inflation...''',
         linespacing=1.4, size=12)
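One possible way to rationalize it, sketched here as a suggestion rather than the code I actually ran: put all the rc settings in one dict applied with plt.rc_context, create the figure and axes explicitly, and then only call methods on those objects. The function name plot_ranks is hypothetical, and the caption is placed with fig.text in figure coordinates instead of the data coordinates used above.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

TIER_COLORS = ["#ffffff", "#87fffd", "#ffdb57", "#dce2f2", "#b2a07e"]

# Every rc tweak in one place instead of scattered rc()/rcParams calls
STYLE = {
    "axes.grid": True,
    "grid.color": "w",
    "font.family": ["Liberation Sans"],  # falls back with a warning if not installed
}


def plot_ranks(tdf: pd.DataFrame, df: pd.DataFrame, caption: str):
    with plt.rc_context(STYLE):
        fig, ax = plt.subplots(figsize=(10, 6))
        # Filled areas for the tiers, then thin black lines for every division
        tdf.plot(ax=ax, kind="area", stacked=False, color=TIER_COLORS)
        df.plot(ax=ax, legend=False, color="k", linewidth=0.5)
        ax.set_title("Season 6 distribution of ranks in NA")
        fig.text(0.1, 0.02, caption, linespacing=1.4, size=12)
    return fig, ax
```

The rule of thumb is: rc settings for defaults, Axes methods for everything specific to this chart, and no calls on the global pyplot state after plt.subplots().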

Setting up a new Jupyter data exploration notebook

Been a month since I had to set up Jupyter for a new project. Here are all the steps, which seem awfully complicated for a "standard setup". Some of that is idiosyncratic to me, though.

  1. newvenv
    a shell alias I have. What it does:
    python3 -m venv venv;
    source venv/bin/activate;
    pip install -U setuptools pip
  2. pip3 install jupyter
  3. pip3 install matplotlib numpy pandas seaborn requests
    these are standard visualization libraries, plus a sane HTTP client. Fortunately these install from precompiled wheel files so it’s quick
  4. jupyter notebook --no-browser >> jupyter.log 2>&1 &
    Jupyter really, really wants to be an interactive program. This sort of makes it headless.
  5. ssh -L 8888:localhost:8888
    I’ve installed and run Jupyter on a server. I run this command on my local development machine to forward the port to access it.
  6. open http://localhost:8888/ in a browser
  7. %matplotlib inline
    Be sure to run this as the first command in the notebook
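Steps 1 through 3 could also be scripted in Python itself, which avoids depending on a shell alias. A sketch using the stdlib venv module; make_notebook_env and setup_commands are hypothetical names, and the bin/python path assumes a POSIX layout (Windows uses Scripts\python.exe).

```python
from __future__ import annotations

import subprocess
import venv
from pathlib import Path

PACKAGES = ["jupyter", "matplotlib", "numpy", "pandas", "seaborn", "requests"]


def setup_commands(env_dir: Path) -> list[list[str]]:
    """The pip commands from steps 1-3, run with the venv's own interpreter."""
    python = str(env_dir / "bin" / "python")  # POSIX venv layout
    return [
        [python, "-m", "pip", "install", "-U", "setuptools", "pip"],
        [python, "-m", "pip", "install", *PACKAGES],
    ]


def make_notebook_env(env_dir: Path = Path("venv")) -> None:
    venv.create(env_dir, with_pip=True)  # same as: python3 -m venv venv
    for cmd in setup_commands(env_dir):
        subprocess.run(cmd, check=True)  # stop on the first failed install
```

Running the venv's python directly means there's no need to "source venv/bin/activate" inside the script.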


Django + lolslackbot, a battle plan

I'm thinking again of doing some work on lolslackbot, in particular adding a web frontend to it. Tired of adding users with ugly SQL hacks, tired of having no reporting views. I experimented with using Django six months ago and even sort of still have the code, but have totally forgotten how it worked. I also never came to a conclusion about the path forward: in particular, whether I should add Django first or port from sqlite to postgres first.

Well, I'm cutting that Gordian knot based on 30 seconds of advice on IRC. I should add Django first, Django + sqlite. It should all work fine; the only drawback is that sqlite's database-level locking will suck with multiple web users. Which is fine for now. Adding Django lets me add new features to the product, which is what motivates me. I'll port to postgres later if I open the webapp to multiple users and need better locking. A rough plan:

  1. Get some basic read-only views of the database
  2. Related: sort out my problem with representing many-to-many relationships. I fear this might require changing my database schema to be compatible with Django’s way of doing things.
  3. Focus on the social tables: People, Groups, Destinations. Create views like “who is in which group” and “where do these messages for this group go?”
  4. Also do some reporting, like “show last 10 messages sent” and “how many messages total are we sending a day?”
  5. Implement some basic tests for the Django code so I have a test environment for Django database code.
  6. (Cautiously) add some write capability for the social tables. Like “Add this person to be tracked” and “add this person to a group”.
  7. Take a deep breath! Stuff is working!
  8. Work on a plan to let other people log in and add users / groups.


Ubuntu name lookup: DNS vs NSS

Usually when I want to look up a domain name, I use the "dig" or "host" command. This issues a live DNS request to the configured DNS servers and shows you their response. Sometimes I use "whois" on the domain first to find its authoritative name servers, then do a "dig @server name" to get the live authoritative answer, bypassing whatever cache my usual resolver (Google DNS) has.

But client programs on Ubuntu don't issue DNS queries directly. They use NSS, the Name Service Switch, as configured in /etc/. (In the long long ago a similar system was named YP, for Yellow Pages.) NSS often ends up using DNS to resolve a hostname, but it also uses local files like /etc/hosts. I've always had some fear it did its own caching too, although I don't think my system does. There's a daemon called nscd that would do caching, but I don't have it installed.

Anyway, ordinary programs like “ssh” or “ping” don’t use DNS directly, they use NSS to look up IP addresses. So how do you query what NSS would do? The getent program.

$ getent ahosts STREAM DGRAM RAW
2607:f8b0:4000:80d::200e STREAM
2607:f8b0:4000:80d::200e DGRAM
2607:f8b0:4000:80d::200e RAW

That's returning a lot of data! "getent hosts" returns a single address, which is generally what you mean, but for Google that gives you an IPv6 address. The "ahosts" lookup prints every address getaddrinfo() returns, once per socket type (STREAM, DGRAM, RAW), which is useful for debugging.
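The same NSS view is available from Python: socket.getaddrinfo calls the C library's getaddrinfo(), which on glibc systems like Ubuntu goes through NSS, so it honors /etc/hosts the way ssh and ping do, unlike dig or host. A rough stand-in for "getent ahosts"; the ahosts function name is just for illustration.

```python
import socket

SOCK_NAMES = {
    socket.SOCK_STREAM: "STREAM",
    socket.SOCK_DGRAM: "DGRAM",
    socket.SOCK_RAW: "RAW",
}


def ahosts(name):
    """Roughly what `getent ahosts NAME` prints: (address, socket type) pairs."""
    results = []
    # No socket type given, so the resolver may return one entry per type
    for family, kind, proto, canonname, sockaddr in socket.getaddrinfo(name, None):
        results.append((sockaddr[0], SOCK_NAMES.get(kind, str(kind))))
    return results


if __name__ == "__main__":
    for addr, kind in ahosts("localhost"):
        print(addr, kind)
```

If this disagrees with dig, the difference is coming from NSS (an /etc/hosts entry, or a cache like nscd) rather than from DNS.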

I'm digging into all this because of a frustrating problem with a freedns dynamic DNS name I use for my home machine. Every six months they decide I'm inactive and change the address to some domain parking page. In theory they email me a warning but it never gets through. So it breaks scripts I have that use the name to access my home network. And I dutifully log in and re-enable my account, but it takes a while for the changes to propagate.

After I re-enabled my account, “host” and “dig” would show the right address but “ping” or “ssh” would use the wrong address. Turns out if a domain is disabled freedns sets its A records to both and, their parking web server. So that’s confusing. On top of that they set a TTL for one hour, which is kind of awful for something you’re trying to update. But I think my real problem was that one Google name server had cached the old bad address but another had cached the good new address, so queries were giving me random ones.

I'm going to just add a manual entry to /etc/hosts for now, until that hour is up.