Twitter Python clients by release date

I want to write a program using the long-suffering Twitter API. There's a zillion Python options; here's what I found. I used the most recent PyPI release date as my primary criterion, because I want something actively maintained.

Of course release age isn't everything. The four libraries I looked at have similar APIs for doing simple things. I believe all four now support the streaming API too. At a quick glance, tweepy and python-twitter have the most reassuring test suites.

I picked tweepy to start with because it has the most recent recommendations on /r/python. Also there’s still activity on GitHub, even if no recent releases. So far so good.
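
For flavor, here's roughly what "doing simple things" looks like in tweepy. This is just a sketch based on tweepy's classic OAuth flow, untested here; the credential strings are placeholders you'd get from Twitter's app management console.

    import tweepy

    # OAuth credentials are placeholders; get real ones from Twitter's app console
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
    api = tweepy.API(auth)

    # Read the five most recent tweets from my home timeline
    for tweet in api.home_timeline(count=5):
        print(tweet.user.screen_name, tweet.text)

    # Post a tweet
    api.update_status("Hello from tweepy")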

Windows sshfs clients

I want to access a remote machine's Unix filesystem via ssh, the way you would with sshfs and FUSE on Unix. There are a lot of options for it.

  • SFTPNetDrive, what I’m using now. Works simply out of the box. Free for personal use, $100 for commercial use.
    Seems fine, Y: is now my remote disk. Has some reasonable customization options but nothing overwhelming. There’s an option to mount the disk as “Network, Removable, or Fixed”. I was hoping that could fake out WSL so that I could see the remote disk in the Ubuntu-on-Windows subsystem (which now supports removable disks). No such luck.
  • NetDrive, the big daddy commercial system. This provides interfaces for a whole bunch of cloud storage options: S3, OneDrive, etc. 30-day eval, then you pay $50.
  • ExpanDrive, another multi-service commercial option. $50.
  • sshfs-win using WinFSP. Open source FUSE-like solution. Looks promising.
  • win-sshfs, a fork of an older project (keyword: Dokan). No commits for 9 months, but does have Windows 10 support so it’s not totally dead.


Canon MX850 and Windows 10

tl;dr: reinstall the Windows 8.1 driver to get the scanner working.

I have a Canon MX850 I bought 8 years ago. It's one of those cheap multifunction inkjet printer/scanner/fax things, the ones that cost $20 but each ink refill costs a small part of your immortal soul. The printer has been pretty reliable, but the scanner keeps failing. I'm not certain, but my guess is that Canon's network scanner driver isn't smart enough to follow the scanner through IP address changes from DHCP. But the printer keeps working, huh.

Anyway, the simple fix is to download and reinstall Canon’s driver. No need to uninstall first. Windows 10 is not supported by Canon, but the 8.1 (x64) driver seems to work fine. At least scanning does, via Windows Fax and Scan. Haven’t tried the fax side.

DDNS clients

I set up a new router and so once again am trying to figure out the hack I need for Dynamic DNS updating. I use afraid.org for DDNS, and I need something on my home network so that if my IP address changes, afraid gets notified to update my record. And boy, it's complicated. In the past my router firmware (Tomato or OpenWRT) seemed to handle it for me, but my new router (Ubiquiti EdgeMAX) isn't working. It has a DDNS client, but it's broken.

Why so complicated? Because of history. In the bad old days there were all these complicated protocols for DDNS; software like ddclient has support for a zillion different proprietary protocols. But DDNS has now mostly stabilized on "fetch this URL with a secret token to refresh your IP address", taking the IP address implicitly from the source of the HTTP request. And that's all we need. afraid acknowledges this, with a sample cron job that fetches the URL every 5 minutes. Done.

Well, not quite. Ideally you'd only post an update if your IP address actually changed, and ideally you'd update immediately, not every 5 minutes. This is possible (and implemented in the Tomato firmware) but it's too much work to figure out how to do it here. So the cron job it is.
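
For the record, the "only post when it changes" half is only a few lines of Python; the genuinely hard part is noticing the change immediately instead of polling. Here's a rough sketch of the polling version. The afraid update URL and cache file path are placeholders, and icanhazip.com is just one of many what's-my-IP services you could use.

    import urllib.request

    # Placeholders: your afraid.org update URL (contains the secret token)
    # and a scratch file to remember the last IP we reported.
    UPDATE_URL = "https://sync.afraid.org/u/YOUR-TOKEN-HERE/"
    CACHE_FILE = "/var/tmp/last_ip.txt"
    IP_CHECK_URL = "https://icanhazip.com"

    def current_ip():
        return urllib.request.urlopen(IP_CHECK_URL, timeout=10).read().decode().strip()

    def last_ip():
        try:
            with open(CACHE_FILE) as f:
                return f.read().strip()
        except FileNotFoundError:
            return None

    ip = current_ip()
    if ip != last_ip():
        # IP changed (or first run): hit the update URL, then remember the new IP
        print(urllib.request.urlopen(UPDATE_URL, timeout=10).read().decode())
        with open(CACHE_FILE, "w") as f:
            f.write(ip)

Run that from cron every few minutes and it only bothers afraid when something actually changed.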

One last wrinkle: afraid has two versions of the dynamic update, v1 and v2. Conceptually they're the same, but v2 has improvements. The one I like is that v2 doesn't say "ERROR" if you post an update without a change; it just acknowledges that nothing changed. You have to opt names into the v2 interface in the web UI first.

I like how afraid's example cron job is randomized so it runs at a random second, not exactly at 00:00:00, 00:05:00, and so on. Thundering herds are bad.

RF scan in San Francisco

My fancy Ubiquiti access point has an RF scan tool. Here’s what it found in my house in San Francisco, Noe Valley.

[Screenshots: the access point's RF scan results for the 2.4GHz and 5.0GHz bands]

Not a great visualization; the yellow bar fills up the trapezoid. Mouseover tells me the 2.4GHz channels are about 20-30% “utilized” with “interference” of -95 dBm (noise floor?). 5.0GHz channels are mostly 0% utilized except for 112 and 136, which are about 20%. Interference there is -91 dBm for most, -96 for a couple.

My iPhone sitting nearby can see about four neighbors' WiFi networks. Also probably a lot of 2.4GHz cordless phones, including mine. In practice 2.4GHz barely works through the house and 5.0GHz is terribly unreliable, probably because the walls are lath-and-plaster Faraday cages.


Simplifying the efficiency gap measure

Spoiler: this is a summary of a failed experiment.

There's a wonkish new measure of gerrymandering that's influencing the courts, called the efficiency gap. It's a measure of the partisan bias of a districting plan, trying to quantify "wasted votes". It's not too complicated a measure. Suppose a single-district election goes 53 voters for candidate D and 47 for candidate R. Then 2 of those D votes were wasted, because they were in excess of the 51 votes needed for a victory. All 47 of the R votes were wasted, since their candidate lost. Count up all those wasted votes over all of a state's districts and you have a measure of how many votes were wasted in the state; the efficiency gap itself is the difference between the two parties' wasted votes, as a share of all votes cast. For instance if the Rs consistently win 80% of a state's districts 53–47 and the Ds win the other 20% of districts 80–20, then the Ds waste a lot of votes both in the districts they win (80% of the vote instead of the 51% needed) and in the districts they lose (all 47%). When you apply this measure to real elections it quantifies just how much the Republicans successfully gerrymandered a partisan advantage in states like Pennsylvania or Ohio.
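
To make the arithmetic concrete, here's a little sketch of the wasted-vote bookkeeping as I described it above, with the gap computed as the difference in the two parties' wasted votes as a share of all votes. It's my own toy code, not the canonical formula from the paper.

    def wasted_votes(d_votes, r_votes):
        """Wasted votes (D, R) in one district: the loser wastes every vote,
        the winner wastes whatever exceeded the bare majority needed to win."""
        needed = (d_votes + r_votes) // 2 + 1
        if d_votes > r_votes:
            return d_votes - needed, r_votes
        else:
            return d_votes, r_votes - needed

    def efficiency_gap(districts):
        """districts is a list of (d_votes, r_votes) tuples, one per district."""
        d_wasted = r_wasted = total = 0
        for d, r in districts:
            wd, wr = wasted_votes(d, r)
            d_wasted += wd
            r_wasted += wr
            total += d + r
        return (d_wasted - r_wasted) / total

    # The single-district example from the text: D 53, R 47
    print(wasted_votes(53, 47))        # (2, 47)

    # A toy 10-district state: Rs win 8 districts 53-47, Ds win 2 districts 80-20
    plan = [(47, 53)] * 8 + [(80, 20)] * 2
    print(efficiency_gap(plan))        # positive here means the Ds wasted more votes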

It's a simple measure, but it's still complicated to explain to people. So I took a crack at quantifying something even simpler, the gap between popular vote and congressional seats. For instance, in North Carolina last year Republicans took 53% of the popular vote for the House of Representatives. However, they took 10 of 13 seats, or 77%. That's a pretty remarkable spread of 24%. That outcome spread is a different measure than the efficiency gap, much less subtle, but it's one I think people can relate to pretty easily.
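
The arithmetic for the spread is as simple as it sounds; something like this, using the North Carolina numbers above:

    def spread(seats_won, total_seats, vote_share_pct):
        """Seat share minus popular vote share, in percentage points."""
        return 100.0 * seats_won / total_seats - vote_share_pct

    # North Carolina 2016: 53% of the popular vote, 10 of 13 seats
    print(round(spread(10, 13, 53.0), 1))   # ~23.9, the ~24% spread mentioned above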

What happens if I add that spread up over all 50 states’ elections and graph it over time? I scraped data from Wikipedia pages and came up with this:

[Graph: Republican share of the House popular vote and share of House seats, by election year]

There you go. It shows that in 2016 the Republicans got 55% of seats with about 50% of the popular vote, a spread of +5%. Back in 1990 they got only 39% of the seats with 46% of the vote, a spread of -7%. It’s the spread between the two that I’m interested in, so let’s graph the spread directly:

[Graph: spread between Republican seat share and popular vote share, by election year]

Note the graph of the spread is more or less correlated with the blue line above, the % of seats the Republicans got. I’m not sure what that means but it seems important.

Visualization Failure: Oversimplified

My main conclusion from this exercise is that it's a failure. I've oversimplified; there's no obvious story here. I think any fine point about redistricting gets lost in the broader story of swings between Republican and Democratic popularity.

I think it's a mistake to add up the votes and seats across different states. Every state is a different story. North Carolina had a virulently partisan districting process, which probably explains its huge gap in favor of Republicans. California has an independent non-partisan commission designing its districts, a very different process, although the 2016 outcome still seems to somewhat benefit Democrats (62% of the popular vote, 73% of seats).

It might also be a mistake to assume all Republicans and Democrats are the same. I mean, as partisan as the United States has become, Representatives are still elected to represent a local community and in many cases are known to their constituents. Congresspeople have individual agendas and views and don't just reflect their R/D affiliation.

So back to the drawing board. I should consider just redoing visualizations from the efficiency gap data Stephanopoulos and McGhee collected.

Update

There are some nice visualizations of the efficiency gap in this paper from the Campaign Legal Center.

Time series view:

[Figure: time series view of efficiency gaps]

Per-state view. (Darker color = bigger efficiency gap. Larger square = bigger state.)

 

[Figure: per-state view of efficiency gaps]

Redistricting tools infodump

For the last couple of months I've been reading a lot about redistricting, hoping to find a way I can use my programming skills to have some political influence. There's a remarkable story in recent politics, summarized in the book Ratf**ked: the Republican Party engineered a takeover of the House of Representatives in 2010. Long story short, they worked hard to win state legislature elections in 2008 and 2010. That gave them control over the redistricting process in 2010/2011, drawing the lines for House of Representatives elections. The line-drawing worked; the cartography alone is responsible for something like 10% of the Congressional seats the GOP holds. You get crazy things like North Carolina having 10 of 13 representatives be Republican, despite the popular vote for those seats going about 50/50. The GOP is quite explicit about what they did, too:

The state representative who drew that [North Carolina] map said he had engineered 10 safely Republican seats only “because I do not believe it’s possible to draw a map with 11 Republicans and two Democrats.”

I’m interested in the data and cartography part of this, the technical question. So I’ve been reading up. Here’s some of what I learned. See also Mike Migurski’s notes.

Maptitude: ArcGIS for politics

The key software tool used in the 2010 cycle was Maptitude for Redistricting. It's a GIS tool specialized for redistricting. It comes preloaded with demographic data, election data, and political boundary shapes. It lets you draw district plans and see what the result is. Think of it as ArcGIS but for political scientists. There are many demo videos online.

An individual license for Maptitude costs $700 on Amazon. Mostly they negotiate sales contracts for group use; Ratf**ked suggested it was $5,000–$10,000 for a state. Compare that to $500,000 for earlier software from the 1990s.

Maptitude is by Caliper Corporation. They seem to have a democratic mission, putting redistricting tools in the hands of ordinary people. In 2011 they even made a web version for communicating with voters, called Maptitude Online. Several states bought and deployed it: Maryland, Idaho, Wyoming… None of those 2011 apps still work; the backing servers seem to be down. But there's video online. It's a pretty clunky web UI, from the era when people compiled their Win32 C++ GUIs to Javascript and called it a day. I imagine the 2020 version will be much more Web native. But it did genuinely communicate redistricting to people on the Internet.

Political data

One nice thing about Maptitude is it comes with political data that you buy from them. But it’s all proprietary and expensive. What about open source / open data hacking? Wanted, for every state, for every election every 2 years:

  • Shapefiles for each voter precinct. The voting precinct is the smallest building block of election data. You can’t find out how an individual voted in the United States, but you can see how a group of ~25-300 people in one precinct voted.
  • Per-precinct vote tallies for every major national election. President, Senate, House of Representatives. State legislature data would be nice too.
  • Shapefiles for every House of Representatives voting district. These are typically, but not always, unions of precincts. State election districts would be nice too.

America has no centralized election system. There is no simple database of election results, political districts, etc. Particularly not for data as detailed as per-precinct results. Every state maintains its own data. Some states publish nice clean CSVs and Shapefiles. Some states will send you a scanned handwritten ledger if you call and pay a $4.95 document fee. It's a mess. Here's what's available for open use.

  • OpenElections collects election returns. It came out of a data journalism project, and they are doing good work towards a 2016 set, but it is a long process.
  • election-geodata is a project headed up by Nathaniel Kelso which collects precinct and district shapefiles. It’s not complete but has gotten a whole lot of data, particularly for 2016.

Between those two projects I think we have ~75% of 2016’s election in easy-to-use format.

Several folks have published detailed maps on the Web but have not published formal data exports. Decision Desk HQ published a per-precinct map of the Presidential election in 2016. ESRI published a per-precinct map for the 2008 presidential election. I could have sworn I’ve seen a 2012 national per-precinct map too, but I can’t put my hands on it. The LA Times Data Desk has also done good work but it may be California-only.

Also worth reading: Mike Migurski’s work on North Carolina elections, where he goes through the exercise of collecting and analyzing the 2016 election for that state. It’s a model of what I’d like to be able to do easily for every state and every election.

Demographic data

I haven’t researched this much yet, but the other half of redistricting is understanding the demographics of the people you’ve put into the districts.

Census data is the basic data standard here, free and public and easy to work with. It comes broken down by census blocks, groups of 0-500 people. Those blocks do not line up with voting precincts, so some slicing and joining is required to produce per-precinct views of census data. Migurski did this for North Carolina.
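
I haven't actually done this yet, but my mental model of the slicing and joining is something like the following geopandas sketch: intersect blocks with precincts, then apportion each block's population by the share of its area that lands in each precinct. The file and column names here are made up, and area-weighting is only an approximation since people aren't spread evenly across a block.

    import geopandas as gpd

    # Hypothetical inputs: census blocks with a population column, and precincts
    # with a precinct id. Real files would come from the Census Bureau and the state.
    blocks = gpd.read_file("census_blocks.shp")    # columns: GEOID, POP, geometry
    precincts = gpd.read_file("precincts.shp")     # columns: PRECINCT_ID, geometry
    blocks = blocks.to_crs(precincts.crs)          # make sure the projections match

    blocks["block_area"] = blocks.geometry.area

    # Slice every block along the precinct boundaries
    pieces = gpd.overlay(blocks, precincts, how="intersection")

    # Apportion each block's population by the fraction of its area in each precinct
    pieces["pop_share"] = pieces["POP"] * pieces.geometry.area / pieces["block_area"]

    # Roll up to a per-precinct population estimate
    per_precinct = pieces.groupby("PRECINCT_ID")["pop_share"].sum()
    print(per_precinct.head())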

Marketing data is the other interesting option here, and a total black box to me. But all of the tracking tools that enable direct sales and Internet advertising are producing data that’s a political goldmine. I suspect little to none of this data is available for free open hackery. News articles about the 2016 campaign are full of stories about how various political groups used this data with varying levels of effectiveness.

Edit: see this amazing article about Cambridge Analytica, Trump, and Brexit.

Areas of work for redistricting

When I started this research I had no idea something like Maptitude existed. I thought maybe I could help build a GIS-for-elections and revolutionize politics. Ha! I'm at least 10 years too late on that. The state of the art in 2011 was a tool that let political experts draw district plans and then understand in great detail how those people voted in recent elections and, therefore, how they would likely vote in the next election. What about 2020?

  • Demographic prediction. District plans last ten years; you can design a plan that looks great for your party in 2022 only to find it fails in 2028, which is called a “dummymander”. Predicting demographic trends sounds like a good data problem. There’s a lot of expertise on this already.
  • Automated districting. I get the impression 2010 districts were still mostly hand drawn. But computers can easily produce zillions of plans; it’s a problem ripe for optimization algorithms (see the toy sketch after this list).
  • Smarter measures of gerrymanders. This topic is hot politically now, particularly a new measure called the efficiency gap which quantifies the partisan bias in a district plan. Communicating these measures to the voting public seems really valuable.
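
To illustrate the automated-districting point above, here's a toy sketch: treat a state as a grid of equal-population "precincts" and grow contiguous districts outward from random seed cells. Real automated districting also has to handle population balance, compactness, the Voting Rights Act, and so on; this is just the flavor of "computers can churn out plans".

    import random
    from collections import deque

    def random_plan(rows, cols, n_districts, seed=None):
        """Partition a rows x cols grid of 'precincts' into contiguous districts
        by growing each district outward from a random seed cell."""
        rng = random.Random(seed)
        unassigned = {(r, c) for r in range(rows) for c in range(cols)}
        assignment = {}
        frontiers = []

        # Pick a distinct random starting cell for each district
        for d, cell in enumerate(rng.sample(sorted(unassigned), n_districts)):
            assignment[cell] = d
            unassigned.discard(cell)
            frontiers.append(deque([cell]))

        # Round-robin growth: each district claims one adjacent free cell per turn
        while unassigned:
            for d in range(n_districts):
                grew = False
                while frontiers[d] and not grew:
                    r, c = frontiers[d][0]
                    for nbr in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if nbr in unassigned:
                            assignment[nbr] = d
                            unassigned.discard(nbr)
                            frontiers[d].append(nbr)
                            grew = True
                            break
                    if not grew:
                        frontiers[d].popleft()  # this cell has no free neighbors left
                if not unassigned:
                    break

        return assignment

    # Generate a few different 4-district plans for a 6x6 "state"
    for i in range(3):
        plan = random_plan(6, 6, 4, seed=i)
        sizes = [list(plan.values()).count(d) for d in range(4)]
        print("plan", i, "district sizes:", sizes)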

The hard part for me with all this is I’ve rapidly learned that redistricting is not a technical problem, not something I can program a solution to. It is a political problem. People with political expertise and power are going to set the agenda. But software is a tool for that politics, maybe a tool I can help craft.