Simplifying the efficiency gap measure

Spoiler: this is a summary of a failed experiment.

There’s a wonkish new measure of gerrymandering called the efficiency gap that’s influencing the courts. It’s a measure of the partisan bias of a districting plan, built around the idea of “wasted votes”. It’s not too complicated. Suppose a single district election goes 53 votes for candidate D and 47 for candidate R. Then 2 of those D votes were wasted, because they were in excess of the 51 votes needed for victory. All 47 of the R votes were wasted, since their candidate lost. Count up all those wasted votes over all the districts and you have a measure of how many votes each party wasted in a state. For instance, if the Rs consistently win 80% of a state’s districts 53-47 and the Ds win the remaining 20% of districts 80-20, then the Ds waste a lot of votes both in the districts they win (80% of the vote where 51% would have done) and in the districts they lose (all 47%). Applied to real elections, this measure quantifies just how much the Republicans successfully gerrymandered a partisan advantage in states like Pennsylvania or Ohio.
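Here’s a rough sketch in Python of the wasted-vote bookkeeping, my own toy code rather than the official Stephanopoulos/McGhee implementation, run on the hypothetical state above:

    # Count wasted votes per district: the loser wastes everything,
    # the winner wastes whatever is above a bare majority (51 of 100).
    def wasted_votes(d_votes, r_votes):
        needed = (d_votes + r_votes) // 2 + 1
        if d_votes > r_votes:
            return d_votes - needed, r_votes
        return d_votes, r_votes - needed

    def efficiency_gap(districts):
        """districts: list of (d_votes, r_votes); positive = Ds wasted more."""
        wasted_d = wasted_r = total = 0
        for d, r in districts:
            wd, wr = wasted_votes(d, r)
            wasted_d, wasted_r, total = wasted_d + wd, wasted_r + wr, total + d + r
        return (wasted_d - wasted_r) / total

    # The hypothetical state: 8 districts at 47 D / 53 R, 2 at 80 D / 20 R
    # (treating the percentages as vote counts).
    plan = [(47, 53)] * 8 + [(80, 20)] * 2
    print(efficiency_gap(plan))   # ~0.38, a big tilt against the Ds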

It’s a simple measure, but it’s still complicated to explain to people. So I took a crack at quantifying something even simpler: the gap between popular vote and congressional seats. For instance, in North Carolina last year Republicans took 53% of the popular vote for the House of Representatives, but they won 10 of 13 seats, or 77%. That’s a pretty remarkable spread of 24%. This outcome spread is a different measure than the efficiency gap, much less subtle, but it’s one I think people can relate to pretty easily.
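The arithmetic is about as simple as it gets; here’s a tiny sketch with the North Carolina numbers from above:

    # Spread = share of seats won minus share of the popular vote.
    votes_share = 0.53        # Republican share of the NC House popular vote
    seats_share = 10 / 13     # Republicans won 10 of 13 seats, about 77%
    print(f"spread: {seats_share - votes_share:+.0%}")   # roughly +24%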

What happens if I add that spread up over all 50 states’ elections and graph it over time? I scraped data from Wikipedia pages and came up with this:

[Chart: Republican share of the House popular vote and of House seats, by year]

There you go. It shows that in 2016 the Republicans got 55% of seats with about 50% of the popular vote, a spread of +5%. Back in 1990 they got only 39% of the seats with 46% of the vote, a spread of -7%. It’s the spread between the two that I’m interested in, so let’s graph the spread directly:

[Chart: the spread between Republican seat share and vote share, by year]

Note the graph of the spread is more or less correlated with the blue line above, the % of seats the Republicans got. I’m not sure what that means but it seems important.

Visualization Failure: Oversimplified

My main conclusion from this exercise is that it’s a failure. I’ve oversimplified; there’s no obvious story here. I think any fine point about redistricting gets lost in the broader story of swings between Republican and Democratic popularity.

I think it’s a mistake to add up the votes and seats across different states. Every state is a different story. North Carolina had a virulently partisan districting process, which probably explains its huge gap in favor of Republicans. California has an independent, non-partisan commission designing its districts, so a different process, although the 2016 outcome still seems to benefit Democrats somewhat (62% of the popular vote, 73% of the seats).

It might also be a mistake to assume all Republicans and all Democrats are the same. As partisan as the United States has become, Representatives are still elected to represent a local community and in many cases are known to their constituents. Congresspeople have individual agendas and views and don’t just reflect their R/D affiliation.

So back to the drawing board. I should consider just redoing visualizations from the efficiency gap data Stephanopoulos and McGhee collected.

Update

There are some nice visualizations of the efficiency gap in this paper from the Campaign Legal Center.

Time series view:

[Chart: efficiency gap over time]

Per-state view. (Darker color = bigger efficiency gap. Larger square = bigger state.)

[Chart: per-state efficiency gap map]

Redistricting tools infodump

For the last couple of months I’ve been reading a lot about redistricting, hoping to find a way I can use my programming skills to have some political influence. There’s a remarkable story in recent politics summarized in the book Ratf**ked: the Republican Party engineered a takeover of the House of Representatives in 2010. Long story short, they worked hard to win state legislature elections in 2008 and 2010. That gave them control over the redistricting process in 2010/2011, drawing the lines for House of Representatives elections. The line-drawing worked; the cartography alone is responsible for something like 10% of the Congressional seats the GOP holds. You get crazy things like North Carolina having 10 of its 13 representatives be Republican, despite the popular vote for those seats going roughly 50/50. The GOP is quite explicit about what they did, too:

The state representative who drew that [North Carolina] map said he had engineered 10 safely Republican seats only “because I do not believe it’s possible to draw a map with 11 Republicans and two Democrats.”

I’m interested in the data and cartography part of this, the technical question. So I’ve been reading up. Here’s some of what I learned. See also Mike Migurski’s notes.

Maptitude: ArcGIS for politics

The key software tool used in 2010 is Maptitude for Redistricting. It’s a GIS tool specialized for redistricting. It comes preloaded with demographic data, election data, and political boundary shapes. It lets you draw district plans and see what the result is. Think of it as ArcGIS but for political scientists. There are many demo videos online.

An individual license for Maptitude costs $700 on Amazon. Mostly they negotiate sales contracts for group use; Ratf**ked suggested it was $5,000-$10,000 for a state. Compare that to the $500,000 that equivalent software cost in the 1990s.

Maptitude is made by Caliper Corporation. They seem to have a democratic mission, putting redistricting tools in the hands of ordinary people. In 2011 they even made a web version for communicating with voters called Maptitude Online. Several states bought and deployed it: Maryland, Idaho, Wyoming… None of those 2011 apps still work; the backing servers seem to be down. But there’s video online. It’s a pretty clunky web UI, from the era when people compiled their Win32 C++ GUIs to Javascript and called it a day. I imagine the 2020 version will be much more Web native. But it does genuinely work as a way to communicate redistricting to people on the Internet.

Political data

One nice thing about Maptitude is that it comes with political data, which you buy from them. But it’s all proprietary and expensive. What about open source / open data hacking? Here’s what I want, for every state, for every federal election (every two years); a rough sketch of how the pieces would fit together follows the list:

  • Shapefiles for each voter precinct. The voting precinct is the smallest building block of election data. You can’t find out how an individual voted in the United States, but you can see how a group of ~25-300 people in one precinct voted.
  • Per-precinct vote tallies for every major national election. President, Senate, House of Representatives. State legislature data would be nice too.
  • Shapefiles for every House of Representatives voting district. These are typically, but not always, the union of precincts. State election districts would be nice too.
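For a sense of what that would enable, here’s a hypothetical geopandas sketch. The filenames and column names (precinct_id, district_id, dem_votes, rep_votes) are all made up, since no state actually publishes data this tidy:

    import geopandas as gpd
    import pandas as pd

    precincts = gpd.read_file("nc_precincts.shp")    # one polygon per precinct
    results = pd.read_csv("nc_house_2016.csv")       # precinct_id, dem_votes, rep_votes

    # Attach the tallies to the geometry, then roll precincts up to their
    # congressional district (assuming the shapefile carries a district_id).
    joined = precincts.merge(results, on="precinct_id", how="left")
    by_district = (
        joined[["district_id", "dem_votes", "rep_votes", "geometry"]]
        .dissolve(by="district_id", aggfunc="sum")
    )
    print(by_district[["dem_votes", "rep_votes"]])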

America has no centralized election system. There is no simple database of election results, political districts, etc. Particularly not for data as detailed as per-precinct results. Every state maintains their own data. Some states publish nice clean CSV and Shapefiles. Some states will send you a scanned handwritten ledger if you call and pay a $4.95 document fee. It’s a mess. Here’s what’s available for open use.

  • OpenElections collects election returns. It came out of a data journalism project and they are doing good work towards a 2016 set, but it is a long process.
  • election-geodata is a project headed up by Nathaniel Kelso which collects precinct and district shapefiles. It’s not complete but has gotten a whole lot of data, particularly for 2016.

Between those two projects I think we have ~75% of 2016’s election in easy-to-use format.

Several folks have published detailed maps on the Web but have not published formal data exports. Decision Desk HQ published a per-precinct map of the Presidential election in 2016. ESRI published a per-precinct map for the 2008 presidential election. I could have sworn I’ve seen a 2012 national per-precinct map too, but I can’t put my hands on it. The LA Times Data Desk has also done good work but it may be California-only.

Also worth reading: Mike Migurski’s work on North Carolina elections, where he goes through the exercise of collecting and analyzing the 2016 election for that state. It’s a model of what I’d like to be able to do easily for every state and every election.

Demographic data

I haven’t researched this much yet, but the other half of redistricting is understanding the demographics of the people you’ve put into the districts.

Census data is the basic data standard here, free and public and easy to work with. It comes broken down by census blocks, groups of 0-500 people. Those blocks do not line up with voting precincts, so some slicing and joining is required to produce per-precinct views of census data. Migurski did this for North Carolina.
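The slicing and joining is basically an areal-interpolation problem: split each block’s population across the precincts it overlaps, in proportion to area. A hypothetical geopandas sketch (made-up filenames and columns, and it assumes population is spread evenly within each block):

    import geopandas as gpd

    blocks = gpd.read_file("census_blocks.shp")[["block_id", "population", "geometry"]]
    precincts = gpd.read_file("precincts.shp")[["precinct_id", "geometry"]]

    # Use an equal-area projection so areas mean something (EPSG:5070 covers the US).
    blocks = blocks.to_crs(epsg=5070)
    precincts = precincts.to_crs(epsg=5070)
    blocks["block_area"] = blocks.geometry.area

    # Intersect blocks with precincts, then apportion each block's population
    # by the fraction of its area that landed in each precinct.
    pieces = gpd.overlay(blocks, precincts, how="intersection")
    pieces["pop_share"] = pieces["population"] * pieces.geometry.area / pieces["block_area"]
    print(pieces.groupby("precinct_id")["pop_share"].sum().head())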

Marketing data is the other interesting option here, and a total black box to me. But all of the tracking tools that enable direct sales and Internet advertising are producing data that’s a political goldmine. I suspect little to none of this data is available for free open hackery. News articles about the 2016 campaign are full of stories about how various political groups used this data with varying levels of effectiveness.

Edit: see this amazing article about Cambridge Analytica, Trump, and Brexit.

Areas of work for redistricting

When I started this research I had no idea something like Maptitude existed. I thought maybe I could help build a GIS-for-elections and revolutionize politics. Ha! I’m at least 10 years too late on that. The state of the art in 2011 was a tool that let political experts draw district plans and then understand in great detail how those people voted in recent elections, and therefore how they would likely vote in the next election. What about 2020?

  • Demographic prediction. District plans last ten years; you can design a plan that looks great for your party in 2022 only to find it fails in 2028, what is called a “dummymander”. Predicting demographic trends sounds like a good data problem. There’s a lot of expertise on this already.
  • Automated districting. I get the impression 2010 districts were still mostly hand drawn. But computers can easily produce zillions of plans and score them; it’s a problem ripe for optimization algorithms. (A toy sketch follows this list.)
  • Smarter measures of gerrymanders. This topic is hot politically now, particularly a new measure called the efficiency gap which quantifies the partisan bias in a district plan. Communicating these measures to the voting public seems really valuable.
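Here’s the toy sketch promised above. It ignores geography and contiguity entirely: it just shuffles fake precincts into 13 equal-sized districts over and over and keeps the plan with the smallest absolute efficiency gap. Real automated districting is far harder, but this is the basic generate-and-score loop:

    import random

    random.seed(1)

    # 130 fake precincts, each a (dem_votes, rep_votes) pair.
    precincts = [(random.randint(200, 800), random.randint(200, 800)) for _ in range(130)]
    N_DISTRICTS = 13

    def efficiency_gap(plan):
        wasted_d = wasted_r = total = 0
        for district in plan:
            d = sum(p[0] for p in district)
            r = sum(p[1] for p in district)
            needed = (d + r) // 2 + 1
            if d > r:
                wasted_d, wasted_r = wasted_d + d - needed, wasted_r + r
            else:
                wasted_d, wasted_r = wasted_d + d, wasted_r + r - needed
            total += d + r
        return (wasted_d - wasted_r) / total

    def random_plan(precincts, k):
        shuffled = random.sample(precincts, len(precincts))
        size = len(shuffled) // k
        return [shuffled[i * size:(i + 1) * size] for i in range(k)]

    best = min((random_plan(precincts, N_DISTRICTS) for _ in range(10_000)),
               key=lambda plan: abs(efficiency_gap(plan)))
    print("fairest plan found, |efficiency gap| =", abs(efficiency_gap(best)))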

The hard part for me with all this is I’ve rapidly learned that redistricting is not a technical problem, not something I can program a solution to. It is a political problem. People with political expertise and power are going to set the agenda. But software is a tool for that politics, maybe a tool I can help craft.

Fog of World, imports

Played with Fog of World today, an iPhone app that turns GPS tracking into a game. It records where you go and then paints the map, “lifting the fog of war” in videogame style. It’s heavily gamified; lots of badges and stuff for exploring more of the world. I love the idea of this kind of thing; my previous experiments with mapping walks I took are in the same vein, as are the GPS tracks for my private airplane flights.

The engineering is pretty good quality. It has very thoughtful import/export options, including a nice integration with Dropbox. Map rendering is pretty good. Plenty of care applied to battery management.

I’ve been running OpenPaths for years, recording my world travels. It took about 5 minutes to export that data to KML and import it to Fog of World and boom, I’m level 75 with a filled out passport. Unfortunately OpenPaths only records point locations every few minutes, not full paths, so I have a bunch of little speckles of revealed parts of India rather than a clear path on a trip. Fair enough.

I also imported my RunKeeper data from my old walks project. That gives much smoother paths. The display is still kinda janky though; if I were building this app I’d give paths more width and smoothing. That, or try some scale-dependent display, where if I’m looking at a whole-country map I see all of the Bay Area filled in, but then I can zoom in for more detail.

Update: this app seems to be killing my battery. The usage monitor in iOS only shows it using 20-30% of my energy, but having it running is the difference between a 5% battery in the morning and a 60% battery. At least I think it is.

I’m trying Strut now, a similar thing, but much more heavily focussed on competition. Also no import / export :-( But it is free. (Update 2: it’s even harder on the battery than Fog of World). Also while researching all this I found a nice comparison of Fog of War v1 vs v2.

Screenshot below, compare to this map

[Screenshot: Fog of World map with imported tracks]

SSL and hostname privacy

The GOP’s craven selling-out of user privacy to ISPs has me wondering, just how private is the modern web?

Only sort of private. HTTPS Everywhere is working; at this point I’m surprised if a site doesn’t support SSL. So the contents of web requests and replies are encrypted. Of course the IP address of the server you’re talking to is not protected. But the hostname is also in cleartext: your ISP can see that you are specifically visiting eff.org or nelsonslog.wordpress.com. This is true in both HTTP/1.1 and HTTP/2.0.

Really the hostname is being exposed by SSL/TLS itself, in the Server Name Indication. SNI is how a single server can serve multiple web domains; it’s virtual hosting for SSL. The client initiates a connection and says, in cleartext, “I’m trying to talk to eff.org”. The server responds by negotiating a connection using eff.org’s certificate. (The hostname may also appear in the request as a Host header, but I believe that’s encrypted.)
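A minimal Python sketch of where the leak happens: the hostname you hand to the TLS layer is copied into the ClientHello’s SNI extension, which crosses the wire before any encryption has been negotiated.

    import socket
    import ssl

    hostname = "eff.org"
    ctx = ssl.create_default_context()

    with socket.create_connection((hostname, 443)) as raw_sock:
        # The server_hostname value goes out in cleartext in the ClientHello,
        # so anyone on the path (your ISP included) can read it, even though
        # everything after the handshake is encrypted.
        with ctx.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
            print("negotiated", tls_sock.version(), "for", tls_sock.getpeercert()["subject"])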

Why not encrypt the hostname in SNI? Who do you encrypt it for? SSL is as much about verifying server identity as it is encrypting the traffic, so the protocol starts with the client asking for a specific identity to set up the encryption. I think the hostname could be encrypted in the handshake with a second encryption channel, but I haven’t thought hard about it. And I have no idea how hard it would be to implement in practice, much less get adopted.

Note that HTTP/2.0 doesn’t do anything new about hostnames; it just uses TLS 1.2. The HTTP/2.0 spec mandates SNI support and requires that the client use it to specify the destination host.

Outside HTTP, your DNS hostname queries are also sent in plaintext, so your ISP could snoop on (or hijack) them. Thanks to Tom Jennings for pointing this out to me. In theory DNSSEC could protect against the hijacking, and DNSCrypt or something like Dingo could protect against the snooping; maybe now’s the time to get serious about deployment.

While I’m here, a lot of discussion yesterday was along the lines of “well, time to use a VPN”. But VPNs don’t really solve the problem, they just shift the trust from your ISP to the VPN operator, and they introduce lots of new problems of their own. I only like them when you’re on a shitty hotel network or you need a specific private intranet connection.

It’s really a shame IPsec failed.

Two unrelated PS4 gripes

Sony’s online store site requires a login. The login requires third party cookies: logging in to store.playstation.com requires setting and using a cookie at auth.api.sonyentertainmentnetwork.com, so it won’t work if you have third party cookies disabled. This breaks the Brave Browser, which disables third party cookies by default. It also breaks in Chrome if you disable the cookies. I thought Safari prevented third party cookies by default; I wonder if they have a workaround? The error displayed is

{"error":"invalid_client","error_description":"Bad client credentials","error_code":4102,"parameters":[]}

The PS4 operating system, or the apps, have a bug where if a game is put in the background (i.e. if you open the system menus) the fan gets really loud. I assume that instead of suspending processing entirely, the game starts spinning, consuming lots of CPU or GPU and generating heat, which makes the fan run. It’s dumb. A fan theory is that the games have no framerate cap and so they effectively render at 1000 FPS or whatever, but I think the problem may be broader than that.


LA Times and ads

The LA Times is a good newspaper and is currently doing the best political coverage in California. It is also the most aggressive ad-shoveling website I have ever seen. Their ad blocker blocker and paywall work, preventing me from reading articles. I even tried installing an ad blocker blocker blocker, which doesn’t work.

So I open articles like this in incognito mode, and let it run its ads, and close the popups and mute the videos and try to ignore the visual distraction. But boy that page does not go quietly. Here’s how they reward their readers.

[Screenshot: network timeline of the LA Times article page]

That’s a timeline of 30 seconds of page activity, starting about 5 minutes after the article was opened. To be clear, this timeline should be empty. Nothing should be loading. Maybe one short ping, maybe one extra ad. Instead the page requested 2000 resources totalling 5 megabytes in those 30 seconds, and it will keep making those requests as long as I leave the page open. That works out to 14 gigabytes a day.

There’s no one offender in the network log, it’s a wide variety of different ad services. The big data consumer was a 2 megabyte video ad, but it’s all the other continuous requests that really worry me.

A lot has been written about the future of journalism, the importance of businesses like the LA Times being profitable as a way to protect American democracy. I agree with that in theory. But this sort of incompetence and contempt for readers makes me completely uninterested in helping their business.

Edit for pedantic nerds: for some unfortunate reason this blog post ended up on Hacker News, where people raised eyebrows at my 14 gigabytes / day estimate. To be crystal clear: I did a half-assed measurement for 30 seconds and measured 5 megabytes. 5 MB × 2 × 60 × 24 = 14,400 MB, or about 14 GB. That’s all I did.

Yes, I know that extrapolation is possibly inaccurate. If you want to leave a browser window open for 24 hours to measure the true number, be my guest; I’ll even link to your report here. (Hope you have lots of RAM and bandwidth!) I’m a web professional with significant experience measuring bandwidth and performance; I know how to do this right. But that’s the LA Times webmaster’s job, not mine, and I don’t have a lot of confidence that they much care.

3.2 hours is the current record for measuring traffic before something crashes.

Edit 2: for a lot of people visiting my blog I think this is their first view of a network timeline graph like this. Generating your own timeline for any web page is easy if you use Chrome; it’s a standard view in the Network tab of Developer Tools. See the docs, but briefly: open dev tools, click the Network tab, and watch the waterfall graph. You want to open it before loading the page so it captures everything from the start. Be aware that a normal web page will finish loading quickly and be boring; the LA Times is a very special website.

Here’s a larger view of 200 seconds of that same LA Times article page, and a direct link to the full size image. (It’s funny redoing it; every time I look at this page it does some new insane thing.)

[Screenshot: 200 seconds of network activity on the same page]

Upgrading a PS4 hard drive

I just upgraded the hard drive in my Playstation. It was remarkably easy following these very thorough directions. Sony designed the system to be easily upgraded, which was quite a surprise.

The hardware is super simple: slide off the cover, remove a single branded screw, and slide out the drive tray. Swap in the new drive (4 screws) and slide it back in; done. I replaced it with a 2TB hybrid drive. A full SSD would be faster but costs 5x as much. These Seagate SSHD drives have a 64GB SSD caching the 2TB platter and seem like a reasonable compromise.

The software side is also surprisingly simple. Back up the PS4 to a spare external USB drive, swap in the blank new drive, then boot into recovery mode. You can then reinstall the firmware from a download Sony provides, restore your backup, and you’re done. It took all of 30 minutes of my time, plus 5 hours waiting for the backup and restore to copy.

My main reason for the upgrade was more space. The old units shipped with a 500GB drive which, after overhead and the like, allowed about 6 games to be installed at once. That’s a significant nuisance; I don’t want to download 30GB a second time for something I want to revisit.