Flickr exports, fixup tool plan

Ahead of the Great Deletion, Flickr has a decent export tool built into the user settings page. You click the export button, wait a day or two for an email, and then get some ZIP files to download.

I posted a little summary of what’s in the exports on Metafilter. Long story short, I think it’s pretty much all the data and metadata Flickr has. Here’s an expanded version of it:

Photos

  • 4 ZIP files contain my photos
  • Photos are in JPG format with EXIF tags. I’m not positive but I believe these are the original bits I uploaded from my camera, or something similar. There is a lot of EXIF data intact.
  • Filenames are the title of the photo (downcased) plus the photo’s Flickr ID
  • File timestamps are bogus dates in 2013/2014

Metadata

  • 1 ZIP file with a bunch of JSON files
  • Most of the JSON files are one file per photo that includes Flickr metadata. Photo title, description, time, tags, geotags, comments on the photo, etc.
  • Several other JSON files with things like all the comments I’ve made, JSON data for my photo albums (collections of photos), etc.

Conversion tool plan

I’m not aware of any tools that do much with this Flickr metadata, but I haven’t looked hard. I’ve considered writing my own minimal tool, with an eye towards extracting the most important metadata from the JSON and stuffing it into EXIF tags. Ideally the resulting JPG files would then look reasonable when imported into Google Photos and/or Lightroom. Some specifics (sketched in code after the list):

  • Set the JPG filename from the JSON photo title more nicely than Flickr did
  • Set the JPG file timestamp to the creation date in the EXIF data. If there is no EXIF timestamp, then take something from the Flickr JSON.
  • Insert Flickr’s JSON geotags from the photo into an EXIF geotag (if one doesn’t already exist). I geocoded a bunch of photos by hand in Flickr; I’d really like to preserve that data.
  • Insert Flickr’s JSON name and description tags into the EXIF in appropriate textual fields.
  • Insert Flickr’s tags into the EXIF; is there an appropriate field?
  • Capture Flickr comments from the JSON into the EXIF?
  • Flickr JSON has a “people” tag but I don’t think I’ve ever used it.
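
Here’s a minimal sketch of the core of that tool, shelling out to exiftool (assumed installed). The JSON key names (name, description, tags, geo) are guesses at what the per-photo export files contain, not verified field names, so treat it as runnable pseudocode:

    import json
    import subprocess
    from pathlib import Path

    def apply_flickr_metadata(json_path, photo_path):
        # Key names below are assumptions about the Flickr export format;
        # check them against one of the real per-photo JSON files first.
        meta = json.loads(Path(json_path).read_text())
        args = ["exiftool", "-overwrite_original"]
        if meta.get("name"):
            args.append(f"-Title={meta['name']}")
        if meta.get("description"):
            args.append(f"-Description={meta['description']}")
        for t in meta.get("tags", []):    # assumed: list of {"tag": "..."} dicts
            args.append(f"-Keywords={t['tag']}")
        geo = meta.get("geo")             # assumed: {"latitude": ..., "longitude": ...}
        if geo:
            lat, lon = float(geo["latitude"]), float(geo["longitude"])
            args += [f"-GPSLatitude={abs(lat)}",
                     f"-GPSLatitudeRef={'N' if lat >= 0 else 'S'}",
                     f"-GPSLongitude={abs(lon)}",
                     f"-GPSLongitudeRef={'E' if lon >= 0 else 'W'}"]
        args.append(str(photo_path))
        subprocess.run(args, check=True)

Renaming the file from the title and fixing its timestamp (os.utime with the EXIF date, falling back to the Flickr date) would bolt on after this; the only-write-the-geotag-if-missing check is also left out.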

Logs of Lag outage

Oh boy did I screw up. Back on September 1 I pushed a tiny change to Logs of Lag, an old service I still run for League of Legends players. It was such a simple change, just adding an HTML link, that I didn’t test it carefully. Turns out I made that change on top of a master branch that had other changes I’d committed back in 2015 but never tested or deployed, and that new-but-old code didn’t work. The site’s been broken for 7 weeks now, and I only found out when a user wrote me.

I love this commit comment past-me wrote:

Potentially breaking change: webapp use new stats. This change has not been tested manually, hence the warning in deploy.sh

I did read the commit log before pushing again, but apparently I didn’t read back far enough. Also I didn’t use my deploy script to deploy the server. Talk about shooting yourself in the foot. I even have a tiny bit of monitoring on the service, but it didn’t show this kind of error. Not that I pay attention to the monitor anyway.

The real problem is that I’ve abandoned this project; I haven’t done real development on it in 4 years. It’s kinda broken now anyway, since the file format the code parses has changed over time and I haven’t kept up. I’m now 100% fed up with League of Legends and Riot Games, given how sexist and awful that company is, so I have no motivation to do more. But the tool is still useful, so I’ve tried to keep it online. (Past me did one thing right: the site is mostly just static files, so it’s not hard to keep running.)

The site doesn’t get a lot of usage; my estimate last year was 50-100 unique users a day and about 450 uses of the tool a day. Here’s a graph of the rate of usage: you can see an organic falloff over the whole year, then a sharp drop on September 1 when I broke the site. I wonder if it will recover.

(Chart: logsoflag2-year.png, usage of the tool over the past year)

PS4 6.02: more external storage woes

As I wrote in an earlier post, on a PS4 there’s no way to make a copy of a game’s download files to an external drive. You can move the files to a drive but they are then deleted from the source. Which is a huge PITA when the game requires a 90 GB download.

But it gets better. If you plug an external drive with a copy of a game into a PS4 that already has a copy of that game, the software freaks out. It insists you delete one of the two copies before you can use the external drive at all. There’s no way to tell it to ignore the duplicate so you can, say, get at some other game on the external drive. You must delete first.

So not only can you not create copies of games, but if you screw up you’ll be forced to delete a copy you downloaded. Argh!

(I reiterate: none of this is about copy protection; the PS4’s online DRM will prevent you from playing the game if your login doesn’t have a license for it, whether you have a copy on the drive or not.)

PS: the external drive copying is awfully slow. A 50 GB game image is taking 18 minutes, or about 50 MB/s. That’s just about USB 2.0 speeds. The drive, the cable, and the PS4 itself are all supposed to support USB 3.0. Maybe it’s the spinning drive speed limiting things.

Yet more Python ORMs

I continue to fail to use ORMs with Python. They’re just so complicated and mysterious. But maybe that’s really just SQLAlchemy and I should look elsewhere.

PeeWee popped up on my radar recently, an ORM explicitly designed to be small and simple. It looks pretty good. I’ve also heard a couple of people mention PonyORM recently; it seems far too magic.
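
For a feel of it, here’s roughly what minimal peewee code looks like, written from memory of its docs (the Photo model is made up):

    from peewee import SqliteDatabase, Model, CharField, IntegerField

    db = SqliteDatabase("photos.db")

    class Photo(Model):
        title = CharField()
        views = IntegerField(default=0)

        class Meta:
            database = db

    db.connect()
    db.create_tables([Photo])
    Photo.create(title="sunset", views=3)
    for photo in Photo.select().where(Photo.views > 0):
        print(photo.title, photo.views)

No engine or session objects to configure, which seems to be the appeal.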

Going even simpler, I should circle back and spend more time with records and/or dataset. They’re not even really ORMs, just convenience wrappers for database rows, and that seems fine by me. It still bugs me that they both depend on SQLAlchemy, even if the actual interaction is minimal.
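
dataset is about as minimal as it gets; a sketch along the lines of its docs (database file and column names made up):

    import dataset

    db = dataset.connect("sqlite:///photos.db")   # plain SQLAlchemy URL under the hood
    table = db["photos"]                          # table and columns are created lazily
    table.insert(dict(title="sunset", views=3))
    table.insert(dict(title="harbor", views=0))
    for row in table.find(views=3):               # simple keyword filtering, no query objects
        print(row["title"], row["views"])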

 

PS4 5.50+ USB hard drive, copying games

A tale of failures. I wanted to copy a PS4 game I have off the internal hard drive onto a USB drive, so I could copy it onto a second console instead of redownloading 60+ GB of stuff. This turns out to be impossible.

Since version 5.50 the PS4 has gotten a lot friendlier about supporting USB drives. Among other things you can relatively easily plug in a USB drive and copy user data (like game saves) to and from it. The PS4 will do this with any exFAT or FAT32 disk, like your garden variety flash drive.

You can also move a game from the internal hard drive to a USB drive. Key point there: move, not copy. Copy is explicitly not allowed. And you can only move a game to a disk if the disk is formatted as “extended storage” for the PS4. I thought I’d be clever and move the game to an external drive, make a clone, then move it back. But you can’t do that easily: the extended storage volume doesn’t even have a meaningful partition table or any filesystem I can mount. I suppose I could make a block-level clone of the disk, but that’s too much trouble. (It’s probably just some slightly bent version of a standard FreeBSD or Windows filesystem, but I’m not going to waste time researching that.)

One last option the PS4 supports is backing up your whole console to an external disk, then restoring it to a different console. You can also clone one PS4 to another directly over a LAN. However, in both cases the destination PS4 gets entirely wiped and reconfigured, so neither is suitable for making a copy of a game.

The stupid thing about all this is that Sony probably still thinks it’s doing some useful copy protection. It’s not; the copy protection is now keyed entirely to your online account. They’re just making it awkward for power users to do reasonable things.

 

Linux I/O schedulers

My Linux server is mostly idle, but every few hours I run a backup and then the disks get very busy. I’m also trying to rebuild a corrupted Duplicati database right now, a second heavy I/O load. If both run at once the system does not run very well. Worse, interactive performance is terrible; like 30 seconds to log in and get a shell.

I think a fix for this is changing the Linux I/O scheduler. I checked and all my hard drives were set to the deadline scheduler. I changed them to CFQ and interactive performance seems better, although I didn’t test carefully.
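
The check-and-change itself is just a sysfs read and write. Something like this, where the device name is hypothetical and the setting doesn’t persist across reboots:

    from pathlib import Path

    dev = "sda"   # hypothetical device name; repeat for each spinning drive
    sched = Path(f"/sys/block/{dev}/queue/scheduler")

    print(sched.read_text().strip())   # e.g. "noop [deadline] cfq"; brackets mark the active one
    sched.write_text("cfq")            # needs root; use a udev rule or boot script to make it stick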

I’m not clear why CFQ isn’t something you’d just use all the time. I guess there’s some overhead associated with it vs. noop but I have to think it’s not very much under normal workloads. Deadline seems like a bad idea unless you’re in a very special environment. But I don’t really understand this stuff.

Zip codes vs census tracts

A lot of digital maps use zip codes as a binning feature: election maps, property value maps, pollution maps. But while zip codes are convenient and familiar, there’s a much better set of polygons for mapping most US data: census tracts. Zip codes are pretty unusual polygons(*), drawn mostly to make the process of mail sorting and delivery simpler; they are quite arbitrary. Census tracts are lovingly and carefully drawn to respect demographic and political reality.

I have a great example of the difference: River Oaks in Houston, one of the wealthiest neighborhoods in America. It’s the place where Saudi princes buy a 10,000 square foot house and glass in the backyard so they have air conditioning. Also plenty of local Houston money, mostly old money.

But you’d never know it looking at zip code averages, because River Oaks shares 77019 with a bunch of people who live east of Shepherd, in a significantly denser, less wealthy, more diverse neighborhood. The zip code also includes big chunks of the Fourth Ward, which used to be quite a poor neighborhood and was recently redeveloped. By contrast, River Oaks is almost perfectly described by census tracts 4112 and 4114. (Not exactly; the chunk in the SW of 4114 is not technically River Oaks, nor is it in 77019, but demographically it’s quite similar to the rest of 4114.)

Census tracts are part of a hierarchy of census polygons. Census tracts aim to have about 4000 people in them (zip codes average 8000 people but vary greatly). Block groups are smaller, with ~1500 people each. Census blocks are the smallest division, about 40 blocks per block group, but unlike census tracts they are highly variable in their statistical demographics. There are 11M census blocks in the US, although nearly half of them are uninhabited.

Census tracts are frequently named by 4 digit numbers like 4112 or 4114, with a county name or something else to disambiguate them. But some tracts have numbers with decimals or other complicating factors, like 32 or 204.01. (I believe canonically census tracts are named with 6 digit numbers, with the decimal and leading zeros frequently omitted).

FIPS codes are the larger naming scheme; the full name of tract 4112 is sometimes given as 48201411200. 48 is Texas, 201 is Harris County, and 411200 is the six digit tract code (tract 4112, with the otherwise-omitted .00 decimal written out). Census blocks get their own 4 digit number at the end, the first digit of which is the block group, making for a full 15 digit code to name the most precise polygon.
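
A tiny parser makes that structure concrete; the trailing block number here is hypothetical, just to pad out a full 15 digit code:

    def parse_block_fips(code):
        # Split a full 15 digit census block FIPS code into its parts.
        assert len(code) == 15
        return {
            "state": code[0:2],        # 48 = Texas
            "county": code[2:5],       # 201 = Harris County
            "tract": code[5:11],       # 411200 = tract 4112.00
            "block_group": code[11],   # first digit of the block number
            "block": code[11:15],
        }

    print(parse_block_fips("482014112001000"))
    # {'state': '48', 'county': '201', 'tract': '411200', 'block_group': '1', 'block': '1000'}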

(*) Technically speaking, zip codes are just lists of points, not polygons at all. The polygons people map are properly called zip code tabulation areas, and there are some important differences.