Linux: adding a drive, UUIDs

I replaced a 7-year-old drive in my Linux server. The drive had been complaining about bad blocks in SMART forever (which still mystifies me; drives should be able to just remap those), but I was also seeing other signs the drive might be failing, so better safe than sorry.

The thing that made this complicated is that I wanted to be sure I understood how device names worked so I didn’t screw up which drive was mounted where. Old school names like /dev/sdb1 seem fine, but they can change if you remove drives, add new ones, etc. So I read up and learned the new hotness is to mount drives by UUID, a stable unique identifier that gets generated when you make the filesystem and is stored in the filesystem itself. /etc/fstab will take a UUID in the first column instead of a device name and happily mount it for you. That’s it, pretty simple. In particular udev is not involved and there are no symlinks required anywhere. I have no idea how mount finds the device named by UUID, but it works, so I’m happy to remain ignorant.

I replaced the old 7200 RPM WD Blue with a 5400 RPM WD Blue. That’s kind of a cheap drive for a Linux server but I’m only using it as a backup volume. I keep being tempted to get an SSD for the main system volume.

Here are the steps I followed for the new drive, mostly following this guide.

lshw -C disk: find the new hard drive. The easiest way is to match it by serial number, or by other characteristics like size and model name. My new disk got named /dev/sdb, which, awkwardly, was also the name of the disk I had just taken out.

smartctl --smart=on /dev/sdb: turn on SMART for the disk. Honestly I don’t exactly know what this does but it seems like a good idea.

fdisk /dev/sdb: partition the disk. fdisk is the old school MBR partitioning tool, and MBR is limited to about 2TB (with 512-byte sectors). My disk is 2TB so that’s OK. Newer systems use GPT (an EFI thing) and parted. I just made one large partition for the whole disk.

mkfs -t ext4 /dev/sdb1: make the filesystem. There are some options here you could consider setting to get a bit more disk space or to add checksumming to metadata, but I stuck with the defaults. Fun fact: I was taught in 1990 to print out the list of superblock backups, because if the primary superblock got corrupted that printout was the only way you were going to find the backups. I assume recovery tools have improved in the last 28 years. (Or more realistically, that the disk will be a lost cause.)
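
If I had wanted those tweaks, they’d look something like this (a sketch; -m shrinks the reserved-blocks percentage from the default 5%, and the metadata_csum feature needs a reasonably recent e2fsprogs):

    mkfs -t ext4 -m 1 -O metadata_csum /dev/sdb1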

blkid | grep sdb1: find the UUID for the new partition.

fstab: edit /etc/fstab to mount the new disk by UUID.
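
For example, with a made-up UUID and a /backup mount point (yours will differ), the blkid output and the fstab line look something like:

    $ blkid | grep sdb1
    /dev/sdb1: UUID="3f2a9c84-5d1b-4e0a-9b7e-2c6d8e1f4a07" TYPE="ext4"

    # /etc/fstab
    UUID=3f2a9c84-5d1b-4e0a-9b7e-2c6d8e1f4a07  /backup  ext4  defaults  0  2

The last two fields are the usual dump flag and fsck pass number; 0 and 2 are typical for a non-root filesystem.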

All very easy really.


Flickr exports, fixup tool plan

Ahead of the Great Deletion, Flickr has a decent export tool built into the user settings page. You click the export button, wait a day or two for an email, and then get some ZIP files to download.

I posted a little summary of what’s in the exports on Metafilter. Long story short, I think it’s pretty much all the data and metadata Flickr has. Here’s an expanded version of it:

Photos

  • 4 ZIP files contain my photos
  • Photos are in JPG format with EXIF tags. I’m not positive, but I believe these are the original bits I uploaded from my camera, or something similar. There is a lot of EXIF data intact.
  • Filenames are the title of the photo (downcased) plus the photo’s Flickr ID
  • File timestamps are bogus dates in 2013/2014

Metadata

  • 1 ZIP file with a bunch of JSON files
  • Most of the JSON files are one file per photo that includes Flickr metadata. Photo title, description, time, tags, geotags, comments on the photo, etc.
  • Several other JSON files with things like all the comments I’ve made, JSON data for my photo albums (a collection of photos), etc.

Conversion tool plan

I’m not aware of any tools that do much with this Flickr metadata, but I haven’t looked hard. I’ve considered writing my own minimal one, with an eye towards extracting the most important metadata from the JSON and stuffing it into EXIF tags. Ideally the resulting JPG files would then look reasonable when imported into Google Photos and/or Lightroom. Some specifics (there’s a rough sketch of the idea after the list):

  • Set the JPG filename from the JSON photo title more nicely than Flickr did
  • Set the JPG file timestamp to the creation date in the EXIF data. If there is no EXIF timestamp, then take something from the Flickr JSON.
  • Insert Flickr’s JSON geotags from the photo into an EXIF geotag (if one doesn’t exist). I geocoded a bunch of photos by hand in Flickr; I’d really like to preserve that data
  • Insert Flickr’s JSON name and description tags into the EXIF in appropriate textual fields.
  • Insert Flickr’s tags into the EXIF; is there an appropriate field? (Keywords usually live in IPTC or XMP rather than EXIF proper.)
  • Capture Flickr comments from the JSON into the EXIF?
  • Flickr JSON has a “people” tag but I don’t think I’ve ever used it.
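
Here’s a minimal sketch of the easy parts in Python: renaming files and fixing timestamps from the per-photo JSON, and collecting the fields I’d want to push into EXIF. The JSON field names ("name", "date_taken", "geo", "tags") and the photo_<id>.json naming are my guesses at the export format, so check against your own files; actually writing EXIF would be a job for exiftool or a library like piexif.

    #!/usr/bin/env python3
    """Sketch: tidy up Flickr export JPGs using the per-photo JSON.
    Field names and file naming are assumptions about the export format."""
    import json
    import os
    import re
    from datetime import datetime
    from pathlib import Path

    EXPORT_DIR = Path("flickr-export")  # wherever the unzipped export lives

    def slugify(title):
        """Filesystem-friendly name from the photo title."""
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"

    def process(json_path):
        meta = json.loads(json_path.read_text())
        photo_id = str(meta["id"])
        # The exported JPG filename contains the photo ID; match on that.
        jpgs = list(EXPORT_DIR.glob("*" + photo_id + "*.jpg"))
        if not jpgs:
            return
        jpg = jpgs[0]

        # 1. Nicer filename: slugified title plus the Flickr ID.
        new_path = jpg.with_name(slugify(meta.get("name", "")) + "-" + photo_id + ".jpg")
        jpg.rename(new_path)

        # 2. File timestamp from Flickr's date_taken (format assumed).
        taken = datetime.strptime(meta["date_taken"], "%Y-%m-%d %H:%M:%S")
        os.utime(new_path, (taken.timestamp(), taken.timestamp()))

        # 3. Fields worth writing into EXIF/IPTC with exiftool or piexif:
        #    description, geotag, tags (the tag structure here is a guess).
        print(new_path.name, meta.get("description", ""), meta.get("geo"),
              [t.get("tag") for t in meta.get("tags", [])])

    if __name__ == "__main__":
        for p in sorted(EXPORT_DIR.glob("photo_*.json")):
            process(p)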

Logs of Lag outage

Oh boy did I screw up. Back on September 1 I pushed a tiny change to Logs of Lag, an old service I still run for League of Legends players. It was such a simple change, just adding an HTML link, so I didn’t test it carefully. Turns out I made that change on top of the master branch, which had some other changes I’d committed back in 2015 but never tested or deployed, and that new-but-old code didn’t work. The site’s been broken for 7 weeks now, and I only found out when a user wrote me.

I love this commit comment past-me wrote:

Potentially breaking change: webapp use new stats. This change has not been tested manually, hence the warning in deploy.sh

I did read the commit log before pushing again, but apparently I didn’t read back far enough. Also I didn’t use my deploy script to deploy the server. Talk about shooting yourself in the foot. I even have a tiny bit of monitoring on the service but it didn’t show this kind of error, not that I pay attention to the monitor anyway.

The real problem is that I’ve abandoned this project; I haven’t done real development on it in 4 years. It’s kinda broken now too, as the file format the code parses has changed over time and I haven’t kept up. I’m now 100% fed up with League of Legends and Riot Games, given how sexist and awful that company is. So I have no motivation to do more. But the tool is still useful so I’ve tried to keep it online. (Past me did one thing right: the site is mostly just static files, so it’s not hard to keep running.)

The site doesn’t get a lot of usage; my estimate last year was 50-100 unique users a day, with about 450 uses of the tool a day. Here’s a graph of the rate of usage; you can see an organic falloff over the whole year, then a sharp drop on Sep 1 when I broke the site. I wonder if it will recover.

[Graph: logsoflag2-year.png — Logs of Lag usage over the past year]

PS4 6.02: more external storage woes

As I wrote in an earlier post, on a PS4 there’s no way to make a copy of a game’s download files to an external drive. You can move the files to a drive but they are then deleted from the source. Which is a huge PITA when the game requires a 90 GB download.

But it gets better. If you plug an external drive with a copy of a game into a PS4 that already has a copy of that game, the software freaks out. It insists you delete one of the two copies before you can use the external drive. There’s no way to tell it to ignore the duplicate so you can, say, get at some other game on the external drive. You must delete first.

So not only can you not create copies of games, but if you screw up you’ll be forced to delete a copy you downloaded. Argh!

(I reiterate: none of this is about copy protection; the PS4 online DRM will prevent you from playing the game if your login doesn’t have a license for it, whether you have a copy on the drive or not.)

PS: the external drive copying is awfully slow. A 50GB game image is taking 18 minutes, which works out to roughly 46 MB/s. That’s not much better than USB 2.0 speeds. The drive, cable, and the PS4 itself are all supposed to support USB 3.0. Maybe it’s the spinning drive speed limiting things.

Yet more Python ORMs

I continue to fail to use ORMs with Python. They’re just so complicated and mysterious. But maybe that’s really just SQLAlchemy and I should look elsewhere.

PeeWee popped up on my radar recently, an ORM explicitly designed to be small and simple. It looks pretty good. I’ve also heard a couple of people mention PonyORM lately, but it seems far too magic.
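
For a sense of how small PeeWee feels, here’s roughly what a hello-world looks like (the database file, model, and field names are just for illustration):

    from peewee import SqliteDatabase, Model, CharField, DateTimeField
    import datetime

    db = SqliteDatabase("scratch.db")  # just a local SQLite file

    class Note(Model):
        text = CharField()
        created = DateTimeField(default=datetime.datetime.now)

        class Meta:
            database = db  # bind the model to this database

    db.connect()
    db.create_tables([Note])

    Note.create(text="hello")
    for note in Note.select().order_by(Note.created.desc()):
        print(note.text, note.created)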

Going even simpler, I should circle back and spend more time with records and/or dataset. They’re not even really ORMs, just convenience wrappers for database rows, and that seems fine by me. It still bugs me that they both depend on SQLAlchemy, even if the actual interaction is minimal.
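
dataset in particular is about as simple as it gets; a sketch (table and column names made up):

    import dataset

    # dataset takes a SQLAlchemy-style URL; this one is a local SQLite file.
    db = dataset.connect("sqlite:///scratch.db")

    table = db["photos"]  # the table is created lazily on first insert
    table.insert(dict(title="sunset", year=2018))

    for row in table.find(year=2018):  # rows come back as dict-like objects
        print(row["title"])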

 

PS4 5.50+ USB hard drive, copying games

A tale of failures. I wanted to copy a PS4 game I have off the internal hard drive onto a USB drive, so I could copy it onto a second console instead of redownloading 60GB+ worth of stuff. This turns out to be impossible.

Since version 5.50 the PS4 has gotten a lot friendlier about supporting USB drives. Among other things, you can relatively easily plug in a USB drive and copy user data (like game saves) to and from it. The PS4 will do this with any exFAT or FAT32 disk, like your garden variety flash drive.

You can also move a game from the internal hard drive to a USB drive. Key point there: move, not copy. Copy is explicitly not allowed. You can only move a game to a disk that’s formatted as “extended storage” for the PS4. I thought I’d be clever and move the game to an external drive, make a clone, then move it back. But you can’t do that easily. The extended storage volume doesn’t even have a meaningful partition table or a filesystem I can mount. I suppose I could make a block level clone of the disk, but that’s too much trouble. (It’s probably just some slightly bent version of a standard FreeBSD or Windows filesystem, but I’m not going to waste time researching that.)

One last option the PS4 supports is backing up your whole console to an external disk, then restoring it to a different console. You can also clone one PS4 to another directly over a LAN. However, in both cases the destination PS4 gets entirely wiped and reconfigured, so it’s not suitable for copying over a single game.

The stupid thing about all this is Sony probably still thinks it’s doing some useful copy protection. It’s not; the copy protection is entirely keyed to your online account now. They’re just making it awkward for power users to do reasonable things.

 

Linux I/O schedulers

My Linux server is mostly idle, but every few hours I run a backup and then the disks get very busy. I’m also trying to rebuild a corrupted Duplicati database right now, a second heavy I/O load. If both run at once the system does not run very well. Worse, interactive performance is terrible; like 30 seconds to log in and get a shell.

I think a fix for this is changing the Linux I/O scheduler. I checked and all my hard drives were set to the deadline scheduler. I changed them to CFQ and interactive performance seems better, although I didn’t test carefully.
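
For reference, this is how I checked and switched it, per device; the echo does not persist across reboots, and sda is just the example device:

    $ cat /sys/block/sda/queue/scheduler
    noop [deadline] cfq
    $ echo cfq | sudo tee /sys/block/sda/queue/scheduler

To make it permanent you’d set it in a udev rule or on the kernel command line (elevator=cfq).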

I’m not clear why CFQ isn’t something you’d just use all the time. I guess there’s some overhead associated with it vs. noop but I have to think it’s not very much under normal workloads. Deadline seems like a bad idea unless you’re in a very special environment. But I don’t really understand this stuff.