Non-ASCII filenames

I’m on a quest to clean up my music library. I decided to give up and make all my music filenames ASCII; no µ-Ziq, no Sigur Rós, no préludes. I feel bad but after 10 years of struggles I still occasionally run into problems with Unicode filenames. Also, most music players grab the display info from metadata tags anyway, so the filenames don’t matter so much.

Anyway, here’s a quick trick for finding non-ASCII filenames:

LC_ALL=C find . -name '*[! -~]*'

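The bracket expression matches any character outside the range from space (0x20) to tilde (0x7e), so the pattern finds any name containing at least one such byte. The LC_ALL setting is presumably to disable any proper text processing. I.e. bytes, not characters.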
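For what it’s worth, the same check can be sketched in Python. This is only a minimal sketch, not part of the original trick; the function names are made up for illustration. Passing a bytes path to os.walk makes it hand back bytes names, which mirrors the bytes-not-characters behavior of LC_ALL=C.

```python
import os


def outside_printable_ascii(name: bytes) -> bool:
    """True if any byte falls outside space (0x20) through tilde (0x7e)."""
    return any(b < 0x20 or b > 0x7E for b in name)


def find_non_ascii(root: bytes = b"."):
    """Yield paths (as bytes) whose final component has a non-ASCII byte."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Like find's -name, test only the basename, for dirs and files alike.
        for name in dirnames + filenames:
            if outside_printable_ascii(name):
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in find_non_ascii(b"."):
        # Filenames are raw bytes, so decode leniently for display only.
        print(path.decode("utf-8", errors="replace"))
```

Like the find command, this also flags control characters such as tabs, since they sit below 0x20.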

The bigger project here is the music library grooming. I had this all locked down in 2008 or so when almost all my music was ripped by a service. But since then I’ve acquired a bunch of music of uncertain provenance with all sorts of random tags. I think it was when I discovered I had a Genre named “drownstep” I realized it was time to clean things up.

Putting lolslackbot on the back burner

I’ve decided to stop working on lolslackbot, my social project for League of Legends players who use Slack or Discord. I wrote it originally for a few friends, then slowly expanded it to a few hundred users. But I’ve never put the work in to make it a consumer product and now am not motivated to do it.

The main feature missing is any sort of web interface so people could sign up for themselves. I’ve been maintaining it by hand with database update scripts, doing ~30 minutes of one-off work every few weeks instead of one focussed month-long engineering project. This blog is full of bold plans to port the whole thing to Django and get going on a web interface, but I never did it. Too much product work I don’t really know how to do well, designing interactive web UI. Hell, I don’t even have a proper name for the project.

Also some deeper technical problems. The Django port seems doable but requires database schema changes, specifically in how many-to-many relations work. And I got part of my core schema wrong, an assumption that an individual only belongs to one group. Fixing that would require redoing pretty much all the tests and half the business logic. Also at some point I’d have to migrate from sqlite to Postgres and that doesn’t sound like fun at all. In retrospect it’s too bad I didn’t start with Postgres+Django, but that seemed complicated at the beginning when I was thinking of this as just a cron job.

My real reason for lack of enthusiasm is the market. I like games and I like the idea of making game playing more social. But League of Legends is a hard community to build humble tools for. Most of the energy there goes to highly polished and well marketed sites like LolKing and I’m just not that ambitious. There’s not much money in it (Riot’s API requires you don’t charge for services) and not a lot of love either. Me and my gaming buddies are on a bit of a LoL break too, which makes it harder to stay personally motivated. I’m also bummed that Riot hasn’t done anything more with Clubs, their social feature; my hope was to springboard off of that to build out the bot.

I did get some data from the last user population of Learning Fives, a cohort of ~80 people playing games together for a few weeks. 50% said they found it useful, 20% said it wasn’t, and 30% didn’t know what it was (despite seeing it in their channel). Not sure what conclusion to draw from that.

Anyway it’s a weight off my mind to just say I’m not going to do further work on this, at least for now. Truthfully my mind is on political work right now, I’d really like to do some sort of progressive activism combining data processing and GIS. (I’m following Mike’s work on redistricting closely.) To the extent I do anything for games it’s about time to revisit Logs of Lag, which 2.5 years later is still running just fine and uniquely useful. But I have some bugs to fix and maybe some improvements to make.

 

Synchronizing Windows directories

I’ve been using Allway Sync to keep my two new Windows machines in sync. It’s pretty good. I sync one machine to an external hard drive, then sync that drive to the other machine. So really it’s three copies. Allway Sync also supports various Internet services for syncing, like Dropbox or its own cloud service.

The sync algorithm seems relatively robust. It’s a bidirectional sync, so changes on both sides can propagate at the same time. There are safety features like a sanity check if too many files changed. It tracks metadata and deleted files so deletes can propagate. It also keeps old versions of things in a hidden _SYNCAPP folder. I haven’t delved into details of how it handles a single large binary file that’s only partially changed; I don’t know if it has binary diffs or if it just copies the whole file over.
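To make the “tracks metadata so deletes can propagate” idea concrete, here is a toy sketch of how a bidirectional sync can decide what to do. This is a generic three-way comparison, not Allway Sync’s actual algorithm, which I haven’t seen. The function name and the snapshot format (a dict mapping path to some fingerprint like size plus mtime) are assumptions for illustration.

```python
def plan_sync(base, left, right):
    """Compare both sides against the snapshot saved at the last sync.

    Each argument maps a relative path to a fingerprint (hypothetical:
    a hash, or size plus mtime). A missing key means the file is absent.
    """
    actions = {}
    for path in sorted(set(base) | set(left) | set(right)):
        b, l, r = base.get(path), left.get(path), right.get(path)
        if l == r:
            continue  # identical on both sides, or gone from both
        if l == b:
            # Left is unchanged since last sync, so the right side wins.
            actions[path] = "delete on left" if r is None else "copy right -> left"
        elif r == b:
            # Right is unchanged since last sync, so the left side wins.
            actions[path] = "delete on right" if l is None else "copy left -> right"
        else:
            # Both sides changed in different ways; needs a human.
            actions[path] = "conflict"
    return actions
```

The saved snapshot is what lets a delete be told apart from a new file: without it, a file that exists on only one side could mean either.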

The UI is pretty awkward. Mostly it’s just ugly. But it’s also a bit confusing. I believe you have to define a new job for each folder you want to sync. (There is a way to sync one source folder to multiple destinations.) I finally found that if you enable the toolbar, there’s an icon for “sync all jobs at once”. I haven’t tried the automatic syncing yet; it seems to rely on a service watching for file change events and/or on a timer.

The trick now is figuring out what all in Windows I can safely sync. My Steam apps folder is safe to sync, and saves me from having to download a 30GB game twice. Syncing my Documents folder seems to work too and propagates some app settings, game save files, not to mention my actual work output like source code and text documents.

The part I’m on the fence about is syncing my Roaming profile. That should keep many more app settings in sync, and my first time copying it seems to have worked OK. But there’s a lot of random stuff in there that doesn’t quite seem like it should be copied, like Slack’s cache files. OTOH the Roaming profile was designed to be copied from one machine to another, so it should work? Edit: after trying it, I think syncing the whole Roaming profile is a bad idea. Parts of the folder, like Microsoft, seem to be treated specially. Also it makes the most sense to sync when nothing is open that writes to that directory, i.e. before you log in, and that’s awkward.

Allway Sync is free, but there’s a limitation on the number of files. The license is $26 but that’s for a single machine. $16 for a second one.

There’s a bunch of other sync options too. For a long time I used Unison to sync file systems. It works great and I’d still use it for a Unix command line. There are Windows builds but I couldn’t get the GUI version to work. I also get nervous using Unix tools to manipulate Windows filesystems.

 

Windows: backing up a failing disk

Ken’s Windows 7 machine has a hard drive failing. Some errors, a few corrupted / lost files. We’ve already bought a replacement drive but are having a seriously hard time getting useful data off the old disk. The disk mostly works; he runs it with Windows for hours at a time (I know, bad idea). But it has some bad sectors.

Simple task: just make a backup. Copy all the files off of it. Windows Backup doesn’t seem to work, with no useful feedback about why. Casper, a third party disk imaging tool, gives up the moment it encounters a read error. Acronis True Image, another cloner, is free for Western Digital drive owners but the installer doesn’t even run. PCmover isn’t really a backup tool and isn’t backing up complete data. It’s a fucking clown show. I’m about to bust out a Linux system just to run ddrescue. I did find a suggestion that HDD Raw Copy Tool might work.

Hard task: set up the new PC. Ken would very reasonably like to migrate his own data and settings from Windows 7 to a new Windows 10 install on a new disk. This proves to be very difficult from a working disk and nearly impossible from a failing machine. Windows 10 abandoned Easy Transfer. Microsoft did a deal with PCmover but the software is a joke; most of what it moved doesn’t work on the new system. A specific problem is his username changed on the new Windows system and everything seems to have hardcoded the path of his old user folder rather than using a variable like %USERPROFILE%.

Windows is such garbage.

 

Parsing fixed-width files in Python

I have some census data in a fixed width format I’d like to parse. Stuff like “columns 23-26 are the year in 4 digit form”. It’s easy enough to ad hoc parse this with string slicing. But by the time you handle naming, and type conversion, and stripping padding, and validation, etc etc you end up with a fair amount of code. You can parse CSV with just string split too, but anyone sane uses the CSV module. Is there a good fixed width module?

Not that I could find. I gave up and just did the ad hoc thing.
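For illustration, the ad hoc approach can look something like the sketch below. It is not the code I actually wrote; the field names and the column layout (other than “columns 23-26 are the year”) are made up. Columns are 1-indexed and inclusive, the way data dictionaries usually describe them.

```python
from typing import Callable, NamedTuple


class Field(NamedTuple):
    name: str
    start: int  # first column, 1-indexed
    end: int    # last column, inclusive
    convert: Callable[[str], object] = str


# Hypothetical layout: only the fields we care about need to be listed.
FIELDS = [
    Field("state", 1, 2),
    Field("year", 23, 26, int),
]


def parse_line(line: str, fields=FIELDS) -> dict:
    """Slice out each field, strip the padding, and convert the type."""
    record = {}
    for f in fields:
        raw = line[f.start - 1:f.end].strip()
        record[f.name] = f.convert(raw)
    return record


if __name__ == "__main__":
    sample = "CA" + " " * 20 + "2010"
    print(parse_line(sample))
```

A converter like int raises ValueError on junk input, which is a crude but serviceable form of validation.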

I thought FixedWidth was a candidate but after 20 minutes trying it, gave up on it. There are packaging problems and the docs are poor. The tests are incomplete. The API is weird and seems more designed for emitting fixed width than parsing it. The final reason I gave up is it seems to require you specify a full schema; you can’t parse columns 99-103 unless you’ve said what to do with columns 1-98 first. That was a nuisance.

The other option I found was Pandas read_fwf. I didn’t try it because Pandas is overkill for my project. But I know from CSV work that DataFrame is really nice, and the Pandas CSV module is quite comprehensive. I also know that even after parsing with read_csv you still have to do a lot of work to get it into a clean DataFrame. I’d definitely look into using this for more serious work.
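For reference, a read_fwf call looks roughly like this. It’s a sketch from the Pandas docs rather than something I ran against the census files, and the sample data and column names are invented. Note that colspecs are 0-indexed, half-open intervals, so “columns 23-26” becomes (22, 26).

```python
import io

import pandas as pd

# Invented sample: state code in columns 1-2, year in columns 23-26.
SAMPLE = (
    "CA" + " " * 20 + "2010\n"
    "NY" + " " * 20 + "2000\n"
)

df = pd.read_fwf(
    io.StringIO(SAMPLE),
    colspecs=[(0, 2), (22, 26)],  # only the columns we want; gaps are fine
    names=["state", "year"],
    header=None,                  # the file has no header row
)

if __name__ == "__main__":
    print(df)
```

Unlike FixedWidth, this lets you pull out a couple of columns without describing everything that comes before them.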

Related question: are there standard metadata descriptions for fixed width formats? The census data has this thing called data dictionaries that are clearly meant to be parseable. But they’re in at least two formats right on the site. I feel like I’ve seen other government records with similar metadata descriptions.

Further reading: Extract, transform, and load census data with Python.

 

HD FM Radio

My new car supports HD Radio. What is that? It’s a US standard for radio stations to broadcast digital sidebands adjacent to their analog signal. The FM version is a 100-150 kbit/sec digital stream that encodes audio in HDC, a proprietary codec that is similar to AAC. In theory there can be multiple HD streams on one channel, and a station could be digital-only. I haven’t seen that in practice.

The audio quality definitely seems better than analog FM. It’s also better than the low bandwidth Sirius XM channels like their NPR channel; that thing is awful. The big bummer is that since it’s a sideband the digital FM broadcasts don’t really offer the opportunity to free up any spectrum. Not that the TV spectrum process has gone very well.

 

 

2017 Audi A3 vs SD cards

My new car has a fancy stereo from Audi / Bang & Olufsen that can play audio (and video!) off of SD cards. Awesome! So I loaded up a 128GB SDXC card with my music collection, inserted it, and watched it crash as it indexed. But it seemed to work anyway after rebooting. Only some music is missing.

Looking more closely I figured out the missing music starts right around Johnny Cash. But there’s nothing wrong with those Johnny Cash files; they work on a new card just fine. A bit more googling and head-scratching and I learned the Audi might only read 64GB of an SD card. This limit is baffling; the well-known limit on SD is 32GB, the maximum for the SDHC format. The car’s manual explicitly says it supports up to 128GB cards. I found this other post that claims > 64GB cards work, but only with FAT32, not exFAT. There may also be limits on the number of files / directories / etc, but it’s vague.

Anyway, on my exFAT SDXC card everything after Johnny Cash is unreadable. At first I thought that was the 64GB limit but counting carefully really it’s like 69GB in. I’m also at about 9800 files before I hit the Js, so maybe it’s actually a 10,000 file limit?
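The counting can be scripted. Here’s a sketch that walks the card and keeps a running file count and byte total, then reports the first file past either limit. The big assumption is that the car reads files in alphabetical order, which I don’t actually know. The function names are made up, and “64GB” is taken as 64 * 10**9 bytes; it might really be 64 * 2**30.

```python
import os


def tally(root):
    """Yield (path, file_count, cumulative_bytes) in alphabetical walk order."""
    count = 0
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting dirnames in place controls the order os.walk descends in.
        dirnames.sort(key=str.lower)
        for name in sorted(filenames, key=str.lower):
            path = os.path.join(dirpath, name)
            total += os.path.getsize(path)
            count += 1
            yield path, count, total


def first_past(root, max_files=10_000, max_bytes=64 * 10**9):
    """Return the first (path, count, total) exceeding either limit, or None."""
    for path, count, total in tally(root):
        if count > max_files or total > max_bytes:
            return path, count, total
    return None
```

If the first file past the limit lines up with where the music goes missing, that’s at least a hint about which limit is the real one.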

Another mystery is the indexing system. It definitely scanned the SD card the first time it saw it, for several minutes before crashing. Now when I put the card in it doesn’t index it. I’m not positive if it’s detecting changes in the drive or not. (I think so?) Where is it storing the index? There’s a file on the root of the SD filesystem named “WMPInfo.xml” that it placed there, some Windows Media Player junk. But it’s only 296 bytes. The car has its own dedicated hard drive with a partition for a “jukebox” I can copy music to. Perhaps it has another partition for databases and stuff. It’s got a copy of the Gracenote database built in somewhere!