Converting video to x265 with avconv

A simple command line for converting video to small x265. These options are chosen to produce acceptably low quality, low bitrate output.

avconv -i in.mkv -c:v libx265 -c:a copy -crf 29 -s 854x480 -preset fast out.mkv

  • -c:v libx265 means “encode using x265”.
  • -c:a copy means “copy audio, don’t re-encode”.
  • -crf 29 means “slightly worse than default 28 quality”.
  • -s 854x480 means “resize to 480p”.
  • -preset fast means “I don’t have a hardware encoder so please don’t take forever”.
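For scripting the conversion, here is a minimal Python sketch that assembles the same command as an argument list. The function name build_avconv_args and its defaults are illustrative assumptions; the options themselves are only the ones listed above.

```python
import shlex


def build_avconv_args(src, dst, crf=29, size="854x480", preset="fast"):
    """Return the argv list for the avconv command described above.

    Hypothetical helper: the name and defaults are for illustration only.
    """
    return [
        "avconv",
        "-i", src,
        "-c:v", "libx265",   # encode video with x265
        "-c:a", "copy",      # copy audio without re-encoding
        "-crf", str(crf),    # quality target (higher means lower quality)
        "-s", size,          # output frame size
        "-preset", preset,   # speed/efficiency trade-off
        dst,
    ]


args = build_avconv_args("in.mkv", "out.mkv")
command_line = " ".join(shlex.quote(a) for a in args)
print(command_line)
```

To actually run it, something like subprocess.run(args, check=True) should work, assuming avconv is installed and on the PATH.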

Obviously better quality is possible; this is producing about 500-600 kbit/second video, or about 4x the size of the audio track.

Speaking of the audio track, it’d be nice to convert whatever the source is to 128kbps AAC stereo. I’m not clear if a simple “-ac 2” is sufficient or if more complex downmixing is necessary.

Windows: typing European on American keyboard

Apple has this great way to type non-ASCII characters like Ö or ß or € on both MacOS and iOS. You long-press the key that looks similar to what you want (an O, or an S, or a $) and then wait a bit and a UI pops up and you choose what you want from a UI picker. It’s not great for fast touch typing but for that you’d want a full language keyboard. But for wanting to type the occasional non-English word, it’s great.

Windows, not so much. The frequent advice is to type it by character code, as if I’m going to memorize 153 and 128 and 223. Not to mention my laptop doesn’t have a numeric keypad. The advice to use the Character Map in that post is not so awful; it works OK, but it’s pretty clumsy.

The best bet seems to be the US international keyboard. It’s an alternate American keyboard where ' " ` ~ ^ are dead keys; they’ll modify the next letter to be accented, or else insert themselves if that makes no sense. It works pretty well. Even better, the right Alt key can be used to type a bunch of other stuff like ß or €. It’s not perfect but it’s pretty good. Note the Windows 10 Anniversary Update moved this; it’s now listed as an “Option” on the English (US) keyboard, not a separate keyboard.

I also tried Holdkey, which promises to add an Apple-like typing UI. But it didn’t work very well; the way it grabs onto the keyboard is problematic. There are some other options along these lines I didn’t try.

Update: if I type y’all I get yáll which I think is hilarious.

Update 2: by default Ctrl-Shift switches between keyboard layouts. That’s obnoxiously easy to press by accident. You can turn that off or change the key mapping.

Twitter API, tweepy, and error handling

I’m writing some Twitter API code. A few lessons learned in using the tweepy library.

The error handling is not awesome. All API errors throw tweepy.error.TweepError, which is fine, but that object has no structure. The docs have some stuff about getting codes that seems outdated. In the code there’s an api_code field, but it’s set to None. I’m doing stuff like e.reason.endswith("401") to detect the 401 error code Twitter throws for protected users. (This is also the code Twitter throws for an invalid token, so that’s not great.)
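A minimal sketch of that string check wrapped in a helper. FakeTweepError is a hypothetical stand-in so the example runs without tweepy or network access; the only thing assumed about the real exception is the reason attribute mentioned above, and the message text is illustrative.

```python
class FakeTweepError(Exception):
    """Hypothetical stand-in exposing only the 'reason' attribute."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


def is_unauthorized(err):
    """Best-effort check: does the error's reason text end with 401?"""
    reason = str(getattr(err, "reason", ""))
    return reason.strip().endswith("401")


# The message strings here are illustrative, not captured from Twitter.
protected = FakeTweepError("Twitter error response: status code = 401")
missing = FakeTweepError("Twitter error response: status code = 404")
print(is_unauthorized(protected), is_unauthorized(missing))
```

Because a 401 can mean either a protected user or a bad token, a caller probably still needs some other signal to tell the two cases apart.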

The rate limiting handling is pretty good. In particular there’s a wait_on_rate_limit option that is pretty handy for simply waiting for a rate limit to expire.

When paging through user_timeline the default fetch size is only 20 tweets. You can instantly get 10x as many tweets out of each rate-limited call by bumping up to count=200.
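The arithmetic behind that, as a tiny sketch. The 3,000-tweet timeline is a made-up figure for illustration, and calls_needed is a hypothetical helper, not part of tweepy.

```python
import math


def calls_needed(total_tweets, count):
    """Number of paged requests needed to fetch total_tweets at a page size of count."""
    return math.ceil(total_tweets / count)


default_calls = calls_needed(3000, 20)    # default page size
bumped_calls = calls_needed(3000, 200)    # with count=200
print(default_calls, bumped_calls)
```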

Twitter’s rate limits are confusing. Nominally it’s “calls per 15 minutes” but there must be some sort of grace period or something built in because sometimes you get more calls than that. There’s a nice rate_limit_status API call that is very helpful but is sometimes out of date; in particular I’ve seen reset times that are now in the past, so apparently my limit reset (and calls are working) but the status call doesn’t tell me that. Anyway, here’s some Q&D Python code to print out which API calls you’re currently at less than maximum on.

import time

# twapi is an already-authenticated tweepy API object
r = twapi.rate_limit_status()
now = time.time()
for resource in r["resources"]:
    for name, status in r["resources"][resource].items():
        # Only report endpoints where some calls have been used up
        if status["remaining"] != status["limit"]:
            print("%4d of %4d left, reset in %ds %s" % (status["remaining"], status["limit"], status["reset"] - now, name))

SQLAlchemy thoughts

On the heels of my post about simple persistence I decided to give the SQLAlchemy ORM a try. I’ve spent a few hours with it and liked it. Here are some notes on what I learned.

I’m not a huge fan of ORMs; they’re complicated and it’s very easy to write code that is either broken or has huge performance problems. Relational databases are not object systems, and trying to mix the two models is always problematic. OTOH an ORM can be convenient.

So far I like SQLAlchemy. I particularly like its idea that it is there to help with your database, not enforce structure on your database. It’s quite flexible and can be adapted to most (all?) schemas. I also like that it’s relatively straightforward. For instance there’s little caching of objects, pretty much any call that might require a database query will just go ahead and do that query. That makes for potentially inefficient code, but then it also makes it clear and unmagic and avoids a lot of hard-to-find bugs. One can always add caching but it’s up to you to manage it.

I also particularly like the excellent SQL tracing; just add echo=True when creating the engine and you get a nice clean log of the SQL being executed. Makes it easy to understand what is happening and also think about performance. See also preventing lazy loading.

What I don’t like is the Python code magic in SQLAlchemy. The way you create SQL queries with Python operators like == is built out of some operator overloading that is very strange. Also the model classes you create (one per table) have a huge amount of magic in them so that when you reference something simple like User.name what’s really going on is a bunch of code with side effects, including possible SQL queries. When it all works it makes the code look simple, but in my experience this kind of hidden stuff can get you in trouble. But then again ORMs are all about doing ill-advised things marrying object oriented code to relational data, so might as well use some magic to make it look nicer.
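To make the operator overloading less mysterious, here is a toy sketch of the general trick. This is not SQLAlchemy’s implementation; Column and Condition are hypothetical classes written only to show how == can return an expression object instead of a boolean.

```python
class Condition:
    """Toy expression node: remembers a comparison instead of evaluating it."""

    def __init__(self, column, op, value):
        self.column = column
        self.op = op
        self.value = value

    def to_sql(self):
        # Return SQL text with a placeholder, plus the bound parameters.
        return "%s %s ?" % (self.column, self.op), (self.value,)


class Column:
    """Toy column: comparing it builds a Condition rather than True/False."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        return Condition(self.name, "=", other)


name = Column("name")
cond = name == "nelson"   # not a bool: a Condition object
print(cond.to_sql())
```

The surprising part is exactly what the example shows: an expression that reads like a comparison never produces True or False, so using it in an ordinary if statement would not do what it appears to.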

Testing

I keep hoping to find an ORM that has first class support for mock objects for unit testing. So I can write all my higher level tests without any database, just mock objects generated by the ORM itself. No one seems to do this though, either because it’s hard or because it’s a bad idea to fake out such a key component of your software. So instead everyone uses a special test database and some combination of test fixtures and transaction rollbacks to manage it.

A lot of SQLAlchemy testing examples use pytest, in part because its fixture support is so good. So I’m using pytest for the first time and I like it.

  • sqlite’s in-memory databases are great for a fast test database. But only use this if you intend to deploy to sqlite; you must test against the same database you are using in production.
  • The asphalt project has testing patterns that I ended up cribbing from. Right now I’ve got it setting up a transaction for every single test whether it uses a dbsession or not, I should probably fix that. Note the listener for after_rollback events to handle application rollbacks, no other example I’ve seen is doing that.
  • More ideas of fixture patterns: this blog post, this book chapter. The latter also has an idea on mocking out the database connection itself to return fake formatted data instead of using a test database.
  • pytest-sqlalchemy is a reusable set of fixtures. It looks OK but pretty simple and the project is inactive. (Maybe it doesn’t need activity!)
  • factory_boy is a test fixture library with specific support for ORMs including SQLAlchemy. Alternative to pytest fixtures, I guess.
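The rollback-per-test idea can be sketched with nothing but the standard library. This is an illustration of the pattern, not the asphalt or pytest-sqlalchemy code; the tweets table and the helper names are made up, and a real setup would wrap this in a pytest fixture around a SQLAlchemy session.

```python
import sqlite3
from contextlib import contextmanager


def make_test_db():
    """Create an in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, body TEXT)")
    conn.commit()
    return conn


@contextmanager
def rolled_back(conn):
    """Yield the connection, then undo whatever the test wrote."""
    try:
        yield conn
    finally:
        conn.rollback()


conn = make_test_db()
with rolled_back(conn) as db:
    db.execute("INSERT INTO tweets (body) VALUES (?)", ("hello",))
    inside = db.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(inside, after)
```

Each test sees its own writes while it runs, and the next test starts from the same clean schema without rebuilding the database.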

Alembic schema migrations

SQLAlchemy does not come with any support for schema migrations. Alembic seems to be the consensus choice for migrations. It’s only about half-automated; you’re expected to be reviewing and editing scripts. Which seems reasonable enough. One nice thing about Alembic is you can bring it in on a SQLAlchemy project that already exists; you don’t have to start from the beginning with it.

A few notes:

  • If your code isn’t in a Python package and just has simple “import model” style imports, you can fix up Alembic’s import by adding your source directory to PYTHONPATH. Or see the discussion for editing env.py.
  • When you first set up Alembic you probably already have a schema in place. I decided to generate an empty migration to mark this existing schema. You still have to apply this migration so Alembic knows what’s going on, by running “alembic upgrade head”.
  • There’s a cookbook for having Alembic history going back to an empty database.
  • sqlite is a PITA for migrations because it doesn’t have much support for alter table. But Alembic does have nice “batch” support for “move-and-copy” migrations.
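For a sense of what a “move-and-copy” migration involves, here is a hand-written sketch using only the sqlite3 module. It is not Alembic’s code or API, just the general shape of the technique; the users table and its junk column are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, junk TEXT)")
conn.execute("INSERT INTO users (name, junk) VALUES ('alice', 'x')")
conn.commit()

# Goal: get rid of the 'junk' column without relying on ALTER TABLE.
# Build a new table with the desired shape, copy the rows across,
# drop the old table, then rename the new one into place.
conn.executescript("""
    BEGIN;
    CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_new (id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_new RENAME TO users;
    COMMIT;
""")

columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
rows = conn.execute("SELECT id, name FROM users").fetchall()
print(columns, rows)
```

A real migration also has to recreate indexes, constraints, and triggers on the new table, which is presumably a large part of why having a tool do it is nice.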

Wanted: simple persistent apps

What’s an easy way to build a Python app that grabs data from somewhere every few minutes, persists it, and then does some analysis of it?

I’m building a new thing that gets some data from Tweets and stores it. The main loop is downloading new tweets for a group of users and storing details of those tweets. It’s not very complicated. Structurally it’s similar to my lolslackbot project.

And I’m building it the same bad way. Persistence via sqlite with hand-rolled SQL statements. This is tedious, error prone, and makes upgrading / changing things difficult. Testing is also a huge PITA.

Is there some modern alternative that’s better?

My first thought is to at least use some sort of ORM. Django or SQLAlchemy, presumably. I’ve done a little Django and it’s hard to love it. Also ORMs cause problems, although maybe not at the small scale I’m working at.

I’m wondering if there’s some other fancier / funkier persistence system I should consider. I kind of want the simplicity and flexibility that an object database or maybe a NoSQL system promises. But those don’t seem to work out well in practice.

Alfred replacements for Windows

I loved Alfred on MacOS. 95% of the time I used it to launch apps with the keyboard, but I also liked its clipboard management, access to Google searches, etc. What about Windows? Turns out there’s lots of options. I cribbed these from this Reddit discussion, also see this article. Many of these are cross-platform and work on Mac and/or Linux.

Hain is pretty slick and what I just chose to switch to. It feels almost exactly like Alfred to me; the only thing it’s missing is number key shortcuts to pick items. It’s not very configurable (no skins for instance) but the defaults are good. I added some plugins: Clipboard (to replace Ditto), Google. The terrible / great thing about Hain is it’s implemented in Electron so there’s a full HTML5 shell running. Or several maybe. I see four hain.exe processes totaling about 200 MB of RAM. And 0.02% CPU, which is small but not zero.

Zazu is newish and open source. It seems OK but not as polished as Hain; configuring it requires editing a JSON file, for instance. It’s also HTML5 based (Chrome?), 240MB in memory and 0.02% CPU.

Wox is popular and pretty solid. The install is strange; it says it requires you also install both Everything (a file indexer) and Python3. But I think those are both optional upgrades. It’s a little ugly out of the box; Hain is nicer. Wox is 125 MB of RAM and 0.12% of CPU. It’s built in .NET.

Launchy is what I used to use. It’s old and ugly and doesn’t really do anything but launch apps. And that, not very well. Nothing to love but it does work and is free.

FARR is another option with a long history. I couldn’t figure out how to rebind the hotkey, and it’s kinda old and ugly too, but it seems very capable. Also quite small; 20 MB RAM and 0.00% CPU when idle, like it should be.

While I’m here, a shout-out to Seer, which gives a space bar quick-preview to the Windows Explorer, like the MacOS Finder has. It’s about $12. Lightshot also seems like a nice improved screenshot-taker. It’s free; I think they’re trying to drive ad traffic to prntscr.com. (Speaking of which, my Apple keyboard has no PrntScrn button. It does have an F13 though, which you can bind in Lightshot, even if the UI shows a blank key when you bind it.)

Twitter API: extended tweets

tl;dr: if you’re writing new Twitter API code, be sure you’re requesting extended tweets.

Boy am I out of touch. Twitter did a major API change over a year ago to better support modern tweets that don’t really fit in 140 characters. I stumbled into this while trying to use the API to get the URLs for this tweet. My old code mostly worked but this tweet is long enough to trigger the need for extended tweets. My old code was seeing a URL for the tweet it was in reply to and not the bendbulletin.com URL.

The clue was that the tweet object also had “truncated: true” set on it. What is that? Happily I found this discussion which explained about extended tweets. Twitter also documented this thoroughly last year, it’s explained well in this document.

By default the Twitter API still returns “compatibility tweets” which strictly fit in 140 characters and are in other ways pretty broken, or at least incomplete. You have to add tweet_mode=extended to your request URL to get the new extended format. Just to complicate things this new format has some changes; most notably there is no more “text” field. Now it’s called “full_text”. There’s also some funky third format for streaming and Gnip APIs that has both formats in a single message.

I fixed my Tweepy code by adding tweet_mode='extended' to the Cursor constructor.
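Code that may see either format can hedge by looking for both field names. A minimal sketch working on plain dicts shaped like the JSON described above; tweet_text is a hypothetical helper and the sample tweets are made up.

```python
def tweet_text(tweet):
    """Return the body of a tweet dict in either extended or compatibility format."""
    if "full_text" in tweet:
        return tweet["full_text"]   # extended format
    return tweet["text"]            # compatibility format


# Made-up sample payloads, trimmed down to the fields discussed here.
extended = {"full_text": "a long tweet with a link at the end", "truncated": False}
compat = {"text": "a long tweet with a li...", "truncated": True}
print(tweet_text(extended))
print(tweet_text(compat))
```

A compatibility tweet with truncated set to true is the hint that the request should have asked for the extended format in the first place.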

I can’t see any reason any new code shouldn’t immediately switch to using extended tweets forever and always. It’s a shame Twitter didn’t go with a version numbering scheme to change things like this in a neater way, although versions have problems too.

While I’m here, two grumbles about the Twitter API:

Twitter’s own web client no longer is using the JSON API for everything, some stuff is pre-rendered. This means you can’t just use their web app to crib examples of API calls from.

Twitter’s API console for interactive testing of requests is kind of nice. But they don’t format the JSON responses in a useful way! You just get one long-ass line of text you have to paste into a prettifier to make sense of.
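Python’s standard json module works as a quick prettifier. A small sketch, with a made-up one-line response standing in for whatever the console returns:

```python
import json

# Made-up single-line response standing in for the console output.
raw = '{"id": 1, "user": {"screen_name": "example"}, "truncated": true}'

# Parse and re-serialize with indentation and stable key order.
pretty = json.dumps(json.loads(raw), indent=2, sort_keys=True)
print(pretty)
```

From a shell, piping the text through python -m json.tool does much the same thing.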