Nelson's log

Understanding Python 3.4 asyncio

Some recent work has made me realize how little I really understand asyncio, the modern way to do asynchronous code in Python. Truthfully I barely understand any sort of async programming at all. I can sort of bash away at NodeJS a little bit, and I once wrote an asyncore program that did some neat network stuff. But mostly I’ve used threads or processes to do concurrency. It doesn’t help that I’m generally starting with a single-threaded synchronous program and then trying to shoehorn it into some async context later.

But soldiering on, here are some resources:

Here’s my understanding as of today, with the massive caveat that I’m totally new to this and may have important aspects wrong.

The key concept in Python asyncio is the event loop. When you’re writing an async program you’re no longer executing functions imperatively. Instead you’re creating objects that ask for your functions to be called. (The “async def” keyword or the @asyncio.coroutine decorator does this for you; it wraps your imperative function so that invoking it doesn’t actually run the body, but instead returns a coroutine object.) You then hand work to the event loop through functions like call_soon(). Finally you set the event loop running with run_forever() or run_until_complete(), which gives the event loop control over the CPU; it blocks there while it runs your code via all the objects you gave it.
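A minimal sketch of that hand-off, using plain callbacks on a fresh event loop (the names here are just illustrative):

```python
import asyncio

results = []

def callback(n):
    # a plain function; the event loop will call it, once, soon
    results.append(n)

loop = asyncio.new_event_loop()
loop.call_soon(callback, 1)   # ask the loop to call our function
loop.call_soon(callback, 2)
# run_until_complete drives the loop until the given coroutine finishes;
# the queued call_soon callbacks run first, in order
loop.run_until_complete(asyncio.sleep(0))
loop.close()
print(results)  # [1, 2]
```

The point is that call_soon() only queues the work; nothing runs until the loop itself is given control of the CPU.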

The other key concept in Python asyncio is coroutines and/or generators. That’s where the “yield” keyword comes in, and it’s the same old yield keyword we’ve had since Python 2.3. It’s the way for a Python function to return CPU control back to its caller while keeping its stack state, so that when the generator is resumed it picks up where it left off. asyncio in 3.4 is literally implemented as a bunch of fancy generator functions with lots of yielding everywhere. 3.5 adds new syntax support with “await”, and it makes async functions into “coroutines” instead of “generators”. But the basic concept is still the same.
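That suspend-and-resume behavior is visible with a plain generator, no asyncio required:

```python
def counter():
    # a plain generator: each yield hands control back to the caller,
    # but the local state (n) survives until the next resume
    n = 0
    while n < 3:
        yield n
        n += 1

gen = counter()
print(next(gen))  # 0
print(next(gen))  # 1 -- resumed right after the yield, n was remembered
```

asyncio’s trick is to build a scheduler on top of exactly this mechanism.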

So that’s the core async magic. It’s really just a sort of process scheduler for cooperative multitasking. The event loop is the CPU scheduler that invokes tasks in a single Unix process/thread. Tasks politely interrupt themselves by calling yield when they can.
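To make the cooperative-multitasking idea concrete, here’s a toy round-robin scheduler over plain generators. This is a sketch of the idea only, not how asyncio’s event loop is actually implemented:

```python
from collections import deque

def task(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield  # politely hand control back to the scheduler

def run(tasks):
    # toy round-robin "event loop": resume each task in turn until all finish
    queue = deque(tasks)
    while queue:
        t = queue.popleft()
        try:
            next(t)          # run the task until its next yield
            queue.append(t)  # not done yet, back of the line
        except StopIteration:
            pass             # task finished, drop it

log = []
run([task("a", 2, log), task("b", 2, log)])
print(log)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Note the interleaving: neither task runs to completion before the other starts, and everything happens in one thread.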

But there’s a lot more in asyncio. A key thing is that there are convenience functions for the places where you most frequently want to yield: waiting for a socket, a stream, or a subprocess. There’s convenience code for saying “get me the next bytes from this socket, and by the way, yield control until there’s something to read”. There’s also a higher level wrapper for “protocols”.
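The stream helpers are a good example. This sketch uses the modern async/await spelling (3.5+) and asyncio.run (3.7+) rather than the 3.4-era yield from style, but the idea is the same: every read and write yields control to the loop until there are actually bytes to move.

```python
import asyncio

async def handle(reader, writer):
    # server side: read one line and echo it back
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    # port 0 lets the OS pick a free port; a sketch, not production code
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    # client side: each await yields control until there's something to read
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))  # b'hello\n'
```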

And beyond that, there are a lot of building blocks. There are Tasks and Futures to wrap an asynchronous thing in a structure. There’s a wait() function for collecting results from multiple asynchronous things. There are locks and semaphores. Many of these convenience objects seem to basically be wrappers around the basic yield/await machinery.
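A small sketch of Tasks plus wait(), again in the modern async/await spelling (create_task is 3.7+; the 3.4 equivalent was ensure_future / async()):

```python
import asyncio

async def work(n):
    await asyncio.sleep(0)  # yield control while "waiting"
    return n * n

async def main():
    # wrap each coroutine in a Task, then wait() collects all of them
    tasks = [asyncio.create_task(work(n)) for n in (1, 2, 3)]
    done, pending = await asyncio.wait(tasks)
    return sorted(t.result() for t in done)

print(asyncio.run(main()))  # [1, 4, 9]
```

wait() returns the Tasks in a set, so results come back unordered; asyncio.gather() is the convenience that preserves order.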

So I think that’s how it all fits together. Now to translate that theory into actual working code to see if it’s right.

Update: thanks to a reader comment from Luciano Ramalho I read A Web Crawler With asyncio Coroutines and basically have my understanding above confirmed and deepened. Despite the title, the article is really building up a full asyncio implementation from basic Python generators, explaining in detail how everything works. Now that I understand this much better I look forward to forgetting entirely how asyncio works and just embracing and using its abstractions. If it’s well designed, it should be usable without knowing how it was implemented.

Update 2: see A Simpler Explanation of asyncio