Understanding Python 3.4 asyncio

Working with discord.py has made me realize how little I really understand asyncio, the modern way to do asynchronous code in Python. Truthfully I barely understand any sort of async programming at all. I can sort of bash away at NodeJS a little bit. And I once wrote an asyncore program that did some neat network stuff. But mostly I’ve used threads or processes to do concurrency. It doesn’t help that I’m generally starting with a single-threaded synchronous program and then trying to shoehorn it into some async context later.

But soldiering on, here are some resources:

  • Python 3.4 asyncio docs: the core reference. As with all Python docs it is well written, but unlike some Python docs it’s hard to follow. Much more of a reference than a tutorial.
  • Module of the Week: asyncio: a heroic effort to provide an overview tutorial of the complete module with lots of examples. It’s good, but I’m still confused after reading it.
  • Exploring Python 3’s Asyncio by Example and The New asyncio Module in Python 3.4: Event Loops, two more tutorial articles that just hit the highlights.
  • A Web Crawler With asyncio Coroutines: a thorough explanation of how asyncio works, building it up from first principles with generators.
  • How the heck does async/await work in Python 3.5?: a more low-level explanation focused on the await/async keywords in Python; it includes an example of implementing your own framework instead of using asyncio.
  • PEP 3156 is the proposal that resulted in asyncio, written in the Python 3.3 era with a reference implementation called “Tulip”. I haven’t read this yet.
  • PEP 0492 is the proposal that resulted in the “async def” and “await” keywords being added to the language in Python 3.5. Also “async with” and “async for”. To normal coders, this is mostly syntactic sugar for stuff that worked in Python 3.4. But behind the scenes Python added a whole new concept of “coroutines” to the language, to accompany generators.
  • PEP 255, PEP 342, and PEP 380 are 5+ year old PEPs related to defining generators in Python. (The “yield” keyword.) Generators are the basis of coroutines, and coroutines are the foundation of asyncio.
  • aiohttp is the popular HTTP library for use with asyncio.

Here’s my understanding of today, with the massive caveat that I’m totally new to this and may have important aspects wrong.

The key concept in Python asyncio is the event loop. When you write an async program you are no longer executing functions imperatively. Instead you are creating objects that ask for your functions to be called. (The “async def” keyword or the @asyncio.coroutine decorator does this for you; it wraps your imperative function in an object, so when you invoke the decorated function it doesn’t actually run but instead returns a callback object.) You then supply these objects to the event loop through functions like “call_soon()”. Finally you set the event loop running with run_forever() or run_until_complete(), which gives the event loop control of the CPU; it blocks there while it runs your code via all the callback objects you gave it.
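A minimal sketch of that flow, using the Python 3.5 async/await spelling for the coroutine (the loop-management calls, call_soon() and run_until_complete(), are the real asyncio APIs):

```python
import asyncio

results = []

def callback(n):
    # A plain function: call_soon() asks the loop to invoke it on its next pass.
    results.append(n)

async def main():
    # A coroutine: calling main() doesn't run this body, it returns an object
    # that the event loop has to drive.
    results.append("start")
    await asyncio.sleep(0)       # hand control back to the loop for one pass
    results.append("end")

loop = asyncio.new_event_loop()
loop.call_soon(callback, 1)      # schedule the plain callback
loop.run_until_complete(main())  # the loop takes over the CPU and runs everything
loop.close()

print(results)  # [1, 'start', 'end'] -- the callback ran before the coroutine
```

Note that nothing runs until run_until_complete() hands control to the loop; both the callback and the coroutine are just queued objects before that.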

The other key concept in Python asyncio is coroutines and/or generators. That’s where the “yield” keyword comes in, and it’s actually the same old yield keyword we’ve had since Python 2.3. It’s the way for a Python function to return CPU control back to the thing that called it, yet also keep its stack state so if that function is called again it picks up where it left off. asyncio in 3.4 is literally implemented just as a bunch of fancy generator functions and lots of yielding everywhere. 3.5 adds the new syntax support with “await”. It also makes async functions into “coroutines” instead of “generators”. But the basic concept is still the same.
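The yield mechanics here are plain Python, nothing asyncio-specific. A minimal illustration of a generator keeping its stack state between calls:

```python
def counter():
    # An ordinary generator: each yield hands control back to the caller,
    # but the local state (n) survives until the next resume.
    n = 0
    while n < 3:
        yield n   # pause here; next(gen) resumes on the following line
        n += 1

gen = counter()
print(next(gen))  # 0
print(next(gen))  # 1 -- resumed right after the yield, n was preserved
print(next(gen))  # 2
```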

So that’s the core async magic. It’s really just a sort of process scheduler for cooperative multitasking. The event loop is the CPU scheduler that invokes tasks in a single Unix process/thread. Tasks politely interrupt themselves by calling yield when they can.
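That cooperative handoff is visible if two coroutines take turns (a sketch in the 3.5 async/await syntax, where `await asyncio.sleep(0)` plays the role of a bare yield):

```python
import asyncio

order = []

async def task(name):
    for i in range(2):
        order.append((name, i))
        # Politely interrupt ourselves so the scheduler can run the other task.
        await asyncio.sleep(0)

async def main():
    await asyncio.gather(task("a"), task("b"))

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()

print(order)  # [('a', 0), ('b', 0), ('a', 1), ('b', 1)]
```

Remove the sleep(0) and task "a" runs to completion before "b" ever starts: with cooperative multitasking, a task that never yields hogs the loop.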

But there’s a lot more in asyncio. A key thing is that there are convenience functions for the places where you are most frequently going to want to yield. I.e., when waiting for a socket, or a stream, or a subprocess, there’s convenience code for saying “get me the next bytes from this socket, and oh by the way go ahead and yield control until there’s something to read”. There’s also a higher level wrapper for “protocols”.
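The stream helpers bundle “read, and yield until ready” into one awaitable call. A self-contained sketch (async/await spelling; start_server and open_connection are the real asyncio stream APIs, the loopback echo server is just for illustration):

```python
import asyncio

async def handle(reader, writer):
    # Server side: reading is itself a yield point -- this coroutine is
    # suspended until bytes arrive, and the loop runs other work meanwhile.
    data = await reader.readline()
    writer.write(data)           # echo it back
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Client side: "get me the next bytes from this socket, and by the way
    # yield control until there's something to read."
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

loop = asyncio.new_event_loop()
reply = loop.run_until_complete(main())
loop.close()
print(reply)  # b'hello\n'
```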

And beyond that, there are a lot of building blocks. There are Tasks and Futures to wrap some asynchronous thing in a structure. There’s a “wait()” method for collecting results from multiple asynchronous things. There are locks and semaphores. Many of these convenience objects seem to basically be wrappers for the basic yield/await syntax.
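A sketch of wait() collecting results from several Tasks (async/await form; ensure_future is the real asyncio call for wrapping a coroutine in a Task, the work function is made up):

```python
import asyncio

async def work(name, delay):
    # Each Task sleeps (yielding control), so they all run concurrently.
    await asyncio.sleep(delay)
    return name

async def main():
    # Wrap each coroutine in a Task so the loop schedules all of them.
    tasks = [asyncio.ensure_future(work(n, d))
             for n, d in [("slow", 0.02), ("fast", 0.01)]]
    done, pending = await asyncio.wait(tasks)  # suspend until all have finished
    return sorted(t.result() for t in done)

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
print(results)  # ['fast', 'slow']
```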

So I think that’s how it all fits together. Now to translate that theory into actual working code to see if it’s right.

Update: thanks to a reader comment from Luciano Ramalho I read A Web Crawler With asyncio Coroutines and basically have my understanding above confirmed and deepened. Despite the title, the article is really building up a full asyncio implementation from basic Python generators, explaining in detail how everything works. Now that I understand this much better I look forward to forgetting entirely how asyncio works and just embracing and using its abstractions. If it’s well designed, it should be usable without knowing how it was implemented.

Update 2: see A Simpler Explanation of asyncio

4 thoughts on “Understanding Python 3.4 asyncio”

  1. To understand asyncio, you must first understand generators. Really understand them. Here is the best article on the web about generators: http://stackoverflow.com/a/231855/9951

    Once you do understand generators, you can see that the event loop actually accepts any generator, because the trick is that a generator is a kind of coroutine:

    >>> def hehe():
    ...     print(0)
    ...     yield
    ...     print(1)
    ...     yield
    ...     print(2)
    ...     yield
    ...     print(3)

    >>> asyncio.get_event_loop().run_until_complete(hehe())
    0
    1
    2
    3

    Now with a coroutine:

    >>> @asyncio.coroutine
    ... def hoho():
    ...     print(0)
    ...     yield from ''
    ...     print(1)
    ...     yield from ''
    ...     print(2)
    ...     yield from ''
    ...     print(3)

    >>> asyncio.get_event_loop().run_until_complete(hoho())
    0
    1
    2
    3

    Same, same. async/await is a bit more complex because await expects objects with an __await__ method. The goal is to allow you to await non-coroutine objects such as futures, and to integrate with other concurrency models such as threads.

    So in the end, the type “coroutine” in Python is just a thin layer on top of generators. And every time you call await or yield from in a coroutine, you just activate another generator or awaitable. At the very bottom, one of the generators/awaitables is doing some magic asynchronous IO in C, but you don’t have to worry about that since you will never have to code it.

  2. Ah, I forgot. There is some terminology about asyncio that puts people off: this notion of task, coroutine and future.

    Well:

    – a coroutine, as we have seen, is just a special generator.
    – a future is a wrapper on top of a coroutine. It lets you register a callback to get the result of a coroutine when the result has arrived. You actually never use it directly. It’s an internal detail.
    – a task is a future. It’s just a future that has been scheduled for execution in the event loop. It’s a pending thing for the event loop to do. You do use that, but most of the time you don’t realize you do.

    So basically, when you use asyncio, you create a coroutine and pass it to the event loop. The event loop gives you back a task in exchange. This means:

    – the coroutine is scheduled for execution. You don’t know when; the event loop will decide the best time for it and will take care of making it play concurrently with other coroutines.
    – the task lets you register a callback so that you can get the result of the coroutine when it’s done.

    Now what’s nice about asyncio is that most of the time you don’t need to know you got back a task. You don’t need to write a callback manually. You could, but generally you just use await/yield from on the task and asyncio will make sure to pause your code at that line, and resume it when the result is available.
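    Concretely (a sketch in the async/await spelling; ensure_future and add_done_callback are the real asyncio APIs, the compute coroutine is made up):

```python
import asyncio

async def compute():
    await asyncio.sleep(0)
    return 42

async def main():
    # Hand the coroutine to the loop; it gives you back a Task (a Future).
    task = asyncio.ensure_future(compute())
    # You *could* register a callback by hand...
    task.add_done_callback(lambda t: print("callback saw", t.result()))
    # ...but normally you just await the task and let asyncio resume you here.
    return await task

loop = asyncio.new_event_loop()
result = loop.run_until_complete(main())
loop.close()
print(result)  # 42
```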

Comments are closed.