Working with discord.py has made me realize how little I really understand asyncio, the modern way to do asynchronous code in Python. Truthfully I barely understand any sort of async programming at all. I can sort of bash away at NodeJS a little bit, and I once wrote an asyncore program that did some neat network stuff. But mostly I’ve used threads or processes for concurrency. It doesn’t help that I’m generally starting with a single-threaded synchronous program and then trying to shoehorn it into some async context later.
But soldiering on, here are some resources:
- Python 3.4 asyncio docs: the core reference. As with all Python docs it is well written, but unlike some Python docs it’s hard to follow. Much more of a reference than a tutorial.
- Module of the Week: asyncio: a heroic effort to provide an overview tutorial of the complete module with lots of examples. It’s good, but I’m still confused after reading it.
- Exploring Python 3’s Asyncio by Example and The New asyncio Module in Python 3.4: Event Loops, two more tutorial articles that just hit the highlights.
- A Web Crawler With asyncio Coroutines: a thorough explanation of how asyncio works, building it up from first principles with generators.
- How the heck does async/await work in Python 3.5?: a lower-level explanation focused on the await/async keywords in Python; includes an example of implementing your own framework instead of using asyncio.
- PEP 3156 is the proposal that resulted in asyncio, written in the Python 3.3 era with a reference implementation called “Tulip”. I haven’t read this yet.
- PEP 492 is the proposal that resulted in the “async def” and “await” keywords being added to the language in Python 3.5. Also “async with” and “async for”. To normal coders, this is mostly syntactic sugar for stuff that worked in Python 3.4. But behind the scenes Python added a whole new concept of “coroutines” to the language, to accompany generators.
- PEP 255, PEP 342, and PEP 380 are 5+ year old PEPs related to defining generators in Python. (The “yield” keyword.) Generators are the basis of coroutines, and coroutines are the foundation of asyncio.
- aiohttp is the popular HTTP library for use with asyncio.
Here’s my understanding as of today, with the massive caveat that I’m totally new to this and may have important aspects wrong.
The key concept in Python asyncio is the event loop. When you are writing an async program you no longer are executing functions imperatively. Instead you are creating objects that ask for your functions to be called. (The “async def” keyword or the @asyncio.coroutine decorator do this for you; they wrap your imperative functions with an object, so when you invoke the decorated function it doesn’t actually run but instead returns the callback object). You then supply these objects to the event loop through functions like “call_soon()”. Finally you set the event loop running with run_forever() or run_until_complete(), which gives the event loop control over the CPU where it blocks while it runs your code via all the callback objects you gave it.
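A minimal sketch of that callback flow, assuming nothing beyond the stdlib (the `greet` function and the `results` list are mine; I’m using `asyncio.new_event_loop()` to get a loop explicitly):

```python
import asyncio

results = []

def greet(name):
    # A plain callback. Nothing runs when we schedule it below;
    # it only fires once the event loop has control of the CPU.
    results.append(f"hello {name}")

loop = asyncio.new_event_loop()
loop.call_soon(greet, "alice")   # queue callbacks on the loop
loop.call_soon(greet, "bob")
loop.call_soon(loop.stop)        # after those run, stop the loop
loop.run_forever()               # hand the CPU to the loop
loop.close()
print(results)                   # ['hello alice', 'hello bob']
```

Note the inversion: you never call `greet()` yourself; you hand it to the loop and the loop calls it for you, in the order scheduled.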
The other key concept in Python asyncio is coroutines and/or generators. That’s where the “yield” keyword comes in, and it’s actually the same old yield keyword we’ve had since Python 2.3. It’s the way for a Python function to return CPU control back to the thing that called it, yet also keep its stack state so if that function is called again it picks up where it left off. asyncio in 3.4 is literally implemented just as a bunch of fancy generator functions and lots of yielding everywhere. 3.5 adds the new syntax support with “await”. It also makes async functions into “coroutines” instead of “generators”. But the basic concept is still the same.
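The suspend-and-resume behavior is easy to see with a plain generator, no asyncio involved (a toy `counter` of my own invention):

```python
def counter(start):
    # Each yield hands control back to the caller, but the local
    # variable n survives; next() resumes right after the yield.
    n = start
    while True:
        yield n
        n += 1

gen = counter(10)
print(next(gen))  # 10
print(next(gen))  # 11 -- the stack state was kept between calls
```

That retained stack frame is the whole trick: it’s what lets a “task” pause in the middle of its work and pick up later.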
So that’s the core async magic. It’s really just a sort of process scheduler for cooperative multitasking. The event loop is the CPU scheduler that invokes tasks in a single Unix process/thread. Tasks politely interrupt themselves by calling yield when they can.
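That scheduler idea can be sketched without asyncio at all, just plain generators. This is a toy of my own (the names `scheduler` and `worker` are made up), in the spirit of the web crawler article’s from-first-principles approach:

```python
from collections import deque

def scheduler(tasks):
    # A toy round-robin "event loop": each task is a generator that
    # politely interrupts itself with yield; we resume them in turn.
    queue = deque(tasks)
    order = []
    while queue:
        task = queue.popleft()
        try:
            order.append(next(task))  # run the task to its next yield
            queue.append(task)        # not done yet: reschedule it
        except StopIteration:
            pass                      # task finished; drop it
    return order

def worker(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"  # yield = "let someone else run for a bit"

print(scheduler([worker("a", 2), worker("b", 3)]))
# ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The real asyncio event loop also knows how to sleep until a file descriptor is ready, but the control flow is this same interleaving.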
But there’s a lot more in asyncio. A key thing is that there are convenience functions for the places where you most frequently want to yield. That is, when waiting on a socket, a stream, or a subprocess, there’s convenience code for saying “get me the next bytes from this socket and, by the way, go ahead and yield control until there’s something to read”. There’s also a higher-level wrapper for “protocols”.
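Here’s a sketch of the streams convenience layer: a tiny echo server and client talking over loopback. The `handle` and `main` names are mine, and I’m using the `asyncio.run()` entry point from later Pythons rather than the 3.4-era explicit loop:

```python
import asyncio

async def handle(reader, writer):
    # Server side: the await suspends this task until bytes arrive,
    # letting the event loop run other tasks in the meantime.
    data = await reader.readline()
    writer.write(b"echo: " + data)
    await writer.drain()
    writer.close()

async def main():
    # Port 0 asks the OS for any free port.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hi\n")
    await writer.drain()
    reply = await reader.readline()  # suspends until the echo comes back
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

print(asyncio.run(main()))  # b'echo: hi\n'
```

Every `await` on a read is exactly the “yield until there’s something to read” convenience described above, just with the bookkeeping hidden.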
And beyond that, there’s a lot of building blocks. There are Tasks and Futures to wrap some asynchronous thing in a structure. There’s a “wait()” method for collecting results from multiple asynchronous things. There’s locks and semaphores. Many of these convenience objects seem to basically be wrappers for the basic yield/await syntax.
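For the building blocks, a small sketch with Tasks and `wait()` (the `fetch` coroutine is a stand-in I made up, with `asyncio.sleep()` faking the I/O):

```python
import asyncio

async def fetch(name, delay):
    await asyncio.sleep(delay)  # pretend I/O; yields to the event loop
    return name

async def main():
    # ensure_future() wraps each coroutine in a Task so the loop
    # runs them concurrently; wait() collects them when all finish.
    tasks = [asyncio.ensure_future(fetch(n, d))
             for n, d in [("slow", 0.02), ("fast", 0.01)]]
    done, pending = await asyncio.wait(tasks)
    return sorted(task.result() for task in done)

print(asyncio.run(main()))  # ['fast', 'slow']
```

Total runtime is roughly the longest delay, not the sum, which is the point of running the Tasks concurrently rather than awaiting each in turn.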
So I think that’s how it all fits together. Now to translate that theory into actual working code to see if it’s right.
Update: thanks to a reader comment from Luciano Ramalho I read A Web Crawler With asyncio Coroutines and basically have my understanding above confirmed and deepened. Despite the title, the article is really building up a full asyncio implementation from basic Python generators, explaining in detail how everything works. Now that I understand this much better I look forward to forgetting entirely how asyncio works and just embracing and using its abstractions. If it’s well designed, it should be usable without knowing how it was implemented.
Update 2: see A Simpler Explanation of asyncio