Nelson's log

Stack traces from running Python programs

My lolslackbot program occasionally hangs forever and I’d like to know where it was stuck when I kill it. Starting with Python 3.3 it’s relatively easy using the faulthandler module in the standard library. You have to set it up ahead of time by calling faulthandler.enable() (it’s not on by default), but once you do, you can send the process a SIGABRT and it will display a slightly spartan stack trace and abort the program.
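Here is a small sketch of that workflow, assuming a POSIX system. The child script, its 60 second sleep standing in for the hang, and the "ready" handshake are all made up for the demo; only faulthandler.enable() and the SIGABRT are the actual technique.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a program that hangs.
CHILD = """
import faulthandler, time
faulthandler.enable()        # off by default, so it must be called up front
print("ready", flush=True)
time.sleep(60)               # pretend this is the hang
"""

proc = subprocess.Popen(
    [sys.executable, "-c", CHILD],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True,
)
proc.stdout.readline()               # wait until the handler is installed
proc.send_signal(signal.SIGABRT)     # same as `kill -ABRT <pid>` from a shell
_, err = proc.communicate(timeout=30)

print(err)                           # the traceback faulthandler wrote to stderr
print("exit status:", proc.returncode)
```

In a real program only the two faulthandler lines matter; the signal would come from a shell with kill -ABRT and the process ID.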

You can get a stack trace without killing the Python process by registering a handler for another signal, like faulthandler.register(signal.SIGUSR1). That signal is not entirely unobtrusive; it interrupts time.sleep() for instance (Python 3.5 and later resume the sleep, per PEP 475). But the program does keep running after printing the stack trace. The handlers that faulthandler.enable() installs by default are for fatal signals (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL), so they also kill your program; a signal you register yourself, like USR1, won’t.
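A minimal sketch of the non-fatal variant, again assuming POSIX (faulthandler.register() is not available on Windows). The temporary file and the work() function are just illustration; the process signals itself so the example is self-contained.

```python
import faulthandler
import os
import signal
import tempfile

# Keep this file open for as long as the handler is registered:
# faulthandler writes straight to its file descriptor.
log = tempfile.TemporaryFile(mode="w+")
faulthandler.register(signal.SIGUSR1, file=log, all_threads=True)


def work():
    # Stand-in for `kill -USR1 <pid>` sent from another terminal.
    os.kill(os.getpid(), signal.SIGUSR1)
    return "still running"


result = work()                      # the dump happens, then work() carries on

log.seek(0)
dump = log.read()
print(dump)
print(result)

faulthandler.unregister(signal.SIGUSR1)
```

In a long-running program you would more likely leave file at its default (sys.stderr) and never unregister.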

It’s all implemented in C so it can print useful stack traces even if the Python VM itself is broken. You can also enable it without touching any code, by setting the PYTHONFAULTHANDLER environment variable or passing -X faulthandler on the command line, which I imagine is useful for debugging problems at startup.
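To check that the environment variable really does switch it on, one option is to launch a child interpreter with PYTHONFAULTHANDLER set and ask it. This is only a sketch; the one-line child script is made up for the check.

```python
import os
import subprocess
import sys

# Copy the current environment and add the switch.
env = dict(os.environ, PYTHONFAULTHANDLER="1")

out = subprocess.run(
    [sys.executable, "-c", "import faulthandler; print(faulthandler.is_enabled())"],
    env=env,
    capture_output=True,
    text=True,
    check=True,
).stdout.strip()

print(out)
```

From a shell the equivalent is to prefix the command with PYTHONFAULTHANDLER=1, or to run python -X faulthandler.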

There’s also an interesting faulthandler.dump_traceback_later() function which is basically a watchdog. It starts a separate thread that dumps the stack traces once a timeout expires and, optionally (exit=True), exits your program. The exit uses _exit(), the hard exit that skips cleanup handlers and buffer flushing, which has pluses and minuses.
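A rough sketch of the watchdog pattern, with exit left off so the program survives. The slow_step() function and the 0.3 and 1.0 second timings are invented for the demo.

```python
import faulthandler
import tempfile
import time

log = tempfile.TemporaryFile(mode="w+")


def slow_step():
    time.sleep(1.0)                  # takes longer than the watchdog allows


# Dump all threads if we are still running 0.3 seconds from now.
faulthandler.dump_traceback_later(0.3, file=log, exit=False)
try:
    slow_step()
finally:
    # Disarm the watchdog once the guarded section is done.
    faulthandler.cancel_dump_traceback_later()

log.seek(0)
dump = log.read()
print(dump)
```

Passing exit=True would instead end the process right after the dump, and repeat=True would keep dumping every timeout interval.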

It’s sure a lot easier than the only other way I knew to do this, which was to attach gdb and inspect the interpreter’s state. But I wish it were enabled by default like Java’s old built-in behavior on SIGQUIT (Ctrl-\). Maybe they were afraid it was too radical a backwards-incompatible change.