Tracing Python execution

Marcos Dione

2016-06-23 20:32

Today I stumbled upon PyCon 2016's youtube channel and started watching some of the talks. The first one I really finished watching was Ned Batchelder's "Machete debugging", a very interesting talk about 4 strange bugs and the 4 strange techniques they used to find where those bugs were produced. It's a wonderful talk, full of ideas that, if you're a mere mortal developer like me, will probably blow your mind.

One of the techniques they use for one of the bugs is to actually write a trace function. A trace function in cpython context is a function that is called in several different points of execution of Python code. For more information see sys.settrace()'s documentation.

In my case I used tracing for something that I always liked about bash: that you can ask it to print every line that's being executed (even in functions and subprocesses!). I wanted something similar for ayrton, so I sat down to figure out how this would work.

The key to all this is the function I mention up there. The API seems simple enough at first sight, but it's a little more complicated. You give this function what is called the global trace function. This function will be called with three parameters: a frame, an event and a event-dependent arg. The event I'm interested in is line, which is called for each new line of code that is executed. The complication comes because what this global trace function should return is a local trace function. This function will be called with the same parameters as the global trace function. I would really like an explanation why this is so.

The job for this function, in ayrton's case, is simple: inspect the frame, extract the filename and line number and print that. At first this seems to mean that I should read the files by myself, but luckily there's another interesting standard module: linecache to the rescue. The only 'real complication' of ayrton's use is that it would not work if the script to run was passed with the -c|--script option, but (un)luckily the execution engine already has to read the hold the script in lines, so using that as the cache instead of linecache was easy.

Finally, if you're interested in the actual code, go take a look. Just take in account that ayrton has 3 levels of tracing: à la bash (script lines prepended by +), with line numbers, and tracing any Python line execution, including any modules you might use and their dependencies. And don't forget that it also has 3 levels of debug logging into files. See ayrton --help!