Two kinds of debugging

I haven’t been blogging much lately, partly because I’m pretty head-down helping finish up the conversion of Gosu into Java bytecode (and otherwise trying to get it polished up and releasable), and partly because I tend to be afraid of wasting people’s time by stating the obvious. There’s certainly nothing that I know how to do that I haven’t learned from someone else or that isn’t obvious to everyone else attempting to solve the same problems, right? But the truth is that what seems obvious or derivative or natural to me might not be as obvious to someone else. So in the future I plan to worry a little less about stating the obvious, and instead I’ll try to write more about how I work and why I do things: perhaps it’ll be useless to someone “skilled in the art” (as one says when writing patent applications), but perhaps not.

Today’s installment of Stating The Obvious is around debugging. My categorization here is neither rigorous nor exhaustive, but I find it useful to think about debugging as often coming in two different forms, which I’ll term “debugging from first principles” and “debugging by differences.”

Debugging from first principles is what most people think of when they think of debugging; you walk through the code, either by statically reading through it or by debugging it as it runs, and reason about where it’s going wrong. foo() calls bar() which calls baz(), but bar() calls baz() with the wrong arguments because foo() passed bar() the wrong arguments, and that’s what the problem is. But how do you know that the call from foo() to bar() is the wrong one, and not the call from bar() to baz(), or not the behavior of baz() itself? You can only know that if you know what the code is supposed to do well enough to make those sorts of judgments. In other words, you have to start from first principles with a fairly involved understanding of each step in the chain: you have to understand what baz() is doing well enough to know that it’s behaving correctly given its inputs, you have to understand bar() well enough to know that just passing through the arguments from foo() is the right thing to do, and you have to know foo() well enough to know that the invocation of bar() is simply a mistake rather than something intentional.
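To make that a little more concrete, here’s a minimal Java sketch of the kind of chain I’m describing; the method names, arguments, and amounts are entirely made up for illustration:

    // A minimal, hypothetical sketch of the foo()/bar()/baz() chain described
    // above; the method names, arguments, and amounts are all invented.
    public class FirstPrinciplesExample {

        void foo() {
            // The actual bug: foo() passes the gross amount where bar()
            // expects the net amount, so everything downstream is subtly off.
            bar(computeGrossAmount());
        }

        void bar(double netAmount) {
            // bar() just forwards its argument; it only looks guilty if you
            // don't know that forwarding is its intended behavior.
            baz(netAmount);
        }

        void baz(double netAmount) {
            // baz() behaves correctly for the inputs it's given; the defect is
            // in its callers, which you can only know by understanding what
            // each method in the chain is supposed to do.
            System.out.println("Tax owed: " + netAmount * 0.08);
        }

        double computeGrossAmount() {
            return 1000.0;
        }
    }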

That method of debugging is pretty core to software development, but when a system is large and complicated it often breaks down pretty thoroughly. For example, if you’re trying to debug something like a complicated financial calculation (to pull a completely random example out of the air, suppose your code is computing the price for a sample auto insurance policy), it’s often not at all obvious what the “right” thing is. Some test is broken because the answer should be $702, and somehow it’s now coming out to $698. Why? Is there some round-off error somewhere? Is the test date-dependent somehow, such that changing the day on which it’s run changes the answer? Did someone change the sample rates? Is the sample policy now being built differently? Errors like that, which involve huge numbers of dependent moving parts, can be incredibly difficult to track down by first principles.
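As a self-contained toy example of one of those moving parts, here’s a hypothetical date-dependent surcharge; none of the numbers or rules come from a real rating engine, but it shows how the same assertion about a premium can pass one day and fail the next:

    // A toy, invented example of a date-dependent calculation: the expected
    // value shifts depending on when the test happens to run.
    import java.time.LocalDate;
    import java.time.Month;

    public class DateDependentRater {

        static double ratePremium(LocalDate asOf) {
            double base = 650.00;
            // Hypothetical rule: rates filed for the second half of the year
            // carry an 8% surcharge instead of 7.4%.
            double surcharge = asOf.getMonthValue() >= Month.JULY.getValue() ? 0.08 : 0.074;
            // Round to cents -- another spot where answers can quietly drift.
            return Math.round(base * (1 + surcharge) * 100) / 100.0;
        }

        public static void main(String[] args) {
            // Prints 702.0 from July onward and 698.1 before that -- exactly
            // the sort of difference that's baffling until you know where to look.
            System.out.println(ratePremium(LocalDate.now()));
        }
    }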

For those sorts of bugs, I prefer a technique I think of as “debugging by differences”. What that means is that you have some known good state of the code whose execution you can compare against the broken version. Maybe that’s a past version in source control that you sync to from back when the test ran fine. In other cases, you might have a switch that controls whether execution flows down the old code path or the new one. Whatever the mechanism, if you have a test that can reproduce your bug, a known good state of the code under which that test passes, and the current state under which it fails, you can instead debug by tracing the execution in both modes and seeing where they differ. I personally tend to do this by putting copious amounts of print statements in my code that serve as a trace of execution along the paths that I care about; in the financials case, that includes printing out all sorts of intermediate states of the calculations and the data being processed. Then I compare the output of those print statements under the working and non-working scenarios in order to narrow down the differences.
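Here’s a rough sketch of what that tracing looks like; the calculation and the names are invented, but the pattern is just “emit the same trace lines from the known-good run and the broken run, then diff the two outputs”:

    // A sketch of print-statement tracing for debugging by differences; the
    // calculation and names here are made up for illustration.
    public class RatingTrace {

        static void trace(String stage, Object value) {
            // A fixed prefix makes the lines easy to grep out of a larger log.
            System.out.println("TRACE " + stage + " = " + value);
        }

        static double rate(double baseRate, double exposure, double discount) {
            trace("baseRate", baseRate);
            trace("exposure", exposure);
            double subtotal = baseRate * exposure;
            trace("subtotal", subtotal);
            double premium = subtotal * (1 - discount);
            trace("premium", premium);
            return premium;
        }

        public static void main(String[] args) {
            // Run once on the known-good revision and once on the broken one,
            // capture stdout to two files, and diff them:
            //   java RatingTrace > good.txt
            //   java RatingTrace > bad.txt
            //   diff good.txt bad.txt
            rate(3.25, 200.0, 0.05);
        }
    }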

Debugging that way tends to be much, much faster and more to the point than trying to determine from first principles what’s going on in a very complicated system. It can be most effective, though, if you plan your coding and testing to allow you to take advantage of that technique. When TDD evangelists say they never use their debugger, what they really mean is that if you have a large suite of tests that you can run between any set of non-trivial changes, you’ll always be able to pinpoint where you went wrong, because the bug should always have been introduced by whatever small amount of code you changed since your last passing test run. The real world is, naturally, a little bit messier than that, especially when you’re making large architectural changes or rewriting things; you might not be able to run tests for a while, and only after changing thousands of lines of code do you find out that something, somewhere, no longer works properly.

To that end, for any large-scale project I tend to favor a build-alongside-and-replace approach rather than a change-in-place approach. When changing anything relating to PolicyCenter’s financials, I would build out my new code path in parallel to the old one, then find a good choke point where I could comment out a call that led down the old path and replace it with a call to the new path. Even after I thought I was done, I would still check the code in that way and wait a couple of days before cleaning things up, giving our test harness a chance to cycle (running our full test suite takes too long to do locally before checking in, and our test harness tests on different databases and operating systems) and verify the change. If anything broke, I could then comment the old code path back in, add in appropriate debugging code to trace what was going on, and quickly track down the problem.

We adopted a similar approach with compiling Gosu down to bytecode, with a master switch allowing us to control which runtime we execute under; when tests would break only when running under bytecode, we could turn the switch off and look at the execution under the old runtime to see what behavior was different. Sometimes that approach means that I copy and paste large amounts of code temporarily, often creating entirely new copies of classes with names suffixed with “2” to differentiate them from the old ones, all so that I can leave the old code paths fully intact until all the bugs are shaken out of the new paths, at which point I rip out the old code, rename classes, and otherwise clean things up.
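In code, that kind of choke point tends to look something like the sketch below. The class name and the system property are invented; in a real system the master switch might be a configuration setting or, as with the Gosu bytecode work, a runtime-wide flag.

    // A sketch of a choke-point switch between an old and a new code path;
    // the names and the property are hypothetical.
    public class FinancialsCalculator {

        private static final boolean USE_NEW_PATH =
                Boolean.getBoolean("financials.useNewPath");

        public double calculateCharges(double amount) {
            // The single place where execution is routed down one path or the other.
            return USE_NEW_PATH ? calculateChargesNew(amount) : calculateChargesOld(amount);
        }

        private double calculateChargesOld(double amount) {
            // Old implementation, left fully intact so it can be switched back
            // on (and traced against) if the new path misbehaves.
            return amount * 1.05;
        }

        private double calculateChargesNew(double amount) {
            // New implementation built in parallel; once it has survived a few
            // full test-harness cycles, the old path and the switch come out.
            return amount * 1.05;
        }
    }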

It often requires some amount of forethought to be able to debug by differences, so it’s a technique that’s worth taking into account when starting on any changes that you know could lead to difficult-to-debug breaks.