Two kinds of debugging

I haven’t been blogging much lately, partly because I’m pretty head-down helping finish up the conversion of Gosu into Java bytecode (and otherwise trying to get it polished up and releasable), and partly because I tend to be afraid of wasting people’s time by stating the obvious. There’s certainly nothing that I know how to do that I haven’t learned from someone else or that isn’t obvious to everyone else attempting to solve the same problems, right? But the truth is that what seems obvious or derivative or natural to me might not be as obvious to someone else. So in the future I plan to worry a little less about stating the obvious, and instead I’ll try to write more about how I work and why I do things: perhaps it’ll be useless to someone “skilled in the art” (as one says when writing patent applications), but perhaps not.

Today’s installment of Stating The Obvious is about debugging. My categorization here is neither rigorous nor exhaustive, but I find it useful to think about debugging as often coming in two different forms, which I’ll term “debugging from first principles” and “debugging from differences.”

Debugging from first principles is what most people think of when they think of debugging; you walk through the code, either by statically reading through it or by stepping through it as it runs, and reason about where it’s going wrong. foo() calls bar() which calls baz(), but bar() calls baz() with the wrong arguments because foo() passed bar() the wrong arguments, and that’s the problem. But how do you know that the call from foo() to bar() is the wrong one, and not the call from bar() to baz(), or not the behavior of baz() itself? You can only know that if you know what the code is supposed to do well enough to make those sorts of judgments. In other words, you have to start from first principles with a fairly involved understanding of each step in the chain: you have to understand what baz() is doing well enough to know that it’s behaving correctly given its inputs, you have to understand bar() well enough to know that just passing through the arguments from foo() is the right thing to do, and you have to know foo() well enough to know that the invocation of bar() is simply a mistake rather than something intentional.
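To make that chain concrete, here’s a toy sketch (the class, functions, and calculation are all hypothetical, invented purely for illustration) of the foo()/bar()/baz() situation above:

```java
public class FirstPrinciples {
    // Correct given its inputs: apply a flat discount to a price.
    static int baz(int price, int discount) {
        return price - discount;
    }

    // Also correct: passing the arguments straight through is intentional.
    static int bar(int price, int discount) {
        return baz(price, discount);
    }

    // The actual bug: foo() swaps the arguments it passes to bar().
    static int foo() {
        int price = 100, discount = 5;
        return bar(discount, price);
    }

    public static void main(String[] args) {
        System.out.println(foo()); // -95 rather than the expected 95
    }
}
```

Nothing in the code itself marks foo() as the culprit; only knowing what each function is supposed to do lets you blame the swapped call in foo() rather than the subtraction in baz().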

That method of debugging is pretty core to software development, but when a system is large and complicated it often breaks down pretty thoroughly. For example, if you’re trying to debug something like a complicated financial calculation (to pull a completely random example out of the air, suppose your code is computing the price for a sample auto insurance policy), it’s often not at all obvious what the “right” thing is. Some test is broken because the answer should be $702, and somehow it’s now coming out to $698. Why? Is there some round-off error somewhere? Is the test date-dependent somehow, such that changing the day on which it’s run changes the answer? Did someone change the sample rates? Is the sample policy now being built differently? Errors like that, which involve huge numbers of dependent moving parts, can be incredibly difficult to track down by first principles.

For those sorts of bugs, I prefer a technique I think of as “debugging by differences”. What that means is that you have some known good state of the code that you can compare the execution of against the broken version. Maybe that’s a past version in source control that you sync to from back when the test ran fine. In other cases, you might have a switch that controls whether execution flows down the old code path or the new one. Whatever the mechanism, if you have a test that can reproduce your bug, a known good state of the code under which that test passes, and the current state under which it fails, you can debug by instead tracing the execution in both modes and seeing where they differ. I personally tend to do this by putting copious amounts of print statements in my code that serve as a trace of execution along the paths that I care about; in the financials case, that includes printing out all sorts of intermediate states of the calculations and the data being processed. Then I compare the output of those print statements under the working and non-working scenarios in order to narrow down the differences.
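As a rough sketch of that technique (the names and the calculation here are made up for illustration, not anyone’s actual rating code), each intermediate value gets appended to a trace, and the traces from the passing and failing runs are compared to find the first point of divergence:

```java
import java.util.ArrayList;
import java.util.List;

public class TraceDiff {
    static final List<String> trace = new ArrayList<>();

    // A stand-in for a multi-step financial calculation: every
    // intermediate result is recorded in the trace as it's computed.
    static double ratePolicy(double baseRate, double vehicleFactor, double driverFactor) {
        trace.add("baseRate=" + baseRate);
        double afterVehicle = baseRate * vehicleFactor;
        trace.add("afterVehicle=" + afterVehicle);
        double afterDriver = afterVehicle * driverFactor;
        trace.add("afterDriver=" + afterDriver);
        return afterDriver;
    }

    // Compare the trace from the known-good run against the failing one,
    // returning the index of the first line where they disagree
    // (or -1 if the traces match completely).
    static int firstDivergence(List<String> good, List<String> bad) {
        int shorter = Math.min(good.size(), bad.size());
        for (int i = 0; i < shorter; i++) {
            if (!good.get(i).equals(bad.get(i))) return i;
        }
        return good.size() == bad.size() ? -1 : shorter;
    }
}
```

In practice the “comparison” is often just running diff over two log files, but the idea is the same: the first divergence tells you which intermediate step to investigate, without having to understand every step from first principles.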

Debugging that way tends to be much, much faster and more to the point than trying to determine from first principles what’s going on in a very complicated system. It’s most effective, though, if you plan your coding and testing to let you take advantage of the technique. When TDD evangelists say they never use their debugger, what they really mean is that if you have a large suite of tests that you can run between any set of non-trivial changes, you’ll always be able to pinpoint where you went wrong, because the bug must have been introduced by whatever small amount of code you changed since your last passing test run. The real world is, naturally, a little messier than that, especially when you’re making large architectural changes or rewriting things; you might not be able to run tests for a while, and only after changing thousands of lines of code do you find out that something, somewhere, no longer works properly. To that end, for any large-scale project I tend to favor a build-alongside-and-replace approach rather than a change-in-place approach. When changing anything relating to PolicyCenter’s financials, I would build out my new code path in parallel to the old one, then find a good choke point where I could comment out a call that led down the old path and replace it with a call to the new path. Even after I thought I was done, I would still check the code in that way and wait a couple of days before cleaning things up, giving our test harness a chance to cycle (running our full test suite takes too long to do locally before checking in, and our test harness tests on different databases and operating systems) and verify the change. If anything broke, I could then comment the old code path back in, add appropriate debugging code to trace what was going on, and quickly track down the problem.

We adopted a similar approach with compiling Gosu down to bytecode, with a master switch allowing us to control which runtime we execute under; when tests broke only when running under bytecode, we could turn the switch off and look at the execution under the old runtime to see what behavior was different. Sometimes that approach means I copy and paste large amounts of code temporarily, often creating entirely new copies of classes with names suffixed with “2” to differentiate them from the old ones, all so that I can leave the old code paths fully intact until all the bugs are shaken out of the new paths, at which point I rip out the old code, rename classes, and otherwise clean things up.
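The choke-point switch can be as simple as a single flag at one boundary; this is a minimal sketch (all names invented for illustration, not the actual PolicyCenter or Gosu code) of the pattern:

```java
public class FinancialsChokePoint {
    // Controlled by a system property, so the old path can be restored
    // without a code change if the new one breaks in the test harness.
    static final boolean USE_NEW_PATH =
        Boolean.parseBoolean(System.getProperty("financials.useNewPath", "true"));

    // The single choke point: every caller goes through here, and the
    // flag decides which implementation actually runs.
    static long computeChargesInCents(long premiumCents) {
        return USE_NEW_PATH
            ? computeChargesInCentsNew(premiumCents)
            : computeChargesInCentsOld(premiumCents);
    }

    // Old implementation, left fully intact until the new one is verified.
    static long computeChargesInCentsOld(long premiumCents) {
        return premiumCents + premiumCents / 20; // add a 5% surcharge
    }

    // New implementation, built alongside the old rather than in place.
    static long computeChargesInCentsNew(long premiumCents) {
        return premiumCents * 21 / 20; // same 5% surcharge, reworked
    }
}
```

Because both implementations stay live behind the flag, a failing test can be rerun under each path and their traces compared, which is exactly what makes debugging by differences possible.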

It often requires some amount of forethought to be able to debug by differences, so it’s a technique worth taking into account when starting on any change that you know could lead to difficult-to-debug breakages.

2 Comments on “Two kinds of debugging”

  1. Rob says:

    I am not sure I entirely agree with this approach. Certainly I am fine with the method for debugging the code; the approach is a personal preference. I sometimes find myself doing similar things to what you spoke about, not just when debugging but even when fixing a problem on my PC. If you have a working copy to compare against, it is always quicker to fix the problem.
    What I am not as comfortable with is accepting that you have to write code blindly, without feedback telling you that you are on the right track. This is a trap. True agile development and TDD require quick feedback so that you don’t write thousands of lines of code before realising you’re on the wrong path and have absorbed an entire iteration. If the reason is that it simply takes too long to run your full test suite locally, then that is the problem, so start debugging and troubleshooting it, as it needs to have a solution. Perhaps run the tests in parallel, reduce the time it takes to start a container or remove it altogether, or employ a grid for local builds to use. In the long run, solving this problem will cost far less than the cumulative money and time lost because the wrong code was written.

    • Alan Keefer says:

      While I agree with what you say in the general case, there are always exceptions. We’ve got all sorts of test parallelization and gridding going on in our test harness, but we’ve got a few dozen branches that each have to run close to 100k tests all competing for machines, so even though we have several hundred test instances going at once it’s not enough for every developer to use every time they want to run tests.

      Making the tests run faster is a top priority of mine for exactly the reasons you mention, but we’ve made a tradeoff between completeness of our tests in terms of their replication of actual runtime behavior and test performance, and we’ve sacrificed test speed for completeness. The cumulative time to run all of our tests for all our products is probably something north of 20 hours of CPU time, not even counting the really slow tests that do things like test database upgrade; no matter how good my magic wand is, that’s not going to turn into 10 minutes.

      I’m not a fan of working without feedback for long periods of time either, which is why I try to stage things in such a way that I can check them in even if they’re not done, and potentially quickly turn those changes off if they turn out to be problematic. But again, it’s all going to depend on a lot of different factors, and it’s hard to make blanket statements about the best way to do things. Smaller changes with faster tests are easier; large, destabilizing changes to huge products with a ton of slower-running tests don’t have any easy answer.
