Thoughts On Tech Debt

Martin Fowler recently updated his article on technical debt, and we’ve been discussing it in-house as well (though isn’t that always a conversation at any company with long-lived products?), so I’ve been thinking about it a lot lately.

Personally, I think it’s perhaps the most difficult engineering concept for non-engineers to internalize, because most things in the real world just don’t work that way; hence, the necessity of some analogy to a more common real-world concept.  The core feature of development that lets technical debt happen is that the input of one “operation” is always the output of the previous one, meaning that mistakes and shortcuts build up over time, progressively dragging you down.  Everyone has infrastructure that subtly impacts everything they do (if you’re a chef and your pots are poor quality or your kitchen is poorly laid out, it makes everything harder), and reputation can always make your business suffer (if you’re a sales guy and you offend a potential client, that can hurt you long term), but there are very few disciplines where you have to continuously build something up over the course of years.  Even if you’re in construction, once you’re done with a given building you move on to the next one.  But code bases are expected to essentially live forever, meaning that mistakes made during the beginning add up over time.  That just doesn’t happen if you’re a chef, or a salesman, or a doctor, or an artist, or almost any other job you can think of.  For engineers, the concept comes naturally:  everyone understands that the decisions they make now will affect how they do their job in the months and years ahead.  But for someone that’s never experienced that, I think it’s just a very difficult concept to internalize.

That said, it’s an important concept to grasp.  To me, the important part about technical debt isn’t the principal, as it were:  it’s the interest.  Realistically, though, it doesn’t work like real interest.  Rather, it’s more like a tax:  the amount you pay isn’t fixed based on the size of the “debt,” but rather is generally proportional to how much work you want to do, and the size of the “debt” really determines the tax rate rather than some fixed amount of overhead.  (One might argue that there is, in fact, some fixed amount in the form of ongoing maintenance, so there’s probably an argument to be made that the tax analogy isn’t really accurate either.)  But either way you think about it, the important part of the concept isn’t just that there’s a backlog of stuff to fix, but rather that prior decisions that were made affect your ability to work productively in the future.
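One way to see the difference between the two framings is a small Python sketch. The function names and the `rate_per_debt` constant are made up for illustration (the post gives no formula linking debt to a tax rate), so treat the numbers as placeholders rather than measurements.

```python
def interest_overhead(debt, rate=0.10):
    """Interest framing: overhead depends only on the size of the debt."""
    return debt * rate


def tax_overhead(planned_work, debt, rate_per_debt=0.001):
    """Tax framing: overhead scales with the work you attempt.

    rate_per_debt is a hypothetical constant: each $1k of debt adds
    0.1 percentage points of tax, capped at 100%.
    """
    tax_rate = min(1.0, debt * rate_per_debt)
    return planned_work * tax_rate


debt = 130.0  # in $k of "feature-dollars"
for planned in (50.0, 100.0, 200.0):
    print(
        f"planned={planned:>5.0f}k  "
        f"interest={interest_overhead(debt):.1f}k  "
        f"tax={tax_overhead(planned, debt):.1f}k"
    )
```

Under the interest framing the overhead stays the same no matter how much you plan to build; under the tax framing it grows with the ambition of the release, which is closer to the behavior described above.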

There are two things that I think are less obvious about how insidious technical debt is.  The first one is that incurring the debt sets expectations artificially high about how much work the team can do; if you incur a ton of debt in the first version of a product in order to get it out the door, you’ve set the expectation that the team can do X amount of work in a 12-month release, when in reality you could only do X/2 without incurring debt . . . and because of that debt, it’s now more like X/2.5.  The second insidious thing is that paying it down requires a huge resource commitment and delivers very little short term benefit.  It often seems like a total black hole; if you’re paying 10% yearly interest on a $100,000 loan, paying back $50,000 on top of the interest only saves you $5,000 a year.  So paying off the debt often seems like a poor investment, which means it just builds up and slowly exerts more and more of a tax on development, making it even harder to do something about it.
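The loan arithmetic can be sketched in a few lines of Python. This assumes simple annual interest with no compounding, and the names `annual_interest`, `loan`, and `paydown` are just illustrative.

```python
def annual_interest(debt, rate=0.10):
    """Yearly carrying cost of a debt at a simple annual rate."""
    return debt * rate


loan = 100_000
paydown = 50_000

# Interest saved each year once half the principal is gone.
saving = annual_interest(loan) - annual_interest(loan - paydown)

# How long the paydown takes to "earn back" what it cost.
payback_years = paydown / saving

print(f"Yearly saving after paying down ${paydown:,}: ${saving:,.0f}")
print(f"Years before the paydown pays for itself: {payback_years:.0f}")
```

Seen this way, a very large commitment buys a small yearly saving, which is why the paydown looks like a poor investment in the short term.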

Of course, technical debt isn’t exactly measurable, and neither is productivity, but just for fun let’s pretend that we can and do a little math anyway, since I think it’s an interesting exercise.  Imagine we’re measuring both productivity and debt in feature-dollars, and that we have a team that can do $100k worth of feature-dollars in a given year.  The first version of the product, however, needs to be ready in one year and have $200k worth of features in it.  So to get over the hump, the team borrows $100k at 10% APR.  Of course, the technical debt lenders are cutthroat, and in reality it’s always harder to fix things than it would have been to do them right in the first place; we can imagine that as if the tech debt lenders charged back-breaking fees, say 30%.  So after year one, we’ve got $200k worth of features and $130k worth of debt costing us $13k a year.

The team realizes that it overextended in the first release, but no one can quite swallow a 50% cut in productivity; the team did $200k the first time around, right?  So instead, they shoot for $120k, thinking that’s a much more reasonable target.  But their original rate minus the $13k in interest means that to get $120k of features out, they need to incur another $33k of debt, which with fees we’ll round to $40k.

By the time the third year rolls around, the project is $170k in debt, and the team decides to do something about it.  They decide to cut their feature target in half and only deliver $60k worth of features, so they pay $17k in interest on the debt, do $60k worth of feature work, and have $23k left over for paying down technical debt.  By sacrificing about 1/4 of their total dev capacity for the release (and more like 40% of their actual feature-building capacity), the team manages to reduce the debt down to $147k, saving them all of $2.3k per year in interest.  So next time around, they’re basically in exactly the same boat.
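The three releases above can be replayed with a rough Python sketch. The `simulate` function and its parameters are hypothetical names; it assumes 10% interest charged on the debt carried into each year, a 30% fee on anything borrowed, and any spare capacity going straight to principal. It does not round the way the text does, so it lands on about $172.9k after year two and about $150k after year three rather than $170k and $147k.

```python
def simulate(targets, capacity=100.0, rate=0.10, fee=0.30):
    """Replay a series of releases; all amounts are in $k of feature-dollars.

    Returns one (year, interest, borrowed, debt_after) tuple per release.
    """
    debt = 0.0
    history = []
    for year, target in enumerate(targets, start=1):
        interest = debt * rate            # paid out of this year's capacity
        available = capacity - interest   # what is left for real work
        shortfall = target - available
        if shortfall > 0:
            borrowed = shortfall
            debt += borrowed * (1 + fee)  # fixing later costs more than doing it right
        else:
            borrowed = 0.0
            debt = max(0.0, debt + shortfall)  # surplus pays down principal
        history.append((year, interest, borrowed, debt))
    return history


for year, interest, borrowed, debt in simulate([200, 120, 60]):
    print(
        f"year {year}: interest={interest:6.2f}k  "
        f"borrowed={borrowed:6.2f}k  debt={debt:7.2f}k"
    )
```

Swapping in other target lists, rates, or fees is an easy way to try the variations on the scenario.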

As the debt gets ever higher, there comes a point where the debt is high enough to nearly bring development to a total halt, and yet so large that nothing can be done about it.  Imagine if the team instead tried to deliver $200k of features in each release.  In the second release, they’re paying $13k in interest, so they have to take out $113k in loans to hit their target, adding maybe $150k after fees.  In the third release, they’re paying $28k in interest, so they have to take out $128k in loans, adding $160k in debt.  So by the fourth release, they’ve got $440k in debt; if they take on no further loans (and after some point you really can’t), their dev capacity will be around half of what it should be.  But the debt is also so large that there’s realistically no way to pay it down; it would take at least five years of no further feature work.  So at that point, it’s basically checkmate for the product . . . either you limp along with a product that doesn’t really evolve anymore and hope that a competitor doesn’t blow by you while you’re standing still, or you try to rewrite the whole thing and hope that doesn’t completely kill the project (which is by far the most likely outcome of a total rewrite effort).
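How long the paydown takes depends on whether you count the interest that keeps accruing while you pay. A minimal Python sketch, assuming the whole $100k of yearly capacity goes to the debt and interest is charged on the balance at the start of each year (`years_to_pay_off` is a hypothetical helper, not anything from a real tool):

```python
def years_to_pay_off(debt, capacity=100.0, rate=0.10):
    """Count the years of all-in paydown needed to clear a debt (amounts in $k)."""
    years = 0
    while debt > 0:
        interest = debt * rate
        if interest >= capacity:
            # Interest alone eats the whole team; the debt can never shrink.
            raise ValueError("interest consumes all capacity")
        debt -= capacity - interest  # whatever interest leaves over goes to principal
        years += 1
    return years


print("ignoring interest:", years_to_pay_off(440.0, rate=0.0), "years")
print("with 10% interest:", years_to_pay_off(440.0), "years")
```

Ignoring interest gives the five-year figure; with interest included this model comes out closer to seven, which only strengthens the point that the debt is effectively unpayable.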

You can play through that scenario with different perceived interest rates, or thinking of the debt as a tax instead of a constant amount, and over longer periods of time, but hopefully it illustrates the problems that I mentioned above, both around artificially increasing expectations for the team, leading to yet more debt, leading to yet more pressure to cut corners, and around the fact that paying down the debt often requires a Herculean effort for very little payoff.  Make of that what you will.  As with real debt, there’s a time and a place to incur it:  sometimes it’s important to hit a deadline, or to get a feature in for a key client, and the interest and fees are worth the cost.  But in the long run, technical debt can’t be allowed to build up to the point where it’s both too large to pay down and too large to allow for future development work, which requires walking a fine line between incurring debt when it’s necessary to get things done fast enough and holding it off or paying it down so that it doesn’t get out of control.

7 Comments on “Thoughts On Tech Debt”

  1. Raoul Duke says:

    your $ example is precious. thank you for grounding the topic (even if it is a thought experiment and not to be taken too literally :-).

  2. Marcus Ryu says:

    A great post, Alan; it really brings the point home to us non-engineers. Your analogy is a bit harsh, because unlike the national debt, “secular” growth in GDP helps mitigate some of the debt burden. A growing economy can absorb additional debt (proportionately) without decreasing the standard of living. Is there an analogy in development?

    It would help non-engineers appreciate tech debt more if there were some way to quantify it. Economists manage to measure the cost to the economy for an inadequate infrastructure, so I presume there is a way. Even a qualitative swag to the effect of, “The release-after-next will have 30% fewer features in it if we don’t allocate 2 months of this release to paying down tech debt,” would drive the point home.

  3. Alan Keefer says:

    Well, as I mention in the article, it’s really closer to a tax rate than to an interest payment, which means it scales with what you’re trying to build and there’s nothing like GDP growth that can offset it. If your technical debt is at the point where the tax rate is 50%, say, all you can do is try to throw twice as many people at the problem; unsurprisingly, that’s what most companies seem to end up doing, and as I point out above it’s actually going to be a more effective tactic in the short term than paying the debt down will be. Of course, expanding the team has its own long-term costs.

    One problem with measuring technical debt is that measuring productivity itself is basically impossible; economists at least have the advantage that they tend to be looking at the impact on some vaguely measurable number like GDP. It’s also the case that every kind of project and every kind of debt is different, so there’s no way to model things based on empirical data from other sources. The best you can do is to pull numbers out of thin air, which obviously makes for a less-than-convincing argument. Every part of the number is debatable in terms of how much debt we have, how much time we need to pay it down, and how much of an impact it’ll have, and different people’s estimates will be off by an order of magnitude from each other.

    And again, the really nasty part, as I tried to point out, is that any sort of investment, even a seemingly-massive one, will tend to have a small near-term effect but a large near-term cost. So it’s more like, “If we don’t spend 30% of our dev capacity paying down our debt this release and commit to doing everything we can to avoid additional debt, then 5 years from now it’ll take us 3 times longer to do everything, and we’ll be in such a big hole that there’s nothing we can do about it. But between paying down the debt and avoiding new debt, we’ll only get half as much done this release as we did last release.” The costs under that scenario are so obvious and immediate while the benefits are so speculative and long-term that it becomes a difficult argument to make.

    I’m sure that we (and myself personally) could do a better job of trying to make those more concrete arguments, though, even if they are just complete SWAG guesses.

  4. Marcus Ryu says:

    It seems to me that this category of problem can only be addressed by having decision-makers — by which I mean the development team’s leadership — with a long enough time horizon (and a low enough discount factor) to care about development productivity and product quality multiple releases hence. The enterprise has to have the right values to trust the development leadership to make the appropriate trade-offs and to represent to his non-development colleagues the true amount of capacity for a given release, consistent with a sustainable product trajectory.

    The software business, of course, has been afflicted with too-short time horizons and too-high discount factors. If the VPE only cares about investor-imposed metrics that apply during the vesting window for his stock options, there is no hope. I imagine that what happens equally often is that a well-intentioned development leader inherits debt-ridden code and faces the unsavory prospect of paying debt with no functional advance, or digging deeper into the hole.

    It’s the struggle of our species to keep a long-term orientation, isn’t it?

  5. Raoul Duke says:

    @”It’s the struggle of our species to keep a long-term orientation, isn’t it?”

    wow, how true and, er, depressing.

    oh well.

  6. Raoul Duke says:

    while re-reading this article i also had an optimistic thought: part of the reason we incur technical debt is that we aren’t smart enough to recognize all forms of it in the first place. another argument, i’d say, for at least two things: code review, and technical training/self-improvement.

  7. Adam McClure says:

    Great article on what I consider to be the biggest challenge for engineering organizations today. Basically it boils down to an argument about sustainability and maintaining a linear relationship between growth in customers (and by extension the feature set) and the cost to deliver and maintain the product. In my last two CTO roles I came into organizations that were already well over the cliff as far as technical debt and there was almost no practical way to get things back on track without working insane hours.

    The only way out (once you are in “debt”) that I can see is to put the existing team in maintenance mode and work pretty much only on the debt and bug fixes. If you still want to get something new out the door consider adding contract or offshore workers. However, that implies you’ve done at least enough process improvement to be able to properly absorb additional labor resources without compounding the debt through more of the same short-term practices. The staff augmentation model is a very popular way to get around this problem for large companies who have reached the point of saturation where 80%+ of their annual budgets goes to maintenance.

    In my opinion the real promise of Service-Oriented Architecture (SOA) is/was(?) to introduce not only web services integration technologies but to allow for more modular coupling of business process components as a way of facilitating process agility in the “Agile” sense and mitigate technology debt accumulation. What has happened, of course, is that the same folks who brought you the endless onslaught of tech debt are responsible for implementing its solution and generally it leads to more complexity with very little effort going to pay down the debt on a sustainable basis.

    I often talk about the three-legged stool of new features, existing maintenance (i.e. bug squashing), and overhead (i.e. debt). If you don’t allocate time to each area with each release cycle you will inevitably produce less and less with the same number of resources.

    I’ll even go so far as to chum the waters with flame bait by suggesting that the significant success of the offshore value proposition (“more bodies, less money”) is because we have evolved common engineering practices which nearly guarantee an accumulation of tech debt and require more bodies for more maintained features or applications.

    The solution for the F500 is to integrate the awareness of tech debt into their portfolio management and IT Governance disciplines. Product companies need to ensure the CEO understands the crippling impact tech debt can have on business growth. I speculate the current perspective will only change in a substantial and lasting way when we can figure out the appropriate financial models to anticipate the long-term cost of change based on upstream architectural and process decisions. If you’ve got those models sitting around in a spreadsheet for a common platform (PHP/Java/.NET) send them over!
