Taking Responsibility

I’ve been a bit hesitant to write something about this, since I think it’s something of a delicate subject . . . it’s just a bit odd to write about, knowing that people I work with will read it.  But it’s an important enough subject, and one that I feel strongly enough about, that I think it’s finally time to say something about it.

For those of you who are reading this and don’t know, my technical title is something like “PolicyCenter Architect.”  Since I think the word “architect” in a software context has taken on all sorts of horrible connotations (it conjures up images of people who draw boxes and arrows but don’t actually write any code themselves), I tend not to use it personally.  Instead, I tend to think of my role on the team as being the guy who should be blamed when things don’t work.  Not the guy who should be blamed when something he did doesn’t work, but just the guy to blame, period.

I know it’s something of an extreme stance to take, which is why I think it’s worth an explanation.  Coming at it from a different angle:  my job is to make sure that the project doesn’t fail for technical reasons.  If the code is buggy, or it doesn’t perform, or it’s not flexible enough or in the right ways, then I’ve failed to do my job.  It doesn’t really matter why, or who wrote the code:  I’m supposed to make sure that doesn’t happen, and to do whatever it takes to get there.  In some ways that’s fundamentally unfair, since I can’t possibly control everything that goes on, yet I’m personally responsible for it anyway.  I honestly think that’s still how it has to be.

Taking responsibility for things is a powerful thing, especially when they’re things that you don’t directly control.  Part of being a professional is certainly taking responsibility for your own work, and not passing the blame to someone else when things go wrong.  But I think it’s even more powerful to simply say I will not let this fail.  Full stop.  But I think the only way to really get to that mindset is to take responsibility for that failure; otherwise, there will always be the temptation to throw up your hands and pass the blame on to someone else, and at the moment you absolve yourself of responsibility for that failure you’re lessening your personal drive to do something about it.  The other thing that needs to happen is that you have to accept the fact that things could fail, that you have limited control over that, and if they do it’ll be your fault anyway.  Failure sucks, and it’s a hard thing to swallow:  on my current project, it took me about five months to really come to peace with the fact that I was going to have to make a lot of very, very hard yet speculative decisions and that my best efforts could still end in failure (and a very, very costly failure at that).  Once I finally did, though, I slept a lot easier, and it freed me up to do better work than I otherwise would have been able to do.  Whatever role I managed to play in making the project succeed, I think it was significantly aided by the fact that I got over being afraid of screwing up.

I think there are three primary reasons why taking responsibility and blame for things you don’t actually have control over works out.  First of all, it forces you to do whatever you can to minimize that chance of failure.  Since you don’t have complete control, you can’t get it down to zero, but you sure have an incentive to get it there.  And not only that, but you have an incentive to do what’s right for the project as a whole, not just for your part of it or for you personally.  That extra drive, extra motivation, and whole-project orientation will lead to you doing more to make the project succeed than you would otherwise.  Secondly, it lets you make hard decisions under uncertain conditions.  A lot of projects fail simply because no one is willing to make those decisions; it’s always less personally risky to leave things as they are, or to take the well-trod route, or to pass the decision off to someone else even if they’re not in the best position to make the decision.  Sometimes, you’re the best person to make the decision, and you just have to be willing to do it.  Sometimes you’re not, so you have to pass the analysis off to someone else, but in order for them to really make the best decision they can they have to know that you’re going to back them up and not blame them if it goes wrong for some reason.  Lastly, while I really have no idea what anyone who works with me really thinks, I like to think that having someone willing to take that level of responsibility lets everyone else on the team do better work too, since they know that someone has their back, they can focus on doing their best instead of worrying about trying to look good, and they have the freedom to do what they think is the right thing.

It’s worth mentioning that the flip side of taking the blame is not taking the credit.  I’m not the first one to point out that good leaders should accept blame and deflect credit, but I think it’s one of those things that it’s hard to repeat enough.  Never, ever take credit for other people’s work (even if you helped out), and when in doubt err on the side of taking too little (preferably way too little) credit.

I Am Hate Method Overloading (And So Can You!)

My hatred of method overloading has become a running joke at Guidewire. My hatred is genuine, icy hot, and unquenchable. Let me explain why.

First Principles
First of all, just think about naming in the abstract. Things should have good names. A good name is unique and easy to understand. If you have method overloading, the name of a method is no longer unique. Instead, the real name of the method is the sane, human chosen name, plus the fully qualified name of each argument’s type. Doesn’t that just seem sort of insane? If you are writing a tool that needs to refer to methods, or if you are just trying to look up a method reflectively, you have to know the name, plus all the argument types. And you have to know this even if the method isn’t overloaded: you pay the price for this feature even when it isn’t used.

Maybe that strikes you as a bit philosophical. People use method overloading in Java, so there must be some uses for it. I’ll grant that, but there are better tools to address those problems.

In the code I work with day to day, I see method overloading primarily used in two situations:

Telescoping Methods
You may have a function that takes some number of arguments. The last few arguments may not be all that important, and most users would be annoyed at having to figure out what to pass into them. So you create a few more methods with the same name and fewer arguments, which call through to the “master” method. I’ve seen cases where we have five different versions of a method with varying numbers of arguments.

So how do I propose people deal with this situation without overloading? It turns out to be a solved problem: default arguments. We are (probably) going to support these in the Diamond release of GScript:

  function emailSomeone( address:String, subject:String, body:String,
                         cc:String=null, logToServer:boolean=false,
                         html:boolean = false ) {
    // a very well done implementation
  }

This is a much cleaner solution: one method with obvious syntax. And if your IDE is any good, it will show you the default values of the optional arguments (unlike method overloading).
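Since GScript default arguments were still only a proposal at the time of writing, here is a rough, runnable analog in Python, whose default arguments work along the same lines. The function name, the returned dictionary, and the example addresses are all made up for illustration; nothing here is the real emailSomeone implementation.

```python
import inspect
from typing import Optional


def email_someone(address: str, subject: str, body: str,
                  cc: Optional[str] = None,
                  log_to_server: bool = False,
                  html: bool = False) -> dict:
    """Hypothetical stand-in: returns the message it would send instead of sending it."""
    return {
        "to": address,
        "cc": cc,
        "subject": subject,
        "body": body,
        "content_type": "text/html" if html else "text/plain",
        "logged": log_to_server,
    }


# Callers name only the options they care about; no telescoping overloads needed.
plain = email_someone("a@example.com", "Hi", "Lunch?")
fancy = email_someone("a@example.com", "Hi", "<b>Lunch?</b>", html=True)

# The defaults live in one signature, so a tool can read them directly.
defaults = {
    name: param.default
    for name, param in inspect.signature(email_someone).parameters.items()
    if param.default is not inspect.Parameter.empty
}
```

The last few lines are the part an IDE cares about: with one signature there is exactly one place to look up what an omitted argument will be.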

True Overloading
Sometimes you truly want a method to take two different types. A good example of this is the XMLNode.parse() method, which can take a String or a File or an InputStream.

I actually would probably argue with you on this one. I don’t think three separate parse methods named parseString(), parseFile() and parseInputStream() would be a bad thing. Code completion is going to make it obvious which one to pick and, really, picking a unique name isn’t going to kill you.

But fine, you insist that I’m a terrible API designer and you *must* have one method. OK, then use a union type (also probably available in the Diamond release of GScript):

  function parse( src:(String|File|InputStream) ) : XMLNode {
    if( src typeis String ) {
      // parse the string
    }
    ...
  }

A union type lets you say “this argument is this type or that type.” It’s then up to you to distinguish between them at runtime.

You will probably object that this syntax is moderately annoying, but I’d counter that it will end up being fewer lines of code than if you used method overloading and that, if you really want a single function to handle three different types, you should deal with the consequences. If it bothers you too much, just pick unique names for the methods!
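To get a feel for what “distinguish between them at runtime” looks like in practice, here is a small Python sketch of the same shape. It is only an analogy: it uses Python’s standard xml.etree.ElementTree in place of XMLNode, a Path in place of File, and any object with a read() method in place of InputStream, and it treats a plain string as the XML text itself.

```python
import io
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import IO, Union


def parse(src: Union[str, Path, IO]) -> ET.Element:
    """One entry point; the branch taken depends on the runtime type of src."""
    if isinstance(src, str):
        return ET.fromstring(src)        # src is the XML text itself
    if isinstance(src, Path):
        return ET.parse(src).getroot()   # src names a file on disk
    if hasattr(src, "read"):
        return ET.parse(src).getroot()   # src is an already-open stream
    raise TypeError(f"cannot parse {type(src).__name__}")


root = parse("<order><item/></order>")
same = parse(io.BytesIO(b"<order><item/></order>"))
```

The whole dispatch fits in one function body, which is the trade being argued for: slightly noisier code inside parse() in exchange for a single, uniquely named method.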

So?
Let’s say you accept my alternatives to the above uses of method overloading. You might still wonder why I hate it. After all, it’s just a feature and a pretty common one at that. Why throw it out?

To understand why I’d like to throw it out, you have to understand a bit about how the GScript parser works. As you probably know, GScript makes heavy use of type inference to help developers avoid the boilerplate you find in most statically typed languages.

For example, you might have the following code:

  var lstOfNums = {1, 2, 3}
  var total = 0
  lstOfNums.each( \ i -> { total = total + i  } )

In the code above, we are passing a block into the each() method on List, and using it to sum up all the numbers in the list. ‘i’ is the parameter to the block, and we infer its type to be ‘int’ based on the type of the list.

This sort of inference is very useful, and it takes advantage of context sensitive parsing: we can parse the block expression because we know the type of argument that each() expects.

Now, it turns out that method overloading makes this context sensitive parsing difficult because it means that when you are parsing an expression there is no guarantee that there is a single context type. You have to accept that there may be multiple types in context when parsing any expression.

Let me explain that a bit more. Say you have two methods:

  function foo( i : int ) {
  }

  function foo( i : String ) {
  }

and you are attempting to parse this expression:

  foo( someVar )

What type can we infer that the context type is when we parse the expression someVar? Well, there isn’t any single context type. It might be an int or it might be a String. That isn’t a big deal here, but it becomes a big deal if the methods took blocks, or enums or any other place where GScript does context type sensitive parsing. You end up having lists of context types rather than a single context type in all of your expression parsing code. Ugly.

Furthermore, when you have method overloading, you have to score method invocations. If there is more than one version of a method, and you are parsing a method invocation, you don’t know which version of the method you are calling until after you’ve parsed all the arguments. So you’ve got to run through all the argument types and see which one is the “best” match. This ends up being some really complicated code.
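To make the two costs concrete, here is a toy sketch in Python. It is emphatically not the GScript parser’s algorithm; it assumes overloads are just tuples of plain classes and that “best match” means the smallest total distance up the class hierarchy. The first function shows how an argument position ends up with a set of context types, and the other two show the scoring pass that can only run after every argument has been parsed.

```python
def context_types(overloads, position):
    """All parameter types an argument at `position` might be parsed against."""
    return {params[position] for params in overloads if position < len(params)}


def score(params, arg_types):
    """Lower is better; None means this overload cannot accept the arguments."""
    if len(params) != len(arg_types):
        return None
    total = 0
    for param, arg in zip(params, arg_types):
        if not issubclass(arg, param):
            return None
        # 0 for an exact match, larger for a more distant ancestor.
        # (Assumes plain classes, so `param` appears in the argument's MRO.)
        total += arg.__mro__.index(param)
    return total


def resolve(overloads, arg_types):
    """Pick the best-scoring overload once every argument type is known."""
    scored = [(score(params, arg_types), params) for params in overloads]
    viable = [(s, params) for s, params in scored if s is not None]
    if not viable:
        raise TypeError("no overload accepts these arguments")
    best = min(s for s, _ in viable)
    winners = [params for s, params in viable if s == best]
    if len(winners) > 1:
        raise TypeError("ambiguous call")
    return winners[0]


foo = [(int,), (str,)]           # the two foo() signatures from the example
contexts = context_types(foo, 0)  # two candidate context types, not one
chosen = resolve(foo, (str,))
```

Even this toy version needs an error path for “nothing matches” and another for “two things match equally well,” and it has not yet dealt with blocks, enums, or coercions.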

Complexity Kills

Bitch, bitch, moan, moan. Just make it work, you say. If the Java guys can do it, why can’t you? Well, we have made it work (for the most part). But there’s a real price we pay for it.

I’m a Berkeley, worse-is-better sort of guy. I think that simplicity of design is the most important thing. I can’t tell you how much more complicated method overloading makes the implementation of the GScript parser. Parsing expressions, parsing arguments, assignability testing, etc. It bleeds throughout the entire parser, its little tentacles of complexity touching places you would never expect. If you come across a particularly nasty part of the parser, it’s a good bet that it’s there either because of method overloading or, at least, is made more complicated by it.

Oh, man up! you say. That’s the parser’s and tool developer’s problem, not yours.

Nope. It’s your problem too. Like Josh Bloch says, projects have a complexity budget. When we blow a big chunk of that budget on an idiotic feature like method overloading, that means we can’t spend it on other, better stuff.

Unfortunately, because GScript is Java compatible, we simply can’t remove support for method overloading. If we could though, GScript would have other, better features and, more importantly, fewer bugs.

That is why I am hate method overloading. And so can you.

It can always be better

George Bernard Shaw put it best when he remarked that all progress depends on the unreasonable man.  Progress in just about any area requires that you not be content with the status quo, since otherwise you wouldn’t bother trying to make it better.  But at the same time, constant discontent with the world is a rough way to live; if nothing is ever good enough, you’ll always be unhappy, and if you’re always going to be unhappy what exactly is the point of working so hard to make things better?  On a pretty fundamental level, isn’t that drive to make progress and improve things the result of a desire to turn some discontentment with the current state of affairs into contentment?  But once you reach that contented state, how do you keep going?  Shaw could just as easily have said that all progress depends on the discontented man, but who wants to spend their life discontented?

Dealing with that requires finding a balance between contentment and complacency on one hand and discontentment and action on the other hand.  Logically, they’re kind of two ends of one scale; they’re P and not-P.  You can only have one, right?  Thankfully (or not, depending on your perspective), the human brain isn’t actually constrained by the rules of logic.  Logic tells us “P or not-P” but your brain says, “Why can’t I have both again?”  Paradoxes are a part of life, and they’re not just the result of lazy thinking; as much as the analytic philosopher/computer scientist in me wants the world to be a rational, P or not-P sort of place, it’s not, and we can find ways to use that to our advantage.  

My solution is to take that one single axis of “am I happy with X” and treat it as if it’s actually two independent axes; I call them “contentment” and “satisfaction” for lack of better terms, such that I can be content while still not being satisfied with things.  If I’m running, I can simultaneously be contented that I ran what for me is a good time while still being dissatisfied that I didn’t set a new personal best.  And if I’m working, I can simultaneously be extremely happy with how much I’ve been able to do while being appalled that I haven’t done more, and I can be content with the state of our code and our tools and how advanced they are compared to what our competitors try to work with while still feeling like I won’t actually be happy until they’ve come miles further than where they are now.

This is just my opinion, of course, but I think a lot of people tend to fall into just one sort of thinking or the other:  either they’re happy enough with things that it reduces their drive to improve them, or they’re driven to improve things but they’re unhappy (and in many cases, that can lead to feelings of futility, which eventually just kills motivation altogether).  I’m not sure I really have the balance mastered myself, but I’m pretty convinced that it’s critical to doing the best work that I can do and avoiding complacency on one hand and burnout on the other.

There’s a more sinister form of the discontentment that afflicts software engineers as well, and that’s the gold-plating/no-shipping affliction that leads to never deciding something is good enough to move on.  So if you’re an engineer, that’s yet another potential affliction to master; you need to find a way to be content and enjoy your work while not getting burned out, and to constantly improve everything while still being able to ship code and move on to new projects.

If there’s one thing I look for in engineering candidates these days, it’s probably that constant drive to improve things, and that recognition that things can always be better.  Because they can.  There’s no end-point in the game we’re playing (yes, even if you’re a Lisp guy, the answer is still out there); the language, the tools, the libraries, the techniques, the tests, the code, the process . . . no part of the stack is ever truly done.  To me one of the joys and truly unique things about software development is the fact that there is always progress to be made, and that every engineer has the opportunity to make that kind of progress.  I think that constant drive is something that all truly great engineers share.  Be unreasonable.

But even better than that constant drive to make things better is the ability to keep that drive alive over months and years and decades while staying happy, enjoying the work, avoiding burnout, and being able to prioritize that against the need to ship and the need to move on to other projects.

Swoopers and Bashers and Writers and Programmers

It’s funny that Bruce Eckel posted on Artima yesterday about how programming is analogous to writing.  I’ve thought that for a long time, and Eric Roberts, who developed the introductory CS curriculum at Stanford (and is just an amazing teacher in general) liked to point out to incoming freshmen that success in those classes was more closely correlated with SAT verbal test scores than with SAT math scores.  But what I’d been meaning to write about was a bit of a tangent to that:  it wasn’t about how programming is like writing, but rather about how people have wildly different writing styles, which I think everyone accepts (no one expects Dan Brown to compose a novel the same way as Salman Rushdie or Haruki Murakami), yet people often expect that there’s One True Way to write quality software.  Method X works for them, so if it doesn’t work as well for you, it must be because you’re doing something wrong.

But the truth of the matter is that programmers program differently, just like writers write differently.  One of my favorite discussions of writing comes from Kurt Vonnegut.  In his novel Timequake, he says there are two types of writers:

Tellers of stories with ink on paper, not that they matter any more, have been either swoopers or bashers. Swoopers write a story quickly, higgledy-piggledy, crinkum-crankum, any which way. Then they go over it again painstakingly, fixing everything that is just plain awful or doesn’t work. Bashers go one sentence at a time, getting it exactly right before they go on to the next one. When they’re done they’re done.

It’s an over-generalization, but as over-generalizations go it’s a pretty brilliant one.  Personally I fall into the swooper category; in fact, I’m even more extreme in that I’ll often rewrite from whole cloth large sections or even entire essays (or e-mails, or term papers).  I’ve found that I need to get something down in some roughly-complete form simply in order to collect my thoughts, so I simply couldn’t bash out something perfect one sentence at a time no matter how much I wanted to.  When English teachers in grade school made us write outlines for an essay, I’d write my outline after I was done with the paper, because that was the only way the two would line up.

Interestingly (to me, at least), I’ve found that I’m most productive when I program the same way.  For a long time, I felt kind of bad that I didn’t enjoy doing rigorous test-driven development; it seemed like TDD was some kind of ideal of well-constructed code and if I was just a little more disciplined, I’d fully TDD everything and my code would be flawless as a result.  But in reality, that style is just sub-optimal for me.  Just like when I write, I need to sketch something out in order to really see where I’m going.  So my development methodology these days is often to start with a brutal, hacked-up end-to-end spike of a feature, write some end-to-end tests, and then start building it sideways, back-filling more targeted tests as I go and always keeping it close to some known-good state.  If I try to throw that spike away and start over with TDD, or if I don’t do that spike at all, I go slower and produce code that’s harder to read or modify later.

My point is that there is no one right way to develop; what works well for one person won’t work so well for another person.  Most developers, I think, would admit that different methodologies and techniques and tools are appropriate for different problem spaces; one-person throwaway projects are obviously a different deal from 100-person projects where the code needs to live for decades.  But even within the same problem space, people are just different, and one programming methodology or style doesn’t fit all.

As a developer, it’s your responsibility to figure out what works for you, and that requires some experimentation and often the willingness to try out something that feels horribly awkward and unfamiliar at first.  Unfortunately, it’s often hard to know if you’re giving up too early or if something really just isn’t going to work for you.  So by all means, read all you can about TDD, BDD, pair programming, rapid prototyping, getting real, modeling, or just go out there and start hacking.  Try a method out, ask people who like it, take what works and leave what doesn’t, and find the style that leads to you doing your best work.

And if you ever happen to find yourself in a position of authority, one of the worst things you can do is require everyone to try to program the same way just because it’s the way that works for you or that works for some guy in a book you read.  Give people the freedom to do their best work and, surprise surprise, they will.

Finishing Refactorings

Technical debt, we all know, is hard to manage.  To fight against it, you have to (among other things) refactor and improve your code mercilessly.  But along the way, your attempts to make the code base a safer place can actually make things worse:  if you add in a new way to do something without actually removing the old way, you could end up with a code base that’s more cluttered and more inconsistent, making it even harder to understand than if you’d just never tried in the first place.

To take a contrived example, imagine that you’re writing a new test that needs to create a sample hierarchy of widgets and gizmos for your application, and you realize your existing architecture could use some improvements.  Over time you’ve accumulated a bunch of random helper methods that do various things, like WidgetUtil.createSampleWidgetWithOneGizmo(), but the methods therein are brittle, take too many arbitrary parameters, and are difficult to combine to create richer test data.  So, you decide to refactor the test utilities using the builder pattern to make things more fully-parameterizable and chainable.  Great idea, right?  So you create your nifty new WidgetBuilder and GizmoBuilder classes, use them to write your tests, and as predicted they make the data setup a lot clearer and more flexible.
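For readers who haven’t run into the pattern, here is roughly what such builders might look like, sketched in Python. Everything in it is invented for illustration (the Widget and Gizmo fields, the method names, the default values); the contrived example above doesn’t specify any of that.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Gizmo:
    name: str
    size: int


@dataclass
class Widget:
    name: str
    gizmos: List[Gizmo] = field(default_factory=list)


class GizmoBuilder:
    def __init__(self):
        # Defaults a test can ignore unless it cares about them.
        self._name, self._size = "gizmo", 1

    def named(self, name):
        self._name = name
        return self  # returning self is what makes the calls chainable

    def sized(self, size):
        self._size = size
        return self

    def build(self):
        return Gizmo(self._name, self._size)


class WidgetBuilder:
    def __init__(self):
        self._name, self._gizmos = "widget", []

    def named(self, name):
        self._name = name
        return self

    def with_gizmo(self, gizmo_builder=None):
        if gizmo_builder is None:
            gizmo_builder = GizmoBuilder()
        self._gizmos.append(gizmo_builder)
        return self

    def build(self):
        return Widget(self._name, [g.build() for g in self._gizmos])


widget = (WidgetBuilder()
          .named("sample")
          .with_gizmo()
          .with_gizmo(GizmoBuilder().named("big").sized(10))
          .build())
```

Each test states only the details it depends on, which is where the clarity comes from compared with a helper like createSampleWidgetWithOneGizmo() and its long parameter list.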

The question, then, is what do you do next?  Do you just use them for new tests, or do you go back and refactor the 158 existing tests that use WidgetUtil so that they use your new builder classes?  And do you stop there, or do you attempt to kill off all of your old data creation methods in favor of new builder patterns for everything?

When trying to improve the code, the real work is often not in the initial improvement itself, but rather in fully replacing the old way of doing something with the new, improved way.  Either option has some serious potential downside.  On the one hand, forcing the change through every part of the system is usually hugely labor intensive and carries a high risk of breaking something that was already working, all for no immediate return whatsoever.  On the other hand, leaving things as is just adds to the complexity of the system:  now there are two ways to do X instead of just one, and every subsequent developer needs to understand both of those and know what the differences are.

So what’s the right thing to do?  Well, as with most programming tasks, it comes down to a judgment call about whether to attempt to push it through the system, whether to just add the new change but not attempt to refactor further, or whether to abandon the change and just do things the old way to avoid adding complexity to the system.  Here are a few questions to ask yourself:

  • How localized is the change?  Is it likely that most other developers will need to be aware of both ways to do things, or will only a smaller subset have to deal with it?
  • How bad is the status quo?  Is the improvement drastic, or merely incremental?
  • How much work is the refactoring?  Are there five other places where a similar pattern is present or 500?
  • How likely is the refactoring to break things?  Is it a fairly straightforward drop-in replacement or change, or is it something more involved that could turn out to have unanticipated interactions?
  • Can automated tools do the refactoring, or is it something that has to be done by hand?
  • How long is this system going to be around for?  Is this something relatively throwaway (though it always seems like every program lives longer than its creators expect), or is it something you know you’ll be dealing with five or ten years from now?
  • How likely is the refactoring to uncover and fix latent bugs or to otherwise clean up buggy areas in the system?

In general, in my experience most people tend to err too far on the side of not finishing things off and truly eliminating all vestiges of an old way to do something, be it in the form of a method or merely a general approach to a problem.  Especially once a system gets to a certain size, the cost starts to seem prohibitive.  But unfortunately, those are often exactly the times that it’s important to keep the code clean and avoid technical debt; technical debt is much more of a killer on large projects than on small ones.

So next time you come up with a new way to do something, or decide that way X is better than way Y, ask yourself if you’re stopping too early or if you should really be following the refactoring through to the end.

Thoughts On Tech Debt

Martin Fowler recently updated his article on technical debt, and we’ve been discussing it in-house as well (though isn’t that always a conversation at any company with long-lived products?), so I’ve been thinking about it a lot lately.

Personally, I think it’s perhaps the most difficult engineering concept for non-engineers to internalize, because most things in the real world just don’t work that way; hence, the necessity of some analogy to a more common real-world concept.  The core feature of development that lets technical debt happen is that the input of one “operation” is always the output of the previous one, meaning that mistakes and shortcuts build up over time, progressively dragging you down.  Everyone has infrastructure that subtly impacts everything they do (if you’re a chef and your pots are poor quality or your kitchen is poorly laid out, it makes everything harder), and reputation can always make your business suffer (if you’re a sales guy and you offend a potential client, that can hurt you long term), but there are very few disciplines where you have to continuously build something up over the course of years.  Even if you’re in construction, once you’re done with a given building you move on to the next one.  But code bases are expected to essentially live forever, meaning that mistakes made during the beginning add up over time.  That just doesn’t happen if you’re a chef, or a salesman, or a doctor, or an artist, or almost any other job you can think of.  For engineers, the concept comes naturally:  everyone understands that the decisions they make now will affect how they do their job in the months and years ahead.  But for someone that’s never experienced that, I think it’s just a very difficult concept to internalize.

That said, it’s an important concept to grasp.  To me, the important part about technical debt isn’t the principal, as it were:  it’s the interest.  Realistically, though, it doesn’t work like real interest.  Rather, it’s more like a tax:  the amount you pay isn’t fixed based on the size of the “debt,” but rather is generally proportional to how much work you want to do, and the size of the “debt” really determines the tax rate rather than some fixed amount of overhead.  (One might argue that there is, in fact, some fixed amount in the form of ongoing maintenance, so there’s probably an argument to be made that the tax analogy isn’t really accurate either.)  But either way you think about it, the important part of the concept isn’t just that there’s a backlog of stuff to fix, but rather that prior decisions that were made affect your ability to work productively in the future.

There are two things that I think are less obvious about how insidious technical debt is.  The first one is that incurring the debt sets expectations artificially high about how much work the team can do; if you incur a ton of debt in the first version of a product in order to get it out the door, you’ve set the expectation that the team can do X amount of work in a 12-month release, when in reality you could only do X/2 without incurring debt . . . and because of that debt, it’s now more like X/2.5.  The second insidious thing is that paying it down requires a huge resource commitment and delivers very little short term benefit.  It often seems like a total black hole; if you’re paying 10% yearly interest on a $100,000 loan, paying back $50,000 on top of the interest only saves you $5,000 a year.  So paying off the debt often seems like a poor investment, which means it just builds up and slowly exerts more and more of a tax on development, making it even harder to do something about it.

Of course, technical debt isn’t exactly measurable, and neither is productivity, but just for fun let’s pretend that we can measure them and do a little math anyway, since I think it’s an interesting exercise.  Imagine we’re measuring both productivity and debt in feature-dollars, and that we have a team that can do $100k worth of feature-dollars in a given year.  The first version of the product, however, needs to be ready in one year and have $200k worth of features in it.  So to get over the hump, the team borrows $100k at 10% APR.  Of course, the technical debt lenders are cutthroat, and in reality it’s always harder to fix things than it would have been to do them right in the first place; we can imagine that as if the tech debt lenders charged back-breaking fees, say 30%.  So after year one, we’ve got $200k worth of features and $130k worth of debt costing us $13k a year.

The team realizes that it overextended in the first release, but no one can quite swallow a 50% cut in productivity; the team did $200k the first time around, right?  So instead, they shoot for $120k, thinking that’s a much more reasonable target.  But their original rate minus the $13k in interest means that to get $120k of features out, they need to incur another $33k of debt, which with fees we’ll round off to about $40k.

By the time the third year rolls around, the project is $170k in debt, and the team decides to do something about it.  They decide to scale their dev effort back in half and only deliver $60k worth of features, so they pay $17k in interest on the debt, do $60k worth of feature work, and have $23k left over for paying down technical debt.  By sacrificing about 1/4 of their total dev capacity for the release (and more like 40% of their actual feature-building capacity), the team manages to reduce the debt down to $147k, saving them all of $2.3k per year in interest.  So next time around, they’re basically in exactly the same boat.

As the debt gets ever higher, there comes a point where the debt is high enough to nearly bring development to a total halt, and yet so large that nothing can be done about it.  Imagine if the team instead tried to deliver $200k of features in each release.  In the second release, they’re paying $13k in interest, so they have to take out $113k in loans to hit their target, adding maybe $150k after fees.  In the third release, they’re paying $28k in interest, so they have to take out $128k in loans, adding $160k in debt.  So by the fourth release, they’ve got $440k in debt; if they take on no further loans (and after some point you really can’t), their dev capacity will be around half of what it should be.  But the debt is also so large that there’s realistically no way to pay it down; it would take five years of no further feature work.  So at that point, it’s basically checkmate for the product . . . either you limp along with a product that doesn’t really evolve anymore and hope that a competitor doesn’t blow by you while you’re standing still, or you try to rewrite the whole thing and hope that doesn’t completely kill the project (which is by far the most likely outcome of a total rewrite effort).
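If you want to replay the numbers yourself, the whole model fits in a few lines of Python. This is just one way to encode the pretend math above, with amounts in thousands of feature-dollars; the function names are made up, and it assumes any leftover capacity goes straight to paying down principal. It applies the 30% fee exactly rather than rounding, so its figures drift a little from the ones in the text (roughly $173k rather than $170k of debt after the second scaled-back release, for example).

```python
def run_release(debt, target, capacity=100.0, rate=0.10, fee=0.30):
    """Advance one release; returns (new_debt, interest_paid)."""
    interest = debt * rate
    available = capacity - interest    # what is left after servicing the debt
    shortfall = target - available
    if shortfall > 0:
        debt += shortfall * (1 + fee)  # borrow the gap, plus the lenders' fees
    else:
        debt += shortfall              # shortfall <= 0: leftover pays down principal
    return debt, interest


def simulate(targets):
    """Run a sequence of release targets starting from zero debt."""
    debt, history = 0.0, []
    for target in targets:
        debt, interest = run_release(debt, target)
        history.append((target, round(interest, 1), round(debt, 1)))
    return history


cautious = simulate([200, 120, 60])     # the team that scales back
aggressive = simulate([200, 200, 200])  # the team that keeps shooting for $200k

for label, history in (("cautious", cautious), ("aggressive", aggressive)):
    for target, interest, debt in history:
        print(f"{label}: target={target}k interest={interest}k debt={debt}k")
```

Changing rate, fee, or the list of targets is the quickest way to see how sensitive the outcome is to each assumption.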

You can play through that scenario with different perceived interest rates, or thinking of the debt as a tax instead of a constant amount, and over longer periods of time, but hopefully it illustrates the problems that I mentioned above, both around artificially increasing expectations for the team, leading to yet more debt, leading to yet more pressure to cut corners, and around the fact that paying down the debt often requires a Herculean effort for very little payoff.  Make of that what you will.  As with real debt, there’s a time and a place to incur it:  sometimes it’s important to hit a deadline, or to get a feature in for a key client, and the interest and fees are worth the cost.  But in the long run, technical debt can’t be allowed to build up to the point where it’s both too large to pay down and too large to allow for future development work, which requires walking a fine line between incurring debt when it’s necessary to get things done fast enough and holding it off or paying it down so that it doesn’t get out of control.
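
One way to play through the scenario with different numbers is a few lines of Python.  This is only a minimal sketch:  it assumes $100k of raw capacity per release, a flat 10% interest charge on outstanding debt, and a 30% fee on anything borrowed, all of which are inferred from the round numbers above.  The function name and parameters are made up for illustration, and because the walkthrough above rounds as it goes, the results land near the hand-worked figures rather than exactly on them.

```python
def simulate_debt(targets, capacity=100.0, interest_rate=0.10, fee_rate=0.30):
    """Return the technical debt (in $k) at the end of each release.

    All parameters are illustrative assumptions, not measured values.
    """
    debt = 0.0
    history = []
    for target in targets:
        interest = debt * interest_rate   # capacity lost to servicing the debt
        available = capacity - interest   # what is left for real work
        if target > available:
            # Cutting corners: borrow the shortfall and pay fees on top of it.
            debt += (target - available) * (1 + fee_rate)
        else:
            # Spare capacity goes toward paying the debt down.
            debt = max(0.0, debt - (available - target))
        history.append(round(debt, 1))
    return history


if __name__ == "__main__":
    print(simulate_debt([200, 120, 60]))   # the team that scales back
    print(simulate_debt([200, 200, 200]))  # the team that always ships $200k
```

Under these assumptions the first team ends its third release with about $150k of debt and the second with about $443k, close to the $147k and $440k worked out by hand.  Changing `interest_rate` or `fee_rate` is an easy way to see how quickly the second scenario runs away.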

Phoenix First Two XP Sprints

Phoenix has finished its Sprint 5 and 6, which are the first two XP Sprints.

An XP team always goes through four stages: “forming, storming, norming, performing”. The first Sprint felt like a storming stage, where we were trying to figure out the best way to get the code in without spending too much time on upfront design. At the same time, we were also getting used to pair programming.

Even though pair programming has become an old trick for me, I still feel that my pairing skill has gotten worse during the past three years of working solo. The second Sprint felt a lot better, and I am hoping to keep this trend going.

Items worth noting:

  • We modified the lava lamp setup to have a green light on when everything is good. Even though it is redundant, it has a very positive effect on us. The only thing we might need to watch out for is that someone mentioned the lamps could be a fire hazard, because they get very hot by the end of the day. So we are going to turn them off at the end of each day. This is when I found out that the X10 remote controller does not work, so it has gone back for replacement.
  • The lava lamps are helping us get into the habit of treating broken tests as the highest priority. Due to the nature of Phoenix, we have had some interesting test breakage already: tests that only break on the server, tests that only break on Linux, and a test that hung. One interesting discovery is that each time we are forced to figure out what is wrong and fix it, our tests end up making more sense and being more behavior-driven. And to think I was planning on settling for hacks to keep the tests passing!
  • At the beginning of the project, we chose to create just enough stories to get us through the first Sprint, then created a few more for the second Sprint. Looking back, I think that was a good choice. The stories that we create now are very different from, and much better than, the earlier ones. I think that is because at the beginning, your system has literally nothing. It would take a very good story writer to come up with a list of stories that really meet the “INVEST” criteria. I am not saying it is impossible; I just think that two Sprints of bad stories is not a bad price to pay to get the ball rolling as early as possible and avoid a lot of hassle learning, teaching, and debating good stories vs. bad stories.

Checking in on Process Changes

As I blogged about a while back, for this release cycle of the PolicyCenter product we’ve made a number of changes to our development process to try to make things work more smoothly and predictably.  We re-organized the team into cross-functional sub-teams (which we call “pods”), moved from four-week sprints to two-week sprints, started scheduling and estimating work based on story cards with assigned points instead of estimating in days, and put more of an emphasis on being “done done” with features before moving on to anything else.  We’ve been at it for 8 full sprints now, which is long enough to get a pretty good read on how it’s worked out.

In something of a pleasant surprise, given the totally unpredictable nature of software development where hardly anything ever works like you’d expect, the changes have actually worked out very well.  While it’s too early to tell how “done done” we’re really getting (we’ll obviously find out more as we try to close out the release), the product is certainly more stable and more complete early in the release than it’s ever been before.  The biggest benefit though, by a wide margin in my estimation, has been the change to break out all our work into small (1-5 day) stories that get written up and estimated with the entire team.  While the estimation meetings where we do that can seem fairly tedious and slow-moving (because they are), the time spent in them has proven absolutely invaluable.  Working off of stories instead of PRDs has helped keep the development team moving by making sure there’s always a steady supply of ready-to-work-on items (and we can tell when that supply is low), and doing the meetings as a pod makes sure that everyone is on the same page, or at least much closer to it than happened before.

So suppose you want to try this out on your own team.  What exactly do you need to do?  The INVEST model (http://xp123.com/xplor/xp0308/index.shtml) is a great starting point for thinking about what makes for a good story, though personally I’m not yet sold on the whole “incremental architecture” thing, so I’m less strict about avoiding infrastructure-only stories.  Where do the stories come from?  On our team, the stories are worked out collaboratively during a weekly meeting we call an “estimation session,” where whoever is driving a feature (generally the product manager, but theoretically the developers themselves for dev-driven features) presents what they’d like done and has help from the rest of the group turning that into bite-sized chunks.  The stories are then estimated using “planning poker,” where each developer has a set of cards with estimate numbers on them (we use 0, 1/2, 1, 2, 3, 5, 8, and ?) and independently chooses the estimate they think is appropriate; everyone hides their chosen card until everyone else is done, to avoid any sort of groupthink, and everyone shows at once.  If there are discrepancies, we talk them out (other teams re-vote until the vote settles on a number, but my group has been less strict about that and tends to just come to a group decision quickly, since none of us think it’s worth a huge amount of time debating if something is really a 2 or a 3).  The estimates are done in “points” which, at the start, you can think of as “ideal developer days,” but which ideally just become a relativized estimate (a 3-point card should take roughly 3 times as long as a 1-point card) that then allows you to empirically measure how many points you do per sprint.
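
For anyone who thinks better in code, the mechanics of a planning poker round and of measuring points per sprint can be sketched in a few lines of Python.  This is purely illustrative:  the function names, the developer names, and the sprint totals are invented, and the only thing taken from our actual process is the deck of cards.

```python
# The deck described above; "?" means "I can't estimate this yet".
DECK = ("0", "1/2", "1", "2", "3", "5", "8", "?")


def needs_discussion(votes):
    """Decide whether a revealed round has to be talked out.

    `votes` maps each developer to the card they chose in secret; it is
    only passed in once everyone has picked, so all cards show at once.
    """
    cards = set(votes.values())
    invalid = cards - set(DECK)
    if invalid:
        raise ValueError(f"cards not in the deck: {sorted(invalid)}")
    # Any disagreement, or anyone holding "?", means more conversation.
    return len(cards) > 1 or "?" in cards


def velocity(points_per_sprint):
    """Average number of points completed per sprint so far."""
    return sum(points_per_sprint) / len(points_per_sprint)


if __name__ == "__main__":
    print(needs_discussion({"ann": "2", "raj": "3", "lee": "2"}))  # True
    print(needs_discussion({"ann": "2", "raj": "2", "lee": "2"}))  # False
    print(velocity([18, 22, 20]))                                  # 20.0
```

The averaged number is what makes the points useful:  once you know roughly how many points the pod finishes in a sprint, the estimates no longer need to mean “days” at all.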

The meetings tend to run the smoothest when the driver for the feature has at least an idea of how things break down into stories already; the rest of the team can then question that, suggest alternative splits or combinations, etc., but having that starting point helps keep things moving.  The important thing that happens during the meeting, though, is really that people ask questions.  Lots and lots of questions.  Does this need to work in cases X and Y or only X?  Do we have any test infrastructure for this sort of thing?  When you say you want feature “foo,” do you mean A or do you mean B?  Where does the field labeled “Total” on the mockup come from and how is it defined?  Does this need to be configurable by customers or can we hardcode the logic? Etc.  Everyone on the team (or, in our case, pod) is at the meeting, so everyone comes away with approximately the same understanding of what needs to be done.  Perhaps more importantly, the questioning serves to identify stories and features that aren’t really ready yet for development:  either no one understands the issue well enough to estimate it (so we need to do some research), or the feature owner can’t answer the questions in enough detail yet, or the estimates turn out to be much higher than the PM expected and they decide to re-think the feature and de-scope it in some way.  To me, it’s really the estimation part that drives all of that:  in order to really accurately estimate a small chunk of work, you need to really understand what it is, and if you didn’t have to give that estimate you’d be much less vigilant about trying to understand the feature.

So out of all the different agile practices, story-based workflow might well be my favorite at this point.

Lava Lamp with CruiseControl

As we are getting the Phoenix project under way, I am trying to get it started right by introducing more XP practices. The first three things that we are trying to do are Pair Programming, Test-Driven Development, and Continuous Integration.  This post is about Continuous Integration.

Actually, Guidewire has already built an internal tool, ToolsHarness, to handle continuous integration, as I have written in “Managing Tests with ToolsHarness, Individually”. The only difference that I want to introduce for the Phoenix project is to fix broken tests AS SOON AS POSSIBLE.

What this means is that I want the testing status of our branch to show right in our faces, without us having to launch a browser, so that we know to take action the moment a test is broken.

I talked to the developer who manages ToolsHarness, and he wrote a servlet that serves information about broken tests and test status like this picture, except in one HTTP GET. Then I set up CruiseControl (version 2.8.2) with the X10 publisher, following the setup described in the blog post “Bubble, Bubble, Build’s In Trouble”.

One thing about the normal lava lamp setup has always bugged me in the past, which is what happens when the continuous integration server is in the “testing” state. When you have a broken test, the red lava lamp will be on, and you just have to remind yourself that the fix is in and the tests are running. In some projects, I have used “project soundscape”, so that when the tests finish but are still broken, you will know about it. But if you happen to step outside, you will miss it. Or if you just came in, you have to check the browser or ask others.

So this time, I have done it a little differently, taking advantage of the fact that CruiseControl is not the process running the tests. I bought two lava lamps, one reddish and the other blue, and set them up as two independent indicators:

  • Red Lava Lamp for broken tests: When there are broken tests, it will be on, otherwise, it will be off
  • Blue Lava Lamp for testing status: When there are tests running, it will be on, otherwise it will be off

In this way, you have four states to display:

  • Neither is on: All tests pass and the tests are up-to-date
  • Blue is on and red is off: All tests pass so far, but there are tests running against newer changes
  • Blue is off and red is on (see below): You have broken tests, and no code checked in to fix it
  • Both blue and red are on (see below): You have broken tests and someone has checked in new code (hopefully to fix it)
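
The two lamps are driven by two independent facts, which is what makes the four states fall out for free. A minimal Python sketch of that mapping follows; the names are invented for illustration, and this is not the actual CruiseControl or ToolsHarness code.

```python
from typing import NamedTuple


class LampState(NamedTuple):
    red_on: bool   # red lamp: there are broken tests
    blue_on: bool  # blue lamp: tests are currently running


def lamp_state(has_broken_tests: bool, tests_running: bool) -> LampState:
    """Each lamp depends on exactly one input, so they never interfere."""
    return LampState(red_on=has_broken_tests, blue_on=tests_running)


# What each (red, blue) combination tells someone walking into the room.
MEANINGS = {
    (False, False): "All tests pass and the tests are up-to-date",
    (False, True): "All tests pass so far; tests are running on newer changes",
    (True, False): "Broken tests, and nothing checked in to fix them",
    (True, True): "Broken tests, and new code is being tested",
}


def describe(state: LampState) -> str:
    return MEANINGS[(state.red_on, state.blue_on)]


if __name__ == "__main__":
    print(describe(lamp_state(has_broken_tests=True, tests_running=True)))
```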


The setup is pretty straightforward, except that the CruiseControl 2.8.2 release is missing two crucial files, “lib/win32com.dll” and “lib/javax.comm.properties”, which the X10 publisher needs in order to work. That, and my missing a tiny but also crucial detail in the documentation, caused a three-hour hair-pulling experience, and that was with Jeffrey coming to the rescue through GTalk. I am going to submit a patch for the release script to include those two files, and documentation with the following checklist:

  • You should provide all FOUR X10-related attributes for the X10 publisher element, so that you are aware of them and can make sure they are correct. These four attributes are as follows:
    • “houseCode” and “deviceCode” are for X10 module configuration.
    • “port”, with the value of COM1, COM2, etc., to match the port where you plug in the COM module.
    • The last one is “interfaceModel”, which you should really double check with the COM module that you have.
  • Make sure “javax.comm.properties” is in your CruiseControl lib directory (should be there after 2.8.3)
  • Make sure you copy “win32com.dll” from CruiseControl lib directory (should be there after 2.8.3) to your Java bin directory

In the end, I would like to say that I am a satisfied ci-guys customer!

Pair-programming

Pair-programming is not a common practice at Guidewire right now.  I hope one day more people at Guidewire can agree with this post.

http://www.nomachetejuggling.com/2009/02/21/i-love-pair-programming/

Right now, we are trying it out on the Phoenix project.