Revising the Development Process: Getting More Agile in a Real-World Project

When I started working at Guidewire, back in 2002, the company was all of 15 people, maybe 10,000 lines of code, and one unreleased product.  No one really knew anything about Test-Driven Development or unit testing in general, and we didn’t have a continuous integration server, but we did attempt to generally follow the scrum model: daily scrum meetings, month-long sprints of development work that combined design and implementation (and testing, such as it was before we had any QA folks), and a priority-ordered backlog of work for the release that we’d pull from to plan the next sprint.  For a long time, that’s pretty much how we continued to develop:  at the start of each sprint people would estimate out what they thought was 20 days of work, and at the end of the sprint we’d see what had actually gotten done, discuss what went well and what we should change, and use that to inform the plan and process for the next sprint.

Eventually we set up an auto-build and started to try things like unit testing, and after several fumbling false starts we managed to make it a core part of our process and culture.  Now, as we’ve mentioned before, we have our own in-house test harness application that manages running 40,000+ automated tests across dozens of branches over a farm of servers.
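The post doesn’t describe how our harness actually divides up that work, but as a hedged illustration of the general idea, sharding a large suite across a farm can be as simple as dealing tests out round-robin (all names here are made up for the example):

```python
def shard_tests(test_names, num_servers):
    """Deal tests round-robin so each server gets a roughly equal share.

    A real harness would also balance by historical runtime and retry
    failures; this sketch only shows the basic partitioning step.
    """
    shards = [[] for _ in range(num_servers)]
    # Sort first so the assignment is deterministic from run to run.
    for i, name in enumerate(sorted(test_names)):
        shards[i % num_servers].append(name)
    return shards

if __name__ == "__main__":
    tests = ["test_policy", "test_billing", "test_claims", "test_rating", "test_ui"]
    for server, shard in enumerate(shard_tests(tests, 2)):
        print(server, shard)
```

With 40,000+ tests, the interesting engineering is in runtime balancing and result aggregation, not the split itself, but the split is the starting point.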

But somewhere along the line, the scrum process kind of broke down for us, in my opinion.  It happened at different points on different teams, and you can point to a lot of factors as the culprit:

  • Communication breakdowns as the team got larger
  • Increased inaccuracy of estimates as the product(s) got much, much larger and more complicated
  • Increased maintenance costs as we increased the number of customers and releases
  • Increased maintenance costs in the form of test maintenance
  • Poorer estimates, increased complexity, and increased product surface area led to internal date slippages, which put pressure on everyone to scramble to still meet external date commitments, which in turn led to process breakdowns and increased technical debt

There are probably other factors in there that I’m forgetting, but the upshot was probably a pretty classic software development story:  the methodology that worked well with 10 developers, tens of thousands of lines of code, one or two customers, and hardly any maintenance releases didn’t work so well with 50 developers, half a million lines of code, more customers, more releases to maintain, and 4-year-old crufty tests that often did more harm than good.

So what do you do about that?  Clearly, we needed to change our development methodology somehow, but for a long time we avoided really looking more seriously at anything like XP:  after all, we were already doing lots of testing, refactoring, month-long timeboxed iterations, iteration retrospectives, continuous integration, and maintaining a backlog of work that we pulled from each iteration.

Unfortunately, the agile community generally isn’t too helpful about dealing with real-world situations like ours:  large teams, large codebases, years of stacked development, legacy unit tests, multiple customers to please, hard release commitments to be met, no real on-site customers, etc.  Most of the literature just kind of tells you to change those things:  split the team up, simplify your codebase, make your tests faster and more independent, get customers on-site, avoid hard release commitments, etc.  Reading the agile literature can be frustrating at times as a result, and it can be easy to read through it and say “that won’t work for us” because, well, as strictly written it won’t.  (I’ll expand on those issues in some later posts).

The result, unfortunately, was that we didn’t end up tweaking our process all that much.  We did our best to deal with the test and code maintenance issues, and we attempted to split each product team into smaller “pods” within the team to address some of the management and coordination issues with larger teams.  As we did it at the time, however, I don’t think it was a particularly successful approach.

And then came the 3.0 release of PolicyCenter, where we rewrote a huge percentage of the application from the inside out (i.e. without changing the end-user behavior all that much) to address major architectural issues that had led to an explosion in complexity, making the product buggy and difficult to work on.  That kind of cleanup, however, is inherently a huge unknown:  you’re not changing functionality, so you can’t really measure progress in terms of end-user changes; the changes were violent enough that going halfway on any of them wasn’t even close to an option; and they were so drastic that most of our existing tests wouldn’t run or compile anymore, meaning we had to start over from scratch on a lot of our testing efforts.  (Realizing that we didn’t know what to test, incidentally, led to the creation of the Riki.)  We attempted to organize into sprints, but the reality was that we had no idea how long things would take, product managers weren’t able to provide much oversight or exert much control, and it was a whole lot of controlled chaos.  The fact that we made it out basically on time (as of our revised timeline) with a stable, functional product is a testament to the quality of the team, but there’s no way we could continue to work the way we have for the past year.

Meanwhile, one of Guidewire’s other products, BillingCenter, was being run very differently from the other application teams:  they were using a methodology much closer to stricter agile approaches like XP, with two-week iterations, story cards, point-based estimation, and a focus on getting things “done done” before moving on to the next feature.  That was working much better for them than our scrum process ever had for the other teams (except, perhaps, back when we were tiny and had hardly any code), so naturally the rest of the teams have moved to adopt that model.  Our ClaimCenter team already has, and PolicyCenter will in a few weeks when we start on our next release.

Of course, it’s never that easy, and we’ve got the disadvantages of a larger team than BillingCenter, a more complicated product, a more configurable product, much more disparity between our customers, a much larger long-term desired feature set, and a lot of resulting date pressure (both internal and external) around particular features.  Unsurprisingly, those are some of the problems that got the PolicyCenter project in trouble in the first place.

Even so, I’m confident a process change will keep us on the right track and will help to alleviate some of the issues that have killed us in the past.  So what, specifically, are we doing differently?

  • Cross-functional pods – Our original mistake with pods was to only really include development in them.  We’ve attempted to re-arrange our seating several times to include PM and QA in with the developers, which has helped, but we’re now going to more formally create sub-teams that officially include PM, QA, development, and (if we can) docs.  We’ll reduce cross-pod communication as much as we can, optimize for high-bandwidth communication within the pods, estimate and assign work at the pod level, and do our best to let the teams have latitude to self-organize and experiment with what works best for them.
  • Focus on “done done” – We fell into the classic development trap of leaving too much bug-fixing until the end, creating uncertainty and stress, and piling up deep-seated architectural issues until far too late in the cycle.  In my view, the lack of doneness is always largely driven by date pressure:  with date-based estimation and long-range release plans, developers always want to hit their estimates, and they’ll (often unconsciously) cut corners and skimp on testing to do it.  Making “doneness” an explicit, shared criterion ought to fix that, though it’ll slow down our perceived rate of progress (but in the long run increase our actual rate of progress).
  • More up-front agreement on features prior to development – PolicyCenter functionality is complicated, hard to get right, and contentious, and it requires much more up-front research, experimentation, and debate than normal features do.  In the past, we’ve started working on features before those issues were worked out, and the aforementioned date pressure would make people feel they had to build something even though there wasn’t necessarily agreement on what to build.  Doing that work in-process, as it were, was often pretty fatal:  the product managers would be rushed and the developers would be frustrated or would just make assumptions.  We’re focusing now on using stories and doing more up-front work to figure out what to build so that when it comes time to plan an iteration we only schedule work that’s already been fully agreed upon by all parties.
  • Shorter iterations and stricter timeboxing – Four weeks just turns out to be too long to really adjust, which means that our timeboxing was never that strict and priorities would be shifted mid-sprint when new issues came up because people just couldn’t wait.  Our lack of up-front agreement pretty much always guaranteed that unexpected issues would crop up as well and ensured that our estimates would be inaccurate and wishful.  Moving to two week iterations with more up-front agreement and more reliable estimates should make it possible to better avoid mid-iteration corrections.
  • Tracking velocity rather than estimates – Estimating work in days seems natural, but it’s just a horrible, horrible mistake.  Doing it meant that we never corrected when our estimates were skewed by maintenance burdens or just chronic optimism about how fast we could work, and combined with a lack of up-front agreement our estimates were usually fairly inaccurate.  The real upshot was that the team couldn’t commit to its estimates, further exacerbating the timeboxing problems, and we never really had a great indication of how fast we were actually going since a lack of “done doneness” threw things off as well.

That, of course, is merely my hope for how things will work out.  We’re starting development of the next release a couple of weeks from now, using that process, and I’m sure we’ll learn plenty and make plenty of tweaks as we go.  I’ll do my best to report back on how it actually works out, what difficulties we find, and what we try to do to overcome them.

3 Comments on “Revising the Development Process: Getting More Agile in a Real-World Project”

  1. Raoul Duke says:

    thanks for posting that. hearing about reality is extremely valuable, and seems to sometimes be along the old billy joel “honesty” lines unfortunately.

    what i don’t quite yet grok is when i read your “what we are doing differently” i hear things which are standard scrum/agile things to do. how do those really help you with e.g. the 50+ people issue which i thought you said broke standard approaches? apologies if you said it and i’m just not seeing the tree for the forest!

    as an addendum, it is reassuring to see your bulleted list underscore the agile claims of how to get things really done. best of luck in your down-to-earth dealings with reality. for sure at least you-all have the insight and intelligence to recognize reality and think about how to handle it, more than one can say for lots of software houses i suspect.

  2. Alan Keefer says:

    Indeed, they are pretty much standard agile things to do. Most of them are things we thought we were doing, but in reality weren’t doing a good job of. For example, we say we’re all going to write tests and fully fill out features before moving on to the next thing, and we sometimes have, but we’ve relied too much on individual discipline to achieve that and so that practice has broken down as we’ve slowed down due to larger teams, more complex products, and more past releases to support. We thought we were timeboxing appropriately, re-planning every month and trying to avoid mid-sprint corrections, but that practice broke down over time as well. We tried re-organizing our team into smaller sub-teams, and tried moving PM and QA in with the developers in the seating plan, but never really managed to find something that worked there either.

    The two things we really didn’t try are stories and velocity-tracking; we’ve always worked from a simple backlog, which worked alright when stories were more straightforward but which has been seriously problematic on PolicyCenter, where features are far more complicated, non-obvious, contentious, and take longer to build. Our day-based estimation has always been a mistake, in hindsight.

    I’m working on another post where I’ll detail the main challenges we have in making agile work for us; the big ones are team/product/codebase size, lack of real on-site customer involvement (since we sell software to multiple customers), long release cycles, and the fact that much of what we ship acts as a platform or toolkit, leaving us with a huge exposed public API that we have to worry about maintaining and upgrading. Each of those throws a different wrench (or seven) in the agile processes. For the most part, we really just have to find ways to deal with them while not abandoning the core agile values around high in-team communication, done done-ness, automated regression testing, up-front agreement about what “done” means for a feature, and working in small feature increments within short development iterations.

    For the large team issue, I might not have explained the approach that well. Basically, we’re attempting to split into 4 sub-teams that we call “pods”. Each pod will have about 4 developers (as we grow the team, they’ll probably get up to about 6 before we split) and will focus around the features owned by 1 or 2 product managers (who essentially act as our customers). They’ll also have around 2-3 QA people and 1 subject matter expert. The pods will self-organize and work will be assigned to the pod as a whole, they’ll manage their own estimation and backlog, all sit together as a group, etc. We’ll still have to have whole-team demos every two weeks and a scrum of scrums once a week, and we’ll need cross-pod communication a lot of the time, as well as team-level roles for dev lead, architect (to make sure the overall system is cohesive, not to draw boxes instead of coding), a lead PM who has final say, and a program manager to make sure everything actually happens. Those roles naturally get much harder to play as the team gets bigger, so we’ll see how that goes. The core idea is to organize each team as a self-contained sub-team as much as possible, and to optimize for high-bandwidth communication within the pod at the expense of making cross-pod communication more difficult.

    We’ve tried smaller groups in the past, but we didn’t go whole-hog with it; we assigned the groups to functional areas rather than product managers, we still assigned work at a per-person level for the most part, and we didn’t associate any testers explicitly with the group (they were assigned per-feature instead). By bringing everyone in, really treating it as a small self-contained team, and focusing on whatever the PM is responsible for as the backlog for the pod, we hope that maybe 90% of the communication that needs to happen will happen within the pod.

  3. Raoul Duke says:

    many thanks for the extra info, and for reporting on it to everybody else. the industry really needs to understand this issue better, via grounded experience.

    it is interesting to think about what the priorities are: when faced with some impedance mismatch between theory and reality, does one choose to force the agile square peg to become rounded-off, or does one choose to find a way to make the round hole of reality more angular?

    there’s something to be said for taking things to an extreme. like, haskell takes purity to an extreme. if you can pull it off, the benefits can be great. but there exists a lot of gray area where things Just Aren’t Working before you get to the Light On The Other Side, and obviously nobody wants to get stuck in the J.A.W. zone. the group has to have faith in whichever path they choose.

    from what i’ve read over the years, small, self-sufficient teams are something i find ever more crucial. i hear e.g. MSFT is sorta doing that with “feature crews”.
