Revising the Development Process: Getting More Agile in a Real-World Project

Posted: October 31, 2008
When I started working at Guidewire, back in 2002, the company was all of 15 people, maybe 10,000 lines of code, and one unreleased product. No one really knew anything about Test Driven Development or unit testing in general, and we didn’t really have a continuous integration server, but we did attempt to generally follow the scrum model: daily sprint meetings; month-long sprints of development work that combined design and implementation (and testing, such as it was before we had any QA folks); and a backlog of work for the release, organized in priority order, that we’d pull from to plan the next sprint. For a long time, that’s pretty much how we continued to develop: at the start of each sprint people would estimate out what they thought was 20 days of work, and at the end of the sprint we’d see what had actually gotten done, discuss what went well and what we should change, and use that to inform the plan and process for the next sprint.
Eventually we set up an auto-build, started to try things like unit testing, and after several fumbling false starts there we managed to make it a core part of our process and culture. Now, as we’ve mentioned before, we have our own in-house test harness application that manages running 40,000+ automated tests across dozens of branches over a farm of servers.
But somewhere along the line, the scrum process kind of broke down for us, in my opinion. It happened at different points on different teams, and you can point to a lot of factors as the culprit:
- Communication breakdowns as the team got larger
- Increased inaccuracy of estimates as the product(s) got much, much larger and more complicated
- Increased maintenance costs as we increased the number of customers and releases
- Increased maintenance costs in the form of test maintenance
- Poorer estimates, increased complexity, and increased product surface area led to internal date slippage, which put pressure on everyone to scramble to still meet external date commitments, leading in turn to process breakdowns and increased technical debt
There are probably other factors in there that I’m forgetting, but the upshot was probably a pretty classic software development story: the methodology that worked well with 10 developers, tens of thousands of lines of code, one or two customers, and hardly any maintenance releases didn’t work so well with 50 developers, half a million lines of code, more customers, more releases to maintain, and 4-year-old crufty tests that often did more harm than good.
So what do you do about that? Clearly, we needed to change our development methodology somehow, but for a long time we avoided really looking more seriously at anything like XP: after all, we were already doing lots of testing, refactoring, month-long timeboxed iterations, iteration retrospectives, continuous integration, and maintaining a backlog of work that we pulled from each iteration.
Unfortunately, the agile community generally isn’t too helpful about dealing with real-world situations like ours: large teams, large codebases, years of stacked development, legacy unit tests, multiple customers to please, hard release commitments to be met, no real on-site customers, etc. Most of the literature just kind of tells you to change those things: split the team up, simplify your codebase, make your tests faster and more independent, get customers on-site, avoid hard release commitments, etc. Reading the agile literature can be frustrating at times as a result, and it can be easy to read through it and say “that won’t work for us” because, well, as strictly written it won’t. (I’ll expand on those issues in some later posts).
The result, unfortunately, was that we didn’t end up tweaking our process all that much. We did our best to deal with the test and code maintenance issues, and we attempted to split each product team into smaller “pods” within the team to address some of the management and coordination issues with larger teams. As we did it at the time, however, I don’t think it was a particularly successful approach.
And then came the 3.0 release of PolicyCenter, where we rewrote a huge percentage of the application from the inside out (i.e. without changing the end-user behavior all that much) in an attempt to address some major architectural issues that had led to an explosion in complexity, making the product buggy and difficult to work on. That kind of cleanup, however, is inherently a huge unknown: you’re not changing functionality, so you can’t really measure progress in terms of end-user changes; the changes were violent enough that going halfway on any of them wasn’t even close to an option; and they were so drastic that most of our existing tests wouldn’t run or compile anymore, meaning that we had to start over from scratch on a lot of our testing efforts. (Realizing that we didn’t know what to test, incidentally, led to the creation of the Riki.) We attempted to organize into sprints, but the reality was that we had no idea how long things would take, product managers weren’t able to provide much oversight or exert much control, and it was a whole lot of controlled chaos. The fact that we made it out basically on time (as of our revised timeline) with a stable, functional product is a testament to the quality of the team, but there’s no way we could continue to work the way we have for the past year.
Meanwhile, one of Guidewire’s other products, BillingCenter, was being run quite differently from the other application teams: they were using a methodology much closer to stricter agile methodologies like XP, with two-week iterations, story cards, point-based estimation, and a focus on getting things “done done” before moving on to the next feature. That was working much better for them than our scrum process ever had for the other teams (except, perhaps, back when we were tiny and had hardly any code), so naturally the rest of the teams have moved to adopt that model. Our ClaimCenter team already has, and PolicyCenter will in a few weeks when we start on our next release.
Of course, it’s never that easy, and we’ve got the disadvantages of a larger team than BillingCenter, a more complicated product, a more configurable product, much more disparity between our customers, a much larger long-term desired feature set, and a lot of resulting date pressure (both internal and external) around particular features. Unsurprisingly, those are some of the problems that got the PolicyCenter project in trouble in the first place.
Even so, I’m confident a process change will keep us on the right track and will help to alleviate some of the issues that have killed us in the past. So what, specifically, are we doing differently?
- Cross-functional pods – Our original mistake with pods was to only really include development in them. We’ve attempted to re-arrange our seating several times to include PM and QA in with the developers, which has helped, but we’re now going to more formally create sub-teams that officially include PM, QA, development, and (if we can) docs. We’ll reduce cross-pod communication as much as we can, optimize for high-bandwidth communication within the pods, estimate and assign work at the pod level, and do our best to let the teams have latitude to self-organize and experiment with what works best for them.
- Focus on “done done” – We fell into the classic development trap of leaving too much bug-fixing until the end, creating uncertainty and stress and letting deep-seated architectural issues pile up until far too late in the cycle. In my view, the lack of doneness is largely driven by date pressure: with date-based estimation and long-range release plans, developers always want to hit their estimates, and they’ll (often unconsciously) cut corners and skimp on testing to do it. Making “doneness” an explicit, shared criterion ought to fix that, though it’ll slow down our perceived rate of progress (while, in the long run, increasing our actual rate of progress).
- More up-front agreement on features prior to development – PolicyCenter functionality is complicated, hard to get right, and contentious, and it requires much more up-front research, experimentation, and debate than normal features do. In the past, we’ve started working on features before those issues were worked out, and the aforementioned date pressure would make people feel they had to build something even though there wasn’t necessarily agreement on what to build. Doing that work in-process, as it were, was often pretty fatal: the product managers would be rushed and the developers would be frustrated or would just make assumptions. We’re focusing now on using stories and doing more up-front work to figure out what to build so that when it comes time to plan an iteration we only schedule work that’s already been fully agreed upon by all parties.
- Shorter iterations and stricter timeboxing – Four weeks just turns out to be too long a span to hold to: our timeboxing was never that strict, and priorities would be shifted mid-sprint when new issues came up because people just couldn’t wait. Our lack of up-front agreement pretty much guaranteed that unexpected issues would crop up, and it ensured that our estimates would be inaccurate and wishful. Moving to two-week iterations with more up-front agreement and more reliable estimates should make it possible to better avoid mid-iteration corrections.
- Tracking velocity rather than estimates – Estimating work in days seems natural, but it’s just a horrible, horrible mistake. Doing it meant that we never corrected when our estimates were skewed by maintenance burdens or just chronic optimism about how fast we could work, and combined with a lack of up-front agreement our estimates were usually fairly inaccurate. The real upshot was that the team couldn’t commit to its estimates, further exacerbating the timeboxing problems, and we never really had a great indication of how fast we were actually going since a lack of “done doneness” threw things off as well.
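To make the velocity idea concrete, here’s a minimal sketch of how point-based planning works. All of the names and numbers here are hypothetical illustrations, not our actual tooling or data: velocity is just the average number of story points finished (“done done”) per iteration, and the next iteration is planned by pulling from the prioritized backlog until that capacity is used up.

```python
def velocity(completed_points, window=3):
    """Average points actually completed per iteration over the last `window` iterations."""
    recent = completed_points[-window:]
    return sum(recent) / len(recent)

def plan_iteration(backlog, capacity):
    """Pull stories from the prioritized backlog until the point capacity is used up."""
    planned = []
    remaining = capacity
    for story, points in backlog:
        if points <= remaining:
            planned.append(story)
            remaining -= points
    return planned

# Hypothetical points finished in the last four two-week iterations
history = [18, 22, 20, 21]
capacity = velocity(history)  # average of the last three iterations: 21.0

# Hypothetical prioritized backlog of (story, point estimate) pairs
backlog = [("rating cleanup", 8), ("quote UI", 5),
           ("renewal flow", 8), ("audit trail", 3)]
print(plan_iteration(backlog, capacity))
```

The key property is the feedback loop: if the team chronically underestimates, or maintenance eats into its time, measured velocity simply drops and the next iteration automatically gets planned smaller, without anyone having to argue about whether the day-based estimates were “wrong.”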
That, of course, is merely my hope for how things will work out. We’re starting development of the next release a couple of weeks from now, using that process, and I’m sure we’ll learn plenty and make plenty of tweaks as we go. I’ll do my best to report back on how it actually works out, what difficulties we find, and what we try to do to overcome them.