As I mentioned in my previous post, the PolicyCenter team is moving to a stricter agile process in the next release, with a new focus on short-term iterations, up-front feature agreement, and done doneness. In order to do that, though, we’re going to run into a lot of problems that traditionally derail agile practices; they’re the sort of things the agile literature will tell you not to do because they make things difficult, but that we don’t really have any choice about.
They’re also the sorts of things that, in my experience, tend to make people pretty skeptical about agile’s claims. Reading through the agile literature, you can pretty quickly create a mental picture of the idealized agile project: a small, greenfield, internal development project. Furthermore, there’s often an emphasis in agile on the fact that all the various practices are reinforcing, so you can’t really drop any of them. That often adds up to some serious skepticism from people who aren’t working under those sorts of idealized conditions.
From what I’ve seen and read, there also isn’t a ton of guidance out there for how to apply agile methods in less-than-ideal cases, other than suggestions to try to get closer to the ideal case. Perhaps it’s just that every situation is unique, so maybe there aren’t any generally-applicable rules. To that end, I figured I’d document the sorts of problems we’re running into, and what we’re trying to do to deal with them.
Releasing Software to Multiple Customers
We ship to multiple customers, not just to one, which means that we have to use our product managers as customer proxies rather than having actual on-site customers. What we’ve found over the last few years is that on PolicyCenter the features are so complicated and so contentious that we need to put much more work into up-front agreement about what the features are, rather than relying on that to happen within the iteration. Otherwise, the product manager might need several days to check with customers or confer with other PMs to come up with an answer, or they might feel rushed to make a decision that later needs to be revisited. That means working harder to get prototypes, mockups, and PRDs in front of customers earlier and to spend more time vetting possible implementation options with the development team prior to even attempting to schedule stories.
Long Release Cycles
An ideal agile project releases frequently; maybe even every week, but no longer than every few months. Our release cycles vary between about 9 and 15 months. There’s not much we can do about that, however; despite our best efforts to make the product upgradable, upgrade still isn’t trivial for our customers, and once they’re close to or in production they have no desire to upgrade frequently. That means that frequent releases would just mean more versions we have to support; releasing every 3 months would mean a 4x increase in the number of versions we need to support, which would be absolute suicide. Because of that, we’re stuck releasing relatively infrequently. The clear downsides to that are the need to make long-term release plans, additional date pressure on those long-term plans (because the next release is a year out), and longer feedback cycles. The best we can do is to try to get better real customer feedback as we go (which is difficult given how much they build on top of our application, which means we don’t get the feedback until they do the work), and to deal with the other issues as well as we can.
Long-Term Release Commitments
The long-term release commitments that are forced on us by long release cycles are, unfortunately, pretty unavoidable. Customers that are buying the product need to make sure the investment makes sense for them, and that is often highly dependent on what they’re getting and when. There might be certain key features they need in order for the release to be useful, and they’ll need to be able to budget, staff, and plan their projects, which means they’ll need to know when they can start and when they’ll get a finished product. So we just don’t have the luxury of saying, “we’ll work on things in your priority order, so you’ll get that eventually” or “if it’s not in this release, just wait a month or two;” the next release is a year out, and we’re prioritizing across multiple customers anyway instead of just one.
The best we can do there is to combine good old SWAG estimates with as much risk management as we can stomach to try to come up with a plan for what we can commit to, giving ourselves a huge buffer in order to deal with the inevitable fact that some estimates will be off by 2-3x and that some sorts of issues will always come up mid-cycle. I’m hopeful that a consistent team velocity combined with a release worth of data about how our initial feature-level SWAGs correspond to actual story points will let us do a better job of planning the next release after this one; I’ll let everyone know how that works out a year from now.
Large, Complicated Codebase
PolicyCenter is a really large, really complicated product. So large and so complicated, in fact, that no one person can really understand in detail how everything is supposed to work or how it’s implemented. Agile development relies heavily on shared code ownership in order to do things like incremental design and project-wide refactorings, but those get increasingly difficult as the codebase gets larger. It also leads to slower build and test run times, and it necessitates that we have a dedicated QA group that can focus on the high-level feature interactions that developers will end up missing because they just can’t fit the whole picture in their head. Going forward we’re going to try pair-programming more in order to do a better job of spreading knowledge around the team, but the reality is that we’re going to have to have informal code ownership by small subteams.
The other danger is that the product’s design becomes fragmented and the various parts cease to fit together because no one has the big-picture view. Avoiding that is explicitly my job, and I haven’t entirely figured out yet how I’ll manage to keep doing it as the product gets larger; if I come up with anything novel or interesting I’ll certainly write about it. To me architecture really needs to come out of one or two people’s heads, and those people need to understand the whole system, and that just doesn’t really scale up that well.
Managing a large, complicated product requires a lot of engineers, so our team is currently 15 engineers (and growing), and once you add in PM and QA and Docs the overall team size is in the mid-30’s. That’s not exactly a small team, and most agile practices geared towards small teams don’t work at that size. To combat that, we’ve split the team up into four different cross-functional “pods,” as I described previously, in an effort to try to make agile practices work for a subset of the team. There will still need to be cross-pod communication, and people will probably move between them every so often, but in general we’re optimizing for communication within the pod and using that as the primary level of organization, work assignment and velocity tracking. We’ve tried that sort of organization before with mixed results, but we’ve never really made them cross-functional and given them this much independence; hopefully that will be the missing ingredient that makes them work well and helps the team scale.
Building A Toolkit
The last major issue is that a large portion of what we produce is a toolkit and API for clients to use in customizing the application. Having a large published interface limits the kinds of refactoring we can do, and forces us to do much more thinking to get the API right up front so that we don’t have to change it. To a lesser degree, that’s true of our database schema as well; it’s got to upgrade from release to release in a reasonable amount of time, so if we screw it up too badly we might never really be able to fix it.
That also means we can’t really commit to full incremental design or architecture; we’ve got to have some idea of where we’re going and know if it’s something we can live with long term, because we have a lot less flexibility to fix things we don’t like in future releases. That’s naturally a difficult thing to do, and it really just requires a lot of skill, good taste, and luck. It also means we’ll definitely make mistakes by attempting to anticipate the wrong things, but we don’t really have the option to not think about the long-term implications of what we’re doing.