Real-World Agile Difficulties

As I mentioned in my previous post, the PolicyCenter team is moving to a stricter agile process in the next release, with a new focus on short-term iterations, up-front feature agreement, and done doneness.  In order to do that, though, we’re going to run into a lot of problems that traditionally derail agile practices; they’re the sort of things the agile literature will tell you not to do because they make things difficult, but that we don’t really have any choice about.

They’re also the sorts of things that, in my experience, tend to make people pretty skeptical about agile’s claims.  Reading through the agile literature, you can pretty quickly create a mental picture of the idealized agile project:  a small, greenfield, internal development project.  Furthermore, there’s often an emphasis in agile on the fact that all the various practices are reinforcing, so you can’t really drop any of them.  That often adds up to some serious skepticism from people who aren’t working under those sorts of idealized conditions.

From what I’ve seen and read, there also isn’t a ton of guidance out there for how to apply agile methods in less-than-ideal cases, other than suggestions to try to get closer to the ideal case.  Perhaps it’s just that every situation is unique, so maybe there aren’t any generally-applicable rules.  To that end, I figured I’d document the sorts of problems we’re running into, and what we’re trying to do to deal with them.

Releasing Software to Multiple Customers

We ship to multiple customers, not just to one, which means that we have to use our product managers as customer proxies rather than having actual on-site customers.  What we’ve found over the last few years is that on PolicyCenter the features are so complicated and so contentious that we need to put much more work into up-front agreement about what the features are, rather than relying on that to happen within the iteration.  Otherwise, the product manager might need several days to check with customers or confer with other PMs to come up with an answer, or they might feel rushed to make a decision that later needs to be revisited.  That means working harder to get prototypes, mockups, and PRDs in front of customers earlier and to spend more time vetting possible implementation options with the development team prior to even attempting to schedule stories.

Long Release Cycles

An ideal agile project releases frequently: maybe even every week, and no less often than every few months.  Our release cycles vary between about 9 and 15 months.  There’s not much we can do about that, however; despite our best efforts to make the product upgradable, upgrade still isn’t trivial for our customers, and once they’re close to or in production they have no desire to upgrade frequently.  That means that frequent releases would just mean more versions we have to support; releasing every 3 months would mean roughly a 4x increase in the number of versions we need to support, which would be absolute suicide.  Because of that, we’re stuck releasing relatively infrequently.  The clear downsides to that are the need to make long-term release plans, additional date pressure on those long-term plans (because the next release is a year out), and longer feedback cycles.  The best we can do is to try to get better real customer feedback as we go (which is difficult given how much they build on top of our application, which means we don’t get the feedback until they do the work), and to deal with the other issues as well as we can.
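To make the version-count arithmetic concrete, here is a minimal sketch.  It assumes that each release is supported for a fixed window, so the number of versions in support at any time scales with how often we ship; the function name and that fixed-window assumption are mine, for illustration only.

```python
from fractions import Fraction

def version_multiplier(current_cycle_months: int, new_cycle_months: int) -> Fraction:
    """How many times more releases ship per unit of time under the new cadence.

    Assumption (not from any real support policy): every release is supported
    for the same fixed window, so supported versions scale with release frequency.
    """
    return Fraction(current_cycle_months, new_cycle_months)

# Compare a 3-month cadence against the short, middle, and long end
# of a 9-to-15-month release cycle.
for cycle in (9, 12, 15):
    print(f"{cycle}-month cycle -> {version_multiplier(cycle, 3)}x as many versions")
```

With a 12-month cycle the multiplier is 4; across the 9-to-15-month range it runs from 3 to 5.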

Long-Term Release Commitments

The long-term release commitments that are forced on us by long release cycles are, unfortunately, pretty unavoidable.  Customers that are buying the product need to make sure the investment makes sense for them, and that is often highly dependent on what they’re getting and when.  There might be certain key features they need in order for the release to be useful, and they’ll need to be able to budget, staff, and plan their projects, which means they’ll need to know when they can start and when they’ll get a finished product.  So we just don’t have the luxury of saying, “we’ll work on things in your priority order, so you’ll get that eventually” or “if it’s not in this release, just wait a month or two;” the next release is a year out, and we’re prioritizing across multiple customers anyway instead of just one.

The best we can do there is to combine good old SWAG estimates with as much risk management as we can stomach to try to come up with a plan for what we can commit to, giving ourselves a huge buffer in order to deal with the inevitable fact that some estimates will be off by 2-3x and that some sorts of issues will always come up mid-cycle.  I’m hopeful that a consistent team velocity combined with a release worth of data about how our initial feature-level SWAGs correspond to actual story points will let us do a better job of planning the next release after this one; I’ll let everyone know how that works out a year from now.
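As a rough sketch of what that calibration could look like, assume we end a release with a list of (initial SWAG, actual story points) pairs per feature.  The numbers, feature names, and function names below are all made up for illustration; this is one simple way to turn history into a planning buffer, not a description of a process we actually have in place.

```python
# Hypothetical (feature-level SWAG, actual story points) pairs from one release.
# These values are invented for illustration only.
history = [(10, 12), (20, 55), (5, 5), (40, 90), (8, 10)]

def aggregate_overrun(pairs):
    """Total actual effort divided by total estimated effort."""
    total_swag = sum(swag for swag, _ in pairs)
    total_actual = sum(actual for _, actual in pairs)
    return total_actual / total_swag

def plan_commitment(backlog, capacity, multiplier):
    """Commit to features in priority order until the padded total would exceed capacity.

    backlog    -- list of (feature name, SWAG) in priority order
    capacity   -- story points the team expects to deliver over the release
    multiplier -- padding factor applied to every SWAG (e.g. the historical overrun)
    """
    committed, used = [], 0.0
    for name, swag in backlog:
        cost = swag * multiplier
        if used + cost > capacity:
            break  # stop at the first feature that no longer fits
        committed.append(name)
        used += cost
    return committed, used

multiplier = aggregate_overrun(history)
backlog = [("A", 30), ("B", 20), ("C", 25)]  # hypothetical features
committed, used = plan_commitment(backlog, capacity=120, multiplier=multiplier)
print(f"overrun factor: {multiplier:.2f}, commit to: {committed}, padded points: {used:.1f}")
```

The aggregate ratio is used rather than a per-feature average so that a badly missed large feature weighs more than a badly missed small one, which matches how the misses actually eat into capacity.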

Large, Complicated Codebase

PolicyCenter is a really large, really complicated product.  So large and so complicated, in fact, that no one person can really understand in detail how everything is supposed to work or how it’s implemented.  Agile development relies heavily on shared code ownership in order to do things like incremental design and project-wide refactorings, but those get increasingly difficult as the codebase gets larger.  It also leads to slower build and test run times, and it necessitates that we have a dedicated QA group that can focus on the high-level feature interactions that developers will end up missing because they just can’t fit the whole picture in their head.  Going forward we’re going to try pair-programming more in order to do a better job of spreading knowledge around the team, but the reality is that we’re going to have to have informal code ownership by small subteams.

The other danger is that the product’s design becomes fragmented and the various parts cease to fit together because no one has the big-picture view.  Avoiding that is explicitly my job, and I haven’t entirely figured out yet how I’ll manage to keep doing it as the product gets larger; if I come up with anything novel or interesting I’ll certainly write about it.  To me architecture really needs to come out of one or two people’s heads, and those people need to understand the whole system, and that just doesn’t really scale up that well.

Large Team

Managing a large, complicated product requires a lot of engineers, so our team is currently 15 engineers (and growing), and once you add in PM and QA and Docs the overall team size is in the mid-30s.  That’s not exactly a small team, and most agile practices geared towards small teams don’t work at that size.  To combat that, we’ve split the team up into four different cross-functional “pods,” as I described previously, in an effort to try to make agile practices work for a subset of the team.  There will still need to be cross-pod communication, and people will probably move between them every so often, but in general we’re optimizing for communication within the pod and using that as the primary level of organization, work assignment and velocity tracking.  We’ve tried that sort of organization before with mixed results, but we’ve never really made the pods cross-functional and given them this much independence; hopefully that will be the missing ingredient that makes them work well and helps the team scale.

Building A Toolkit

The last major issue is that a large portion of what we produce is a toolkit and API for clients to use in customizing the application.  Having a large published interface limits the kinds of refactoring we can do, and forces us to do much more thinking to get the API right up front so that we don’t have to change it.  To a lesser degree, that’s true of our database schema as well; it’s got to upgrade from release to release in a reasonable amount of time, so if we screw it up too badly we might never really be able to fix it.

That also means we can’t really commit to full incremental design or architecture; we’ve got to have some idea of where we’re going and know if it’s something we can live with long term, because we have a lot less flexibility to fix things we don’t like in future releases.  That’s naturally a difficult thing to do, and it really just requires a lot of skill, good taste, and luck.  It also means we’ll definitely make mistakes by attempting to anticipate the wrong things, but we don’t really have the option to not think about the long-term implications of what we’re doing.


4 Comments on “Real-World Agile Difficulties”

  1. Noel Grandin says:

    In my opinion, good architects understand their own limitations, and strive to contain the boundaries of the work they have to manage.

    For example, I am quite ruthless in maintaining my ignorance about various projects which I deem to be outside the scope of my architectural overview.

    It can be pretty hard when the project you’re working on grows beyond your own ability to manage.
    The only thing I can suggest is making sure that you have a really good integration architecture e.g. Eclipse’s OSGi framework, and then dividing the project up somehow, handing off a significant chunk to someone else.

  2. Raoul Duke says:

    @releases

    can you have internal releases vs. external ones? i really think it is important to have tight release schedules because that drives you to do many other important things, all of which keep the code from sucking more than it should 🙂

  3. Alan Keefer says:

    @ Noel:

    Thanks for the advice . . . the challenge to me is really in figuring out what projects can be ignored. That generally involves knowing which ones are self-contained and which ones will impact the rest of the project due to increased complexity, coupling, performance issues, etc. It’s not always obvious from the higher level which features are going to fall into which category, so that means I have to have some level of involvement periodically in a lot of different things, if only to make a firmer judgment call about what I have to worry about and what I can ignore.

    That only scales so far, so I’ve been arguing internally that if we ever decide to grow the engineering team past ~20 developers that will require a much harder split into completely independent teams that work on separate code-bases that are split in such a way that the interface between them is as minimal as possible, and then we’ll have to have separate people assume the architect role for each team, as well as someone basically overseeing that split between the teams to make sure the interface there is as correct and narrow as possible. That sort of split is painful, though, so we’re going to avoid it as long as we can.

    @ Raoul:
    We’ve always had internal release dates several months in advance of the external release date to serve as an additional buffer and further mitigate the risk. We’ll continue to do that, but making our iterations stricter will hopefully give us hard internal release dates every 2 weeks, which has the benefits you mention. It forces people to focus on what really needs to get done.

    Doing that will be difficult, though: it requires that the velocity and estimates be consistent and stable enough for the pod to really believe in the commitment they make at the start of the iteration, and it’ll also require that we add slack to the iteration schedule solely so that we’re able to maintain that consistency. Without those things, it becomes expected or okay for the team to not finish all the stories they committed to for the iteration, and the iteration boundary loses its ability to keep people on task, since they simply expect that they won’t hit the mini-release target. It also devalues the timeboxing aspect, since if you’re not going to make the target anyway what’s the harm in adding to or changing the schedule? I think it’ll be difficult to get to that level of consistency in our velocity and estimation, but I’m certainly going to do everything I can to get my team there and to impress upon everyone how important it is.

  4. Raoul Duke says:

    ja, it is one of those things that takes time to get working smoothly. and before it is working and the benefits are felt, it is hard to convince others to be willing to try. i went through all that, and saw some collateral damage due to a lack of good communication; the new model was very threatening to some folks. after a while i think the benefits were pretty clear, and ever since then when i come across a problem it is amazing how often time boxing would have greatly reduced the trouble!

