Every developer knows that there are plenty of different ways to kill a software project: you can try to go too fast, you can change the requirements too often, you can build something no one wants, you can have the wrong team. But even the best-run, best-defined project can still run aground on my personal primary fear as a developer: complexity.
There are three especially scary things about complexity. First of all, it creeps up on you slowly. It’s entirely possible to get out a 1.0 or a 2.0 version of a product that’s overly complex, but at some point things just . . . stop. New features become harder and harder to add and bugs become harder to fix without introducing regressions. Pretty soon your team’s velocity has slowed to a crawl and your project just never seems to get to the quality level you need in order to release. I like to liken that endless bug-fix phase to playing whack-a-mole: every time you fix one bug, another bug pops up due to an unanticipated interaction or an incomplete understanding of the system. Both those conditions creep up slowly, and by the time they’re happening it’s often too late to really fix the underlying problems.
The second problem is that complexity-related problems are very, very hard to fix. They’re often caused by deeply rooted architectural issues that have been built upon for months or years before the problems became apparent, and fixing them can become incredibly high-risk. To make matters even more difficult, it’s often hard to pinpoint a single cause of complexity, at least if it’s systemic. When the parts just don’t quite fit together right, it’s hard to tell which part is at fault or which part should be changed. And of course, when drastic changes are required, it’s always hard to be sure what the outcome will be, so fixing underlying architectural flaws is always something of a gamble.
The last problem is that complexity is, in the end, a force you can only hope to contain rather than one you can completely eliminate. Even the best developers with the most foresight, the best processes, and the clearest product definitions have upper limits on the amount of complexity they can handle, and trying to build something more complex than that upper limit is bound to fail. In my mind that complexity upper bound is up there with the halting problem or NP-completeness as immovable forces you need to stay away from: if you’re trying to solve the halting problem, you’re barking up the wrong tree, and if you’re trying to make a product that’s too complex, you’d better find some way to simplify it.
How concerned you have to be about complexity will naturally depend on the scope of your application and how likely you are to come close to that upper bound: is it a large, complex application, a tiny script, or something in between? Is it a one-off, one-time project with a fairly contained scope, or is it something you’ll need to build on and extend for years? Of course, most large projects start out as small projects, so it’s dangerous to just assume that your small project will stay that way. That doesn’t mean you need to plan for complexity up front, but it does mean that if your formerly small project starts growing, you should worry about it before it’s too late or too expensive to fix.
Given all that, the question naturally becomes what to do about it. Dealing with complexity requires understanding the two different kinds of complexity: inherent and accidental. Inherent complexity is complexity that arises from the type of problem you’re trying to solve. Natural language processing, for example, has a lot of inherent complexity: there’s just only so much you can do to simplify the problem. Accidental complexity is complexity that arises from your choice of implementation: if you’re using a three-tier architecture and EJBs to power your blog, your blog software probably has a lot of accidental complexity.
The two kinds of complexity naturally have different strategies to mitigate them. Fixing accidental complexity is all about simplifying implementations using good software engineering practices: using higher-level languages and tools, properly decomposing code into the right kinds of re-usable building blocks, avoiding over-engineered solutions, etc. Fixing inherent complexity is all about constraining the problem space: cutting out overly complicated features that don’t add enough value, not trying to do too much at once, and otherwise keeping requirements constrained. Somewhere in the middle is the idea of properly layering complexity, whereby you try to chop the application up into discrete components with a minimal, well-defined interface between them such that you can avoid worrying about anything besides the current layer you’re working on. Depending on how big the layers are, that can either serve to reduce the complexity of the implementation by reducing the maximum amount of complexity you have to deal with at one time, or it can also reduce inherent complexity by chopping the larger problem into smaller, more discrete sub-applications that can be specced out and implemented largely independently.
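As a rough illustration of what a minimal, well-defined interface between layers can look like, here is a small Python sketch. Every name in it (`RateSource`, `FixedRates`, `quote`) is made up for the example; the only point is that the quoting layer knows nothing about where rates come from beyond a single method.

```python
from typing import Dict, Iterable, Protocol


class RateSource(Protocol):
    """The entire interface the quoting layer is allowed to depend on."""

    def base_rate(self, vehicle_type: str) -> float:
        ...


class FixedRates:
    """One possible lower layer: rates held in an in-memory table (hypothetical)."""

    def __init__(self, rates: Dict[str, float]) -> None:
        self._rates = dict(rates)

    def base_rate(self, vehicle_type: str) -> float:
        return self._rates[vehicle_type]


def quote(vehicle_types: Iterable[str], rates: RateSource) -> float:
    """Upper layer: adds up base rates without knowing how they are stored."""
    return sum(rates.base_rate(vehicle_type) for vehicle_type in vehicle_types)
```

Swapping `FixedRates` for something backed by a database or a remote service would not touch `quote` at all, which is the sense in which you only have to think about the layer you’re working on.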
The real key to dealing with things is to always have a healthy respect for complexity and to be constantly looking for ways to reduce it. Good developers can handle dealing with more complexity than merely average ones, which means that a more talented team will be able to persevere longer in the face of rising complexity, but that also means that people sometimes take pride in being able to make really complicated things work, even if it’s ultimately a bad idea. The best developers, in my experience, tend to have a fear of complexity that leads them to back off and avoid those situations whenever possible, both by simplifying implementations and by working to make sure that the project scope stays controlled.
As a development organization we’re by no means perfect, though we’re constantly looking for ways to improve, and one of the ways in which we’ve historically had a lot of room for improvement is around internal documentation of requirements and feature specifications. We’ve come up with what we hope will be a much better long-term solution, but before I describe what that solution is, I’d like to rewind and tell the story of how we ended up where we are right now.
Imagine that your company is writing a policy administration system (such a randomly chosen example, I know), and you’d like to know the answer to the question “How do policy renewals need to work?” How would you go about trying to answer it?
Historically, we’ve done our requirements documentation and feature specification in a fairly old-school manner: product management would write up a document describing the requirements for a new feature, development would put up a design topic on the wiki for anything that required some more thought and discussion, and development would write up more of a full specification afterwards describing how things actually work. The end result is that the “requirements” for a given piece of functionality tend to be difficult to discern after the fact: they’re scattered across a bunch of early-stage PM documents that 1) are generally deltas against each other, 2) don’t always resemble what was actually built, 3) tend to have a lot of ambiguities (since they’re written as normal prose), and 4) don’t capture any of the little things discussed and agreed upon by dev, PM, and QA over the course of actually building a feature. The development specs (when they’re actually up to date) tend to describe how things actually are rather than what the underlying business requirements are, are also written as deltas, are written at a semi-arbitrary level of detail, and aren’t written for things like UI functionality, while the design topics serve more to explain why things were implemented as they were. So if you want to know how policy renewals work, your best bet is just to ask someone; the information is so scattered, out of date, and incomplete that it’s impossible to piece it back together.
We do a lot of test-driven development, so you might ask “But what about the tests?” The agile philosophy is that the tests can often serve as the documentation, and that’s kind of true of well-written, complete unit tests (I still don’t think that’s 100% true, but that’s a different argument). But the problem is that unit tests themselves are at too low a level to be useful for answering higher-level questions like “How does a policy renewal work?” Even questions like “What happens when I click the ‘Add Vehicle’ button?” are difficult to document via tests because they require an entirely different level of tests than “unit” tests. They require end-to-end tests, and those tests tend to be harder to write and harder to read; they’re also much more difficult to ensure completeness for, since you can’t measure test coverage using a tool or even match up the set of methods against your set of tests. In addition, for infrastructure work the tests tend to help describe the implementation, not the high level requirements.
The other problem with using tests is, unfortunately, that they tend to get deleted when they break too badly; at some point it’s inevitable that some refactoring or other major change will break enough unit tests that you just don’t have the energy or inclination to fix them all right away, so you rewrite and fix what you can and just comment out or delete the ones you can’t. That’s also true of tests that are written against the actual implementation rather than at some higher level of abstraction; if you change the implementation, all those tests are simply irrelevant, so you have no choice but to kill them. That might not be ideal, but practically speaking that’s what actually happens in the real world where real people write real tests, and as such tests are a bit shaky to rely on as the sole source of documentation about business requirements.
“What about story cards?” you might ask. Well, one unfortunate fact is that the policy team that I work on hasn’t used story cards in the past (we will be in the next release cycle). One of our other teams does drive everything off of story cards, but even then I think there are some problems. First of all, stories are inherently deltas, and over the course of a release or over many releases the same functionality is often continuously changed, making it difficult to piece together an answer to “How does policy renewal work?” because doing so requires assembling all the stories relating to renewals over the course of several releases in chronological order so that the appropriate deltas are applied in order. Ouch. Story cards are also inherently somewhat unorganized and can contain information relating to multiple different parts of the system, so just assembling that set of cards in the first place can be difficult. Story cards would still be light-years ahead of where we were a year ago, so perhaps if we’d had them we wouldn’t have built the tools that we did, but since we didn’t have those cards we had to find a different way to do things.
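To make the “deltas applied in order” problem concrete, here is a minimal Python sketch of what answering the question from story cards amounts to. The `Story` class, its fields, and the sample rule names are invented for illustration and don’t correspond to any real story-card tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass
class Story:
    written: date
    area: str
    # Rule name -> new behaviour; None means the story removed the rule.
    changes: Dict[str, Optional[str]]


def current_behaviour(stories: Iterable[Story], area: str) -> Dict[str, str]:
    """Replay every story for one area, oldest first, to get today's rules."""
    state: Dict[str, str] = {}
    relevant = (story for story in stories if story.area == area)
    for story in sorted(relevant, key=lambda story: story.written):
        for rule, behaviour in story.changes.items():
            if behaviour is None:
                state.pop(rule, None)  # a later story withdrew this rule
            else:
                state[rule] = behaviour  # a later story overrides earlier ones
    return state
```

Even in this toy form you have to find every relevant story, trust the dates, and replay the whole history just to learn what is true today, and any single missing card silently gives you the wrong answer.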
So that was our situation a year ago: information about how things were supposed to work was largely in people’s heads, and we had scattered, generally untargeted end-to-end test coverage that touched many parts of the system.
That’s around the time we started to rework some major portions of our application, and before we started we thought the main risk we ran was that we’d break things without realizing it. In order to mitigate that, we wanted to fill out all (or at least a good number) of the tests for a given area of the application before we changed things. But how would we know we had “all” the tests and weren’t missing something? Without any obvious “units” to test we’d have no chance, so we decided to make our own units. They weren’t really stories in the traditional sense: they were statements like “The ‘Add Vehicle’ button takes you to the ‘New Vehicle’ page” and “The ‘Clone Vehicle’ button clones all selected vehicles, cloning all of their fields except for the VIN number.” Some of them could have been stories in the story card sense, but plenty of them were too fine-grained for story cards. For lack of a better term, we decided to call them “requirements” instead. Our process then became that we’d first attempt to reverse engineer the requirements for a page before we rewrote it, generally by reading any existing documentation and then by playing around with the page to see what it actually did. After we wrote those down, we’d try to have them reviewed by the product managers for accuracy and completeness, and then we’d use the requirements to drive a set of tests around the page. Ensuring a sufficient level of testing became much easier, because we could target the tests to the requirements just as you’d target unit tests to a method. Once we were done we were pretty sure we’d catch most of the breaks we might introduce, and we’d go ahead with whatever refactoring/rearchitecting needed to happen.
The idea was the right one, I think, but we had questions over how exactly we’d manage the requirements docs. What format would they be in? How would we organize them so people could find them? Hardest of all, how would we ensure they stayed up to date? We really wanted to measure coverage of the requirements as well, so how would we do that? To get the ball rolling we started out just using Google spreadsheets to track the requirements; the spreadsheet format ensured the requirements were relatively small and targeted (and hopefully unambiguous) line items instead of prose paragraphs describing things. I even wrote a small tool, using annotations and the Google SOAP API, to create some simple HTML reports about which requirements had tests. It was pretty clear that was a sub-optimal solution, but it was a start.
The question really became where to go with things: if we wanted to try to cover our whole application this way and really drive a lot of our automated end-to-end testing off of it, we’d really need everyone on the team to be on board with it, and doing that would probably require some much better tool for managing things. Thankfully we had an engineer who was fairly amazing at coming up with little tools to solve all sorts of development problems, and he agreed to take the lead on formalizing things and later driving adoption of the tool. The end result was basically an addition to MediaWiki that we called the “requirements wiki” and that was eventually nicknamed “the Riki”. The modification adds some special tags for listing requirements, which it then (on the first update) assigns unique IDs. It also allows you to tag requirements with labels like “agreed” and “implemented,” along with several other clever things. The IDs can then be used as annotations for test methods to tie the test methods back to the requirements, and the Riki has a background process that periodically takes a build and processes all the annotations to link things up. That makes it possible to display the current test methods inline with the requirements, along with coverage reports showing what percentage of requirements have any tests at all and what percentage of the tests have actually been implemented. The latter statistic allows us to add in empty test methods as a way of sketching out a test plan without implementing them all immediately.
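For readers who want a feel for the bookkeeping involved, here is a minimal Python sketch of the two pieces described above: handing out IDs on the first update, and computing the two coverage percentages. It is not the Riki’s actual code; the class names, the `R` prefix, and the ID scheme are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List, Optional, Set, Tuple


@dataclass
class Requirement:
    text: str
    req_id: Optional[str] = None  # stays None until the first update
    labels: Set[str] = field(default_factory=set)  # e.g. {"agreed"}


@dataclass
class LinkedTest:
    name: str
    req_ids: Tuple[str, ...]
    implemented: bool  # False for an empty, sketched-out test method


def assign_ids(requirements: List[Requirement], prefix: str = "R") -> None:
    """Give each requirement lacking an ID the next unused one; keep existing IDs."""
    used = {r.req_id for r in requirements if r.req_id is not None}
    numbers = count(1)
    for requirement in requirements:
        if requirement.req_id is None:
            candidate = f"{prefix}{next(numbers)}"
            while candidate in used:
                candidate = f"{prefix}{next(numbers)}"
            requirement.req_id = candidate
            used.add(candidate)


def coverage(requirements: List[Requirement], tests: List[LinkedTest]) -> dict:
    """Percent of requirements with any test, and percent of tests implemented."""
    covered = {req_id for test in tests for req_id in test.req_ids}
    with_tests = sum(1 for r in requirements if r.req_id in covered)
    implemented = sum(1 for test in tests if test.implemented)
    return {
        "requirements_with_tests_pct":
            100.0 * with_tests / len(requirements) if requirements else 0.0,
        "tests_implemented_pct":
            100.0 * implemented / len(tests) if tests else 0.0,
    }
```

Keeping the two percentages separate is what makes empty test methods useful: a placeholder raises the first number right away, while the second number shows how much of the plan is still only a sketch.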
The Riki is still pretty young in its life, so the jury is still out on our ability to really keep it up to date. So far, though, it’s proven useful as a way to coordinate dev, QA, and PM by giving everyone a shared, authoritative reference point about how things are supposed to work. I’m hopeful that by making it an indispensable part of our development process we’ll manage to overcome the inherent problems with keeping documentation up to date, and that it’ll drive clearer, less ambiguous requirements, better testing, and better communication between dev, QA, and PM, and serve as an ongoing reference for anyone new to the team or to a particular area.