A tester’s view

As this is my first post on this blog, let me introduce myself and explain why I joined Guidewire. First, this blog had a lot to do with it – it gave me a pretty good idea what sort of folks I would be working with and made me excited enough to apply. Historically, most of this blog has focused on production code development. My intention is to shed some light on our testing adventures.

Most of my professional experience has been as a developer. Initially I built business apps, but more recently I gravitated towards building tools for both developers and testers. I was lucky to get into Agile development early and had a chance to work with many of the pioneers in the field. I especially fell in love with test-driven development and still maintain the http://testdriven.com site.

One of the tools I built was a framework for developing acceptance tests in Java. I had the great fortune to present it at the Austin Workshop on Test Automation (http://www.pettichord.com/awta5.html), which is where I was first introduced to Exploratory Testing (http://en.wikipedia.org/wiki/Exploratory_testing). Guidewire turns out to be a great place to practice exploratory testing because of the inherent complexity of the applications, the fact that Guidewire has been Agile from day one, and the copious unit and acceptance tests that the developers write. Guidewire products combine core business functionality with a full-fledged development environment, so the testing needs are diverse and challenging on both the business and technical axes.

Now you may wonder why a professional developer would want to focus on exploratory testing. Isn't that manual testing, which is considered the simplest and least well compensated of the testing disciplines? Hopefully, if you read the Wikipedia article linked above, this prejudice will begin to dissolve for you as it did for me. It's true that I do a fair amount of manual testing, but it's what we call "brain-engaged testing", and I only do manually what is either not worth automating or impossible to automate. If anyone has automated "thinking", please contact me privately so we can make billions off this discovery. 🙂

As a team, we do automate many of the data generation, data collection, and data processing tasks that support exploratory testing. Of course, we do all sorts of testing at Guidewire. We collect metrics and perform root cause analysis to tune our testing mix based on actual bug discovery data. One of the promising directions for us is high-volume, pseudo-random testing. We are reusing some of the tools and tests developed as part of UI-driven acceptance testing to accomplish this. I hope to share some of the results of this experiment in the near future. Bye for now.
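To make the idea a bit more concrete, here is a minimal sketch of what a pseudo-random driver layered on top of existing UI-driven test actions might look like. Everything here is illustrative: the RandomWalkDriver, UiAction, and UiSession names are invented for this post and are not our actual tooling.

```java
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a high-volume, pseudo-random test driver that reuses
// actions already written for UI-driven acceptance tests.
public class RandomWalkDriver {

    interface UiAction {
        boolean isApplicable(UiSession session);   // can this action run right now?
        void perform(UiSession session);           // drive the UI
    }

    interface UiSession {
        void assertInvariants();                   // e.g. no server errors, totals still balance
    }

    private final Random random;

    public RandomWalkDriver(long seed) {
        // A fixed seed is what makes this "pseudo-random": any failing run can
        // be replayed exactly by rerunning with the same seed.
        this.random = new Random(seed);
    }

    public void run(UiSession session, List<UiAction> actions, int steps) {
        for (int i = 0; i < steps; i++) {
            UiAction action = actions.get(random.nextInt(actions.size()));
            if (action.isApplicable(session)) {
                action.perform(session);
                session.assertInvariants();        // fail fast, reporting the seed and step
            }
        }
    }
}
```

The seed is the important design choice: without it, a long random walk that uncovers a bug is nearly impossible to reproduce.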


The Birth of the Requirements Wiki

As a development organization we’re by no means perfect, though we’re constantly looking for ways to improve, and one of the ways in which we’ve historically had a lot of room for improvement is around internal documentation of requirements and feature specifications.  We’ve come up with what we hope will be a much better long-term solution, but before I describe what that solution is, I’d like to rewind and tell the story of how we ended up where we are right now.

Imagine that your company is writing a policy administration system (such a randomly chosen example, I know), and you’d like to know the answer to the question “How do policy renewals need to work?”  How would you go about trying to answer it?

Historically, we've done our requirements documentation and feature specification in a fairly old-school manner: product management would write up a document describing the requirements for a new feature, development would write up a design topic on the wiki for anything that required more thought and discussion, and development would write up more of a full specification afterwards describing how things actually work. The end result is that the "requirements" for a given piece of functionality tend to be difficult to discern after the fact: they're scattered across a bunch of early-stage PM documents that 1) are generally deltas against each other, 2) don't always resemble what was actually built, 3) tend to have a lot of ambiguities (since they're written as normal prose), and 4) don't capture any of the little things discussed and agreed upon by dev, PM, and QA over the course of actually building a feature. The development specs (when they're actually up to date) tend to describe how things actually are rather than what the underlying business requirements are, are also written as deltas, are written at a semi-arbitrary level of detail, and aren't written for things like UI functionality, while the design topics serve more to explain why things were implemented as they were. So if you want to know how policy renewals work, your best bet is just to ask someone; the information is so scattered, out of date, and incomplete that it's impossible to piece it back together.

We do a lot of test-driven development, so you might ask “But what about the tests?”  The agile philosophy is that the tests can often serve as the documentation, and that’s kind of true of well-written, complete unit tests (I still don’t think that’s 100% true, but that’s a different argument).  But the problem is that unit tests themselves are at too low a level to be useful for answering higher-level questions like “How does a policy renewal work?”  Even questions like “What happens when I click the ‘Add Vehicle’ button?” are difficult to document via tests because they require an entirely different level of tests than “unit” tests.  They require end-to-end tests, and those tests tend to be harder to write and harder to read; they’re also much more difficult to ensure completeness for, since you can’t measure test coverage using a tool or even match up the set of methods against your set of tests.  In addition, for infrastructure work the tests tend to help describe the implementation, not the high level requirements.

The other problem with using tests is, unfortunately, that they tend to get deleted when they break too badly; at some point it’s inevitable that some refactoring or other major change will break enough unit tests that you just don’t have the energy or inclination to fix them all right away, so you rewrite and fix what you can and just comment out or delete the ones you can’t.  That’s also true of tests that are written against the actual implementation rather than at some higher level of abstraction; if you change the implementation, all those tests are simply irrelevant, so you have no choice but to kill them.  That might not be ideal, but practically speaking that’s what actually happens in the real world where real people write real tests, and as such tests are a bit shaky to rely on as the sole source of documentation about business requirements.

“What about story cards?” you might ask.  Well, one unfortunate fact is that the policy team that I work on hasn’t used story cards in the past (we will be in the next release cycle).  One of our other teams does drive everything off of story cards, but even then I think there are some problems.  First of all, stories are inherently deltas, and over the course of a release or over many releases the same functionality is often continuously changed, making it difficult to piece together an answer to “How does policy renewal work?” because doing so requires assembling all the stories relating to renewals over the course of several releases in chronological order so that the appropriate deltas are applied in order.  Ouch.  Story cards are also inherently somewhat unorganized and can contain information relating to multiple different parts of the system, so just assembling that set of cards in the first place can be difficult.  Story cards would still be lightyears ahead of where we were a year ago, so perhaps if we’d had them we wouldn’t have built the tools that we did, but since we didn’t have those cards we had to find a different way to do things.

So that was our situation a year ago:  information about how things were supposed to work was largely in people’s heads, and we had scattered, generally untargeted end-to-end test coverage that touched many parts of the system.

That’s around the time we started to rework some major portions of our application, and before we started we thought the main risk we ran was that we’d break things without realizing it.  In order to mitigate that, we wanted to fill out all (or at least a good number) of the tests for a given area of the application before we changed things.  But how would we know we had “all” the tests and weren’t missing something?  Without any obvious “units” to test we’d have no chance, so we decided to make our own units.  They weren’t really stories in the traditional sense:  they were statements like “The ‘Add Vehicle’ button takes you to the ‘New Vehicle’ page” and “The ‘Clone Vehicle’ button clones all selected vehicles, cloning all of their fields except for the VIN number.”  Some of them could have been stories in the story card sense, but plenty of them were too fine-grained for story cards.  For lack of a better term, we decided to call them “requirements” instead.  Our process then became that we’d first attempt to reverse engineer the requirements for a page before we rewrote it, generally by reading any existing documentation and then by playing around with the page to see what it actually did.  After we wrote those down, we’d try to have them reviewed by the product managers for accuracy and completeness, and then we’d use the requirements to drive a set of tests around the page.  Ensuring a sufficient level of testing became much easier, because we could target the tests to the requirements just as you’d target unit tests to a method.  Once we were done we were pretty sure we’d catch most of the breaks we might introduce, and we’d go ahead with whatever refactoring/rearchitecting needed to happen.
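To give a flavor of how one of those fine-grained requirements maps to a targeted test, here is a deliberately simplified sketch written against a stand-in domain object rather than the real UI; the Vehicle class, its fields, and the test framework usage are invented for illustration, not our actual test code.

```java
import org.junit.Test;
import static org.junit.Assert.*;

// Vehicle is a simplified, invented stand-in for the real domain object.
class Vehicle {
    final String make;
    final String model;
    final String vin;

    Vehicle(String make, String model, String vin) {
        this.make = make;
        this.model = model;
        this.vin = vin;
    }

    // Per the requirement: copy every field except the VIN.
    Vehicle cloneWithoutVin() {
        return new Vehicle(make, model, null);
    }
}

public class CloneVehicleRequirementTest {

    // Requirement: "The 'Clone Vehicle' button clones all selected vehicles,
    // cloning all of their fields except for the VIN number."
    @Test
    public void cloneCopiesFieldsButNotVin() {
        Vehicle original = new Vehicle("Honda", "Civic", "1HGCM82633A004352");
        Vehicle clone = original.cloneWithoutVin();

        assertEquals(original.make, clone.make);
        assertEquals(original.model, clone.model);
        assertNull("VIN must not be cloned", clone.vin);
    }
}
```

The point is the one-to-one targeting: each requirement line item gets its own test (or tests), just as a method gets its own unit tests.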

The idea was the right one, I think, but we had questions over how exactly we’d manage the requirements docs.  What format would they be in?  How would we organize them so people could find them?  Hardest of all, how would we ensure they stayed up to date?  We really wanted to measure coverage of the requirements as well, so how would we do that?  To get the ball rolling we started out just using Google spreadsheets to track the requirements; the spreadsheet format ensured the requirements were relatively small and targeted (and hopefully unambiguous) line items instead of prose paragraphs describing things.  I even wrote a way, using annotations and the Google SOAP API, to create some simple HTML reports about what requirements had tests.  It was pretty clear that was a sub-optimal solution, but it was a start.

The question really became where to go with things: if we wanted to cover our whole application this way and really drive a lot of our automated end-to-end testing off of it, we'd need everyone on the team to be on board, and doing that would probably require a much better tool for managing things. Thankfully we had an engineer who was fairly amazing at coming up with little tools to solve all sorts of development problems, and he agreed to take the lead on formalizing things and later driving adoption of the tool. The end result was basically an addition to MediaWiki that we called the "requirements wiki" and that was eventually nicknamed "The Riki." The modification added some special tags for listing requirements, which then (on a first update) get assigned unique IDs. It also allows you to tag requirements with labels like "agreed" and "implemented," along with several other clever things. The IDs can then be used as annotations on test methods to tie the test methods back to the requirements, and the Riki has a background process that periodically takes a build and processes all the annotations to link things up. The result is the ability to display the current test methods inline with the requirements, along with coverage reports showing what percentage of requirements have any tests at all and what percentage of the tests have actually been implemented. The latter statistic allows us to add empty test methods as a way of sketching out a test plan without implementing them all immediately.
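To make the test side of that linkage concrete, here is a rough sketch of what tying test methods to requirement IDs with annotations could look like. The annotation name, the ID format, and the test class are all invented for illustration; they are not the actual Riki implementation, which discovers the annotations when it processes a build.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import org.junit.Test;

// Hypothetical sketch: a runtime-retained annotation so a harness can find it
// by reflection when it processes a build.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Requirement {
    String[] value();  // one or more requirement IDs assigned by the wiki (format invented here)
}

public class AddVehicleButtonTest {

    // "The 'Add Vehicle' button takes you to the 'New Vehicle' page"
    @Requirement("REQ-1234")
    @Test
    public void addVehicleNavigatesToNewVehiclePage() {
        // ...drive the UI and assert on the resulting page...
    }

    // Empty body: counted by the coverage report as planned but not yet implemented.
    @Requirement("REQ-1235")
    @Test
    public void addVehicleStartsWithEmptyVehicleFields() {
    }
}
```

With something like this in place, a coverage report only has to walk the compiled tests, collect the IDs, and join them against the requirement IDs stored in the wiki.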

The Riki is still pretty young in its life, so the jury is still out on our ability to really keep it up to date. So far, though, it's proven useful as a way to coordinate dev, QA, and PM by giving everyone a shared, authoritative reference point about how things are supposed to work. I'm hopeful that by making it an indispensable part of our development process we'll manage to overcome the inherent problems with keeping documentation up to date, and that it'll drive clearer, less ambiguous requirements, better testing, and better communication between dev, QA, and PM, and serve as an ongoing reference for anyone new to the team or to a particular area.


Quality Assurance in Guidewire’s Agile World

The goal of quality software development is to release a product that meets or exceeds customer expectations. Sounds simple, but few software companies (especially those developing enterprise applications) ever achieve this goal, and even fewer do so on a consistent basis. From its inception, Guidewire adopted aspects of Agile Development and Extreme Programming with the objective of high-quality, on-time, customer-relevant releases. My ambition with this blog entry is to describe the QA team's experience working in Guidewire's Agile model: the lessons we've learned, how we work, and what we strive for.

There’s no QA in XP

QA and Agile/XP are not natural bedfellows. This issue has been discussed ad nauseam in various forums (see http://www.theserverside.com/news/thread.tss?thread_id=38785 for a relatively entertaining discourse on the matter). In synopsis, Kent Beck (the creator of Extreme Programming) did not see a role for a QA team or for testers in general. In Extreme Programming Explained, Beck stated: "An XP tester is not a separate person, dedicated to breaking the system and humiliating the programmers." Rather, in Agile/XP, programmers develop extensive unit tests and the end customers decide whether the resulting product (or feature) is acceptable. In fact, the founders of Guidewire debated whether or not to have a QA team at all, based both on the ideals of XP and on negative experiences with the effectiveness of QA teams at prior companies. However, Guidewire has consistently maintained a 2-to-1 developer to QA Engineer ratio, a very high investment in QA compared to the software industry as a whole. Why the disconnect? Is Guidewire really an Agile/XP company? Is Agile/XP too idealistic?

My 2 cents

My first assessment is that Guidewire does indeed try to emulate Agile/XP goals. My second assessment is that strict adherence to Agile/XP is simply unrealistic. Following are several reasons why Guidewire has required a dedicated QA effort:

  • Unit testing, test-first development, and pair programming fit into that category of ideas that most everyone agrees are worth the investment. However, the intense dedication required to implement XP practices successfully and consistently means that the reality of XP projects often falls short of their ambitions.
  • Unit tests generally fail to take into account both the integration between features and the final packaged customer deliverable, along with the various environments the product will operate within. Without an effort to exercise the application in its end-customer state, including against all supported platforms, too many issues will be missed by unit testing alone.
  • From a psychological point of view, it's difficult to objectively critique your own creation (i.e., your own code). An interesting case study would be to contrast the tests a developer implements versus the tests defined by a QA Engineer. It's likely the developer's unit tests would cover the obvious use cases that the code was designed for, but how likely is it that the tests would exercise the uglier aspects that lurk at the boundaries? (A sketch of this contrast follows this list.)
  • Automation (specifically unit testing) cannot always be applied to all features or situations. Some amount of manual testing is realistically unavoidable.
  • QA Engineers live in a world of higher abstraction than a typical developer and are exposed to a broader view of the product and its public requirements. Considerations that are obvious at the customer level are often invisible to a developer focused on a specific block of code.
  • The customer feedback loop is not always ideal. Agile depends on customers willing to invest heavily in the product development effort, communicating to development their priorities and whether or not the product being built fits their needs. Many customers are unwilling or unequipped to make this investment. It's arguable whether this is the best decision on the customer's part, but the reality is that in Agile you're asking customers to engage in what amounts to beta testing on steroids. As a result, some intermediary (namely QA) is necessary to determine whether a feature meets its stated requirements.
  • Finally, in the world of mission critical insurance applications it is simply unacceptable to rely on the end customer to discover product issues. Of course some bugs will exist in a release which will be caught by customers, but keeping the number of issues to a minimum is key to successful deployments and content customers.
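As promised above, here is a small, contrived illustration of the difference between the test the author of the code naturally writes and the boundary-probing tests a tester tends to add. The premium calculation and all of the names are invented purely for the example.

```java
import org.junit.Test;
import static org.junit.Assert.*;

// Illustrative contrast only; the premium calculation is a toy invented for this example.
public class PremiumCalculatorTest {

    // Toy implementation: base rate plus a surcharge per prior accident.
    static int calculatePremium(int baseRate, int priorAccidents) {
        if (baseRate < 0 || priorAccidents < 0) {
            throw new IllegalArgumentException("inputs must be non-negative");
        }
        return baseRate + 100 * priorAccidents;
    }

    // The kind of test the author of the code naturally writes: the designed-for case.
    @Test
    public void happyPath() {
        assertEquals(700, calculatePremium(500, 2));
    }

    // The kind of tests a tester tends to add: the boundaries and the ugly inputs.
    @Test
    public void zeroAccidentsChargesOnlyTheBaseRate() {
        assertEquals(500, calculatePremium(500, 0));
    }

    @Test(expected = IllegalArgumentException.class)
    public void negativeAccidentCountIsRejected() {
        calculatePremium(500, -1);
    }
}
```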

Hmmm. We do need QA. Now, what is QA again?

So, given that QA has proven to be justified at Guidewire, what has turned out to be its most effective application? First off, I feel a little guilty using "we" when describing the QA team. A separate QA organization truly only belongs in a waterfall development model, where testing is a stand-alone stage and code is delivered wholesale from development to QA. In addition, a team expressly tasked with ensuring quality inherently defeats the purpose of the Agile/XP process, where testing and quality should be addressed by everyone in the organization and at all points of development. My belief is that the implementation of QA in an Agile/XP environment should follow general Agile/XP tenets (when in Rome…), namely establishing and maintaining core Agile/XP ideals, specifically a passion for continual improvement. Following are some guiding principles that I feel are inherent to any successful QA effort within an Agile/XP environment:

  • Never outsource QA. Luckily, this has never been an issue at Guidewire. The fact is that no salary differential (nor any communication device) will compensate for the collaboration that occurs when Engineers sit in the same room with no barriers to conversation.
  • There is no substitute for a good Engineer. Hiring is key.
  • Strive for tightly integrated development and QA teams. Ideally, QA Engineers sit next to their developer counterparts and the testing effort is shared and occurs in lockstep with code development. The automated testing infrastructure should be common as well. At Guidewire we have what may be the ultimate automation solution: tests developed by QA are run 24/7 by a harness that assigns broken tests to those responsible for the regression (usually development). Thus, a valid automated test is maintained ad infinitum. Simply check in your test and walk away… (A conceptual sketch of this idea follows this list.)
  • Make holistic quality-related decisions. Involving Development and Product Management in decisions that affect testing resources allows for more effective use of limited time. QA should focus on areas known to be high risk, for example new features where unit testing is known to be lacking or where the code base is likely to exhibit buggy behavior because of its inherent complexity. As well, the likely customer impact of a bug in a given area (knowledge usually unique to PM) is valuable in determining whether or not that feature deserves special attention.
  • Establish a model of continual training and leverage your knowledge base to keep Engineers up to date. At Guidewire we send all QA Engineers through the training courses developed for Field Engineers. The expense is rather large (three weeks of full-time training), but the payoff is Engineers exposed to the entire product and its customer-facing interface. Without this training it would likely take years for each Engineer to attain the same broad base of product knowledge.
  • Develop good tests, whether manual or automated. A bad test (one that is redundant, poorly defined, trivial, or, at worst, deceptive) is expensive in terms of maintenance and misplaced confidence.
  • Automate, automate, automate. Guidewire strives for 100% automated test coverage. This is a brash goal and oftentimes the reality is far from ideal, but if you don’t shoot for the moon…
  • Treat test code as production code. Follow good coding conventions, comment well, and refactor each test such that it remains relevant. This is another example of a pinnacle of testing that is difficult to reach.
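As referenced above, here is a purely conceptual sketch of the "run continuously, assign breaks to whoever caused them" idea. This is not ToolsHarness code; the interfaces, the suspect-finding heuristic, and the scheduling loop are all invented to illustrate the shape of the process.

```java
import java.util.List;

// Conceptual sketch only: an illustration of a harness that runs a suite
// around the clock and assigns each broken test to a likely owner.
public class ContinuousHarnessSketch {

    interface TestCase {
        String name();
        boolean run();                              // true = passed
    }

    interface ChangeLog {
        String likelySuspectFor(String testName);   // e.g. last committer to the code under test
    }

    interface Notifier {
        void assignBreak(String testName, String owner);
    }

    public void runForever(List<TestCase> suite, ChangeLog changes, Notifier notifier)
            throws InterruptedException {
        while (true) {                              // in practice triggered per build, not a busy loop
            for (TestCase test : suite) {
                if (!test.run()) {
                    // A broken test is routed to whoever most likely regressed it,
                    // usually the developer whose change broke it.
                    notifier.assignBreak(test.name(), changes.likelySuspectFor(test.name()));
                }
            }
            Thread.sleep(60 * 1000);                // wait for the next build/cycle
        }
    }
}
```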

Finally

Hopefully this is a decent primer on the often misunderstood and historically maligned area of software development called Quality Assurance, especially as applied to Agile. There are many topics related to quality software development that I'd like to eventually delve into or expand upon. Test coverage is a fascinating pseudo-science that's fun to debate. The evolution of the QA Engineer from a key-banging monkey to a fully fledged object-oriented programmer is interesting as well (especially from a staffing point of view, where said QA programmers are as rare as an early Triassic mammal). I would also like to further explore whether it may in fact be a healthy goal for a development organization to reach a state where QA is superfluous. In addition, I'd like to cover what is perhaps the greatest challenge of Agile development: scaling what works well on a small team to a much larger organization. Finally, it's fun for me to reminisce about the history of the Guidewire QA team, from pure manual testing, to a nascent Test Harness and limited test automation, to the world today where GScript tests and the ToolsHarness infrastructure allow for near-limitless automation potential.