Test Automation Planet

September 01, 2010

James Bach

Fall Schedule

I’ll be traveling and training this fall. Take note of where I’ll be, because if I come near where you are and you want some relatively free consulting, all you have to do is take me to dinner.

I’ll do almost anything for free food.

September

  • Reston, Virginia
  • San Diego, California

October

  • Stockholm, Sweden
  • Tartu, Estonia
  • Cluj-Napoca, Romania
  • Bucharest, Romania

(The event in Cluj-Napoca is a one-day public seminar that introduces my Rapid Testing methodology through a series of puzzle challenges and lecture. I will also take any and all questions about testing. Click here to sign up for that.)

November

  • Tallinn, Estonia

December

  • England
  • (tentatively) Singapore

(I have one class to teach in England. I’m interested in doing something more, if there is any interest. Email me.)

My full schedule is published here.

by James Bach at September 01, 2010 06:11 PM

Eric Jacobson

Test This #6 – Interface Outages

Your application under test (AUT) probably interfaces with external systems. Fact: these external systems will go down at some point while a user is attempting to use your AUT.

Here is the obvious test:

  1. Take ExternalSystemA down. If this is outside your control, simulate ExternalSystemA’s outage by changing where your AUT points for ExternalSystemA.
  2. Trigger whatever user operations cause your AUT to interface with ExternalSystemA.
Expected Results: The user gets a friendly message indicating some functionality is blocked at this time. The support team gets an error alert indicating ExternalSystemA is not responding.



We executed the above test for 6 or 7 external systems and got our AUT robust enough to only block minimum functionality and provide good communication to users and the support team. However, just when we were getting cocky, we encountered a slight variation on the first test that crippled our AUT. Here is the test we missed for each external system.

  1. Put ExternalSystemA into a state where it is up and running but cannot respond to your AUT within the amount of time your AUT is willing to wait. Note: We were able to simulate this by taking down ExternalSystemB, which gets called by ExternalSystemA.
  2. Trigger whatever user operations cause your AUT to interface with ExternalSystemA.
Expected Results: The user gets a friendly message indicating some functionality is blocked at this time. The support team gets an error alert indicating ExternalSystemB is not responding.
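
One way to picture the second test is a call that is bounded by a timeout. The sketch below is in Clojure (the language used elsewhere on this page). Every name in it (slow-external-call, call-with-timeout, user-message) is hypothetical rather than taken from any real AUT, and it assumes only that the AUT is willing to wait a fixed number of milliseconds.

```clojure
;; Hypothetical stand-in for ExternalSystemA: it is "up" but answers slowly.
(defn slow-external-call [delay-ms]
  (Thread/sleep (long delay-ms))
  :response-from-external-system-a)

;; Run f on another thread and wait at most timeout-ms for its result.
;; Returns ::timed-out when the wait expires.
(defn call-with-timeout [f timeout-ms]
  (let [pending (future (f))
        result  (deref pending timeout-ms ::timed-out)]
    (when (= result ::timed-out)
      (future-cancel pending))   ; stop waiting on the slow call
    result))

;; Hypothetical mapping from the outcome to what the user sees.
(defn user-message [result]
  (if (= result ::timed-out)
    "Some functionality is unavailable right now. Please try again later."
    "OK"))

;; The "up but too slow" case from the second test:
(user-message (call-with-timeout #(slow-external-call 2000) 50))
```

The point of the sketch is that a slow response and a refused connection take different code paths, so each needs its own test.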

by noreply@blogger.com (Eric Jacobson) at September 01, 2010 04:53 PM

Brian Marick

“Editing” trees in Clojure with clojure.zip

Clojure.zip is a library that lets you move around freely within trees and also create changed copies of them. This is a tutorial I wish I’d had when I started using it.

In this tutorial, I’ll use sequences as trees. You can create your own kind of trees if you want, but I won’t cover that.

Here’s what we’ll be working with:

user=> (def original [1 '(a b c) 2])
#'user/original

It’s a tree whose root is a vector with three children: 1, the subtree (a,b,c) and 2. We need to convert it into some sort of data structure that allows free movement. That’s done like this:

user=> (require '[clojure.zip :as zip])
nil
user=> (def root-loc (zip/seq-zip (seq original)))
#'user/root-loc

Notice that I used the alias zip. If you refer or use clojure.zip, you’ll find yourself overwriting useful functions like clojure.core/next.

Notice also that I explicitly wrapped the tree in a seq. If you use seq-zip on an unwrapped vector, you’ll get confusing results.
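
A small sketch of what goes wrong with an unwrapped vector, assuming the same original tree as above. seq-zip treats only seqs as branches, and a vector is not a seq, so the root looks like a leaf:

```clojure
(require '[clojure.zip :as zip])

(def original [1 '(a b c) 2])

;; Unwrapped vector: the root is not considered a branch.
(zip/branch? (zip/seq-zip original))                  ; false
(zip/down (zip/seq-zip original))                     ; nil

;; Wrapped in a seq: the root is a branch and has children.
(zip/branch? (zip/seq-zip (seq original)))            ; true
(-> (zip/seq-zip (seq original)) zip/down zip/node)   ; 1
```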

At this point, I’d describe root-loc as “the location (or loc) of the root of the tomorrow tree.” I say “tomorrow tree” because it’s not a tree itself, but something that can later be converted into a tree. In reality, root-loc names both the loc and the tomorrow tree, bundled up together, but I think it most straightforward to leave the tomorrow tree implicit.

(The actual data structure is called a “zipper”, which is a decent analogy for the actual implementation but didn’t help me understand how to use the library.)

Moving around inside a tomorrow tree

With the loc, we can move around:

user=> original
[1 (a b c) 2]
user=> (zip/node (zip/down root-loc))
1

zip/down moves to the leftmost child of the current loc and returns that child’s loc. zip/node gives you the subtree of the original tree corresponding to its loc argument. It’s one of the main ways you get parts of a tree (regular lists, vectors, and the like) “out of” a tomorrow tree.

The -> macro makes movement in the tomorrow tree much easier to understand:

user=> (-> root-loc zip/down zip/right zip/node)
(a b c)
user=> (-> root-loc zip/down zip/right zip/down zip/right zip/node)
b

Nevertheless, I always wrap anything but the simplest traversals in their own functions.

Here are some other movement functions for you to try: up, left, rightmost, and leftmost. Beware of (arguable) inconsistencies in the handling of edge cases. For example, right and rightmost behave differently when moving “off the end” of a list of siblings:

user=> original
[1 (a b c) 2]
user=> (def last-one (-> root-loc zip/down zip/right zip/right))
#'user/last-one
user=> (zip/node last-one)
2
user=> (-> last-one zip/right)
nil         ; off into nothingness
user=> (-> last-one zip/rightmost zip/node)
2           ; stays put
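
The same kind of edge-case check can be sketched for left, leftmost, and up. This assumes the same tree as above and redefines root-loc so the snippet stands alone:

```clojure
(require '[clojure.zip :as zip])

(def root-loc (zip/seq-zip (seq [1 '(a b c) 2])))
(def first-one (zip/down root-loc))

(zip/left first-one)                                ; nil, off the left end
(zip/node (zip/leftmost first-one))                 ; 1, stays put
(zip/up root-loc)                                   ; nil, nothing above the root
(-> first-one zip/right zip/down zip/up zip/node)   ; (a b c)
```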

Parts of a tree

In addition to zip/node, there are other functions for recovering parts of the original tree. All work relative to the current loc.

user=> (def b (-> root-loc zip/down zip/right zip/down zip/right))
#'user/b
user=> (zip/node b)
b
user=> (zip/lefts b)
(a)
user=> (zip/rights b)
(c)

An interesting one shows all subtrees from the root of the tree down to just above the current loc:

user=> (zip/path b)
[  (1 (a b c) 2)
      (a b c)     ]

Changing the tree

A number of functions take a current loc (and associated tomorrow tree) and produce a new loc inside a different tomorrow tree. For example, let’s delete the '(a b c) subtree:

user=> (def loc-in-new-tree (zip/remove (zip/up b)))
#'user/loc-in-new-tree

How does the tree represented by this new tomorrow tree differ from the original tree? We can see that with the root function, which applies zip/node to the root of the tomorrow tree:

user=> (zip/root loc-in-new-tree)
(1 2)
user=> original
[1 (a b c) 2]

Where, exactly, is the new location?

user=> (zip/node loc-in-new-tree)
1

The new loc has “backed up” from its previous version. (That’s not an exact enough description, but it’ll do for a few more paragraphs.) The other editing functions return an “unchanged” loc (except in the sense that it’s pointing into a new tomorrow tree with a changed structure): insert-left, insert-right, replace, edit, insert-child, and append-child. Try them out.
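
Here is a sketch of what trying them out might look like, starting from a loc at b in the same tree. The inserted items (x, z, d) are made up for illustration:

```clojure
(require '[clojure.zip :as zip])

(def b (-> (zip/seq-zip (seq [1 '(a b c) 2]))
           zip/down zip/right zip/down zip/right))

(zip/root (zip/insert-left b 'x))             ; (1 (a x b c) 2)
(zip/root (zip/insert-right b 'x))            ; (1 (a b x c) 2)
(zip/root (zip/replace b 'x))                 ; (1 (a x c) 2)
(zip/root (zip/edit b str "!"))               ; (1 (a "b!" c) 2)

;; insert-child and append-child work on a branch, so move up first.
(zip/root (zip/insert-child (zip/up b) 'z))   ; (1 (z a b c) 2)
(zip/root (zip/append-child (zip/up b) 'd))   ; (1 (a b c d) 2)

;; The returned loc still points at b, now inside the new tomorrow tree.
(zip/node (zip/insert-left b 'x))             ; b
```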

In which I reveal I’m slow

At first, I easily forgot that these functions create a new tomorrow tree and don’t really “replace” or “insert” or “edit” parts of the old one. That is:

user=> (zip/root loc-in-new-tree)
(1 2)                ; I see that I've edited the tree.
user=> (zip/root b)
(1 (a b c) 2)        ; Wait - I thought I changed the tree?!

“Well, duh”, you might say, “it’s a functional language with immutable state, so of course it doesn’t change the old tree.” You’re absolutely right, but it was surprisingly easy for old habits to sneak up on me. So, two rules:

  • If you ever fail to use the return value of one of these functions, you’re doing it wrong.

  • If you ever write code like this:

    (let [stashed-location (zip/whatever ...)]
       ... make "changes" ...
       ... use stashed location ...)
    

    you might be doing it wrong. Make sure that you’re not unthinkingly assuming that later changes to “the” tree are reflected in your stashed-location.

Whole-tree editing

Here’s an example of printing out a whole tree, one node at a time:

(defn print-tree [original]
  (loop [loc (zip/seq-zip (seq original))]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
                (do (println (zip/node loc))
                    loc))))))

This is an ordinary recursive loop. It visits each location in the tomorrow tree, stopping when zip/end? is true. zip/next returns a new current loc that is the next one in the tomorrow tree, where “next” means “in preorder depth-first order”. To see that, here’s what one run of the function prints:

user=> (print-tree [1 '(a (i ii iii) c) 2])
(1 (a (i ii iii) c) 2)
1
(a (i ii iii) c)
a
(i ii iii)
i
ii
iii
c
2

To make changes to the tree, add a cond. The default case should hand the current loc to zip/next. The other cases should yield a loc pointing into a changed copy of the tomorrow tree:

  (loop [loc (zip/seq-zip original-tree)]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
          (cond (subtree-to-change? loc)
                (modify-subtree loc)
                …
                :else loc)))))

The tricky bit is making sure that modify-subtree returns a loc just before the next loc of interest (in a depth-first traversal). (It has to be before so that zip/next takes you to the interesting loc.) To get to that loc, you can use any of the movement functions (zip/next, zip/up, zip/rightmost, and so on). There’s also a zip/prev that returns the loc just before the current one.
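
As a concrete, deliberately simple instance of that loop, here is a sketch that increments every number in a tree. The function name increment-numbers is made up for this example. Because zip/edit returns a loc at the same position, no repositioning is needed before zip/next:

```clojure
(require '[clojure.zip :as zip])

(defn increment-numbers [tree]
  (loop [loc (zip/seq-zip (seq tree))]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
               (cond (number? (zip/node loc))
                     (zip/edit loc inc)   ; changed copy, same position
                     :else loc))))))

(increment-numbers [1 '(a (2 b) c) 3])
;; => (2 (a (3 b) c) 4)
```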

To keep from confusing myself, I write little helper functions, each named by what it does and what loc it returns. So, for example, I have one function that glories in this name:

(defn wrap-with-expect__at-rightmost-wrapped-location [loc]
  (assert (start-of-arrow-sequence? loc))
  (let [right-hand (-> loc zip/right zip/right)
        edited-loc (zip/edit loc
                   ;; zip/edit passes the node at loc (not the loc itself) to this fn
                   (fn [node] `(expect ~node => ~(zip/node right-hand))))]
    (-> edited-loc zip/right zip/right zip/remove zip/remove)))

It takes a form like ... (f 1) => (+ 2 3) :next, with the current loc being (f 1), and turns it into this:

... (expect (f 1) => (+ 2 3)) :next

… with the current loc being at the 3 so that the zip/next returns a loc at :next. This positioning works because I use zip/remove, which returns a loc that’s “backed up” to the previous loc in a depth-first traversal. (That’s the fix to my earlier imprecision about what zip/remove returns. It’s not the previous loc at the same level, which—for the sake of a simpler explanation—I earlier allowed you to assume.)
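
A small sketch of that behavior on the tree from earlier in this tutorial: removing the final 2 leaves the loc at c, the previous loc in depth-first order, not at the (a b c) sibling.

```clojure
(require '[clojure.zip :as zip])

(def root-loc (zip/seq-zip (seq [1 '(a b c) 2])))
(def last-one (-> root-loc zip/down zip/right zip/right))
(def after-remove (zip/remove last-one))

(zip/node after-remove)               ; c
(zip/root after-remove)               ; (1 (a b c))
(-> after-remove zip/next zip/end?)   ; true, nothing is left to visit
```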

By building and testing these little functions first, my main cond-loop is easier to get right. You can see some more examples in my test package, Midje. You can look at both the tests and the code.

by Brian Marick at September 01, 2010 10:46 AM

August 31, 2010

Michael Bolton

Hire Ben Simo!

I have four or five blog posts in the hopper, each almost ready to go. I’m working on a whole book and a chapter of another one, and I’m on a deadline that I’m about to blow. The kids are still out of school, and I really should be cooking dinner right now. And yet…

As I write, one of the best testers that I know is looking for work. His name is Ben Simo. He lives in Colorado Springs, Colorado (my understanding is that he’s willing to relocate). He’s well-versed in LoadRunner and Performance Center, like many other testers. Unlike (alas!) so many other testers with those bullet points on the résumé, he’s not inclined merely to go through the motions and use tools for checking. He is an astute, passionate critical thinker, entirely focused on investigating and defending the value of the products for which he’s responsible by identifying problems that threaten that value. And yet he’s not of the Quality Assurance school; he entirely understands that assuring quality is the responsibility of those who produce and manage the work: programmers, writers, designers, artists, and managers. His job, as he sees it, is to make quality assurance possible. He collaborates with the project community, investigates the product, and provides the most important, most timely information he can to the people who are producing and managing the work. With that, they can make the decisions they must make, informed by the very best technical information that he can provide.

He’s past President of the Association for Software Testing; he was Conference Chair for the 2009 Conference for the Association for Software Testing; he maintains one blog called Questioning Software and another called Is There A Problem Here?. He was recently given a three-part interview by UTest; you can read that here, and here, and here.

And, as of this writing, he’s available. For someone looking for a tester, he’s like the dream date that you spy across the dance floor who a friend tells you is single, smart, modest, and well-off. The only issue is that maybe he’s a little too modest. He won’t be on the dance floor for long, because some organization will come along and sweep him off his feet. And that organization will be exceptionally lucky.

And that organization could be yours. His email address is ben at qualityfrog (period) com.

by Michael Bolton at August 31, 2010 10:46 PM

August 27, 2010

Michael Bolton

Statistician or Journalist?

Eric Jacobson has a problem, which he thoughtfully relates on his thoughtful blog in a post called “How Can I Tell Users What Testers Did?”. In this post, I’ll try to answer his question, so you might want to read his original post for context.

I see something interesting here: Eric tells a clear story to relate to his readers some problem that he’s having with explaining his work to others who, by his account, don’t seem to understand it well. In that story, he mentions some numbers in passing. Yet the numbers that he presents are incidental to the story, not central to it. On the contrary, in fact: when he uses numbers, he’s using them as examples of how poorly numbers tell the kind of story he wants to tell. Yet he tells a fine story, don’t you think?

In the Rapid Software Testing course, we present this idea (Note to Eric: we’ve added this since you took the class): To test is to compose, edit, narrate, and justify two parallel stories. You must tell a story about the product: how it works, how it fails, and how it might not work in ways that matter to your client (and in the context of a retrospective, you might like to talk about how the product was failing and is now working). But in order to give that story its warrant, you must tell another story: you must tell a story about your testing. In a case like Eric’s, that story would take the form of a summary report focused on two things: what you want to convey to your clients, and what they want to know from you (and, ideally, those two things should be in sync with each other).

To do that, you might like to consider various structures to frame your story. Let’s start with the elements of what we (somewhat whimsically) call The Universal Test Procedure (you can find it in the course notes for the class). From a retrospective view, that would include

  • your model of the test space (that is, what was inside and outside the scope of your testing, and in particular the risks that you were trying to address)
  • the oracles that you used
  • the coverage that you obtained
  • the test techniques you applied
  • the ways in which you configured the product
  • the ways in which you operated the product
  • the ways in which you observed the product
  • the ways in which you evaluated the product
  • the heuristics by which you decided to stop testing; and
  • what you discovered and reported, and how you reported it

You might also consider the structures of exploratory testing. Even if your testing isn’t highly exploratory, a lot of the structures have parallels in scripted testing.

Jon Bach says (and I agree) that testing is journalism, so look at the way journalists structure a story: they often start with the classic pyramid lead. They might also start with a compelling anecdote as recounted in What’s Your Story, by Craig Wortmann, or Made to Stick, by Chip and Dan Heath. If you’re in the room with your clients, you can use a whiteboard talk with diagrams, as in Dan Roam’s The Back of the Napkin. At the centre of your story, you could talk about risks that you addressed with your testing; problems that you found and that got addressed; problems that you found and that didn’t get addressed; things that slowed you down as you were testing; effort that you spent in each area; coverage that you obtained. You could provide testimonials from the programmers about the most important problems you found; the assistance that you provided to them to help prevent problems; your contributions to design meetings or bug triage sessions; obstacles that you surmounted; a set of charters that you performed, and the feature areas that they covered. Again, focus on what you want to convey to your clients, and what they want to know from you.

Incidentally, the more often and the more coherently you tell your story, the less explaining you’ll have to do about the general stuff. That means keeping as close to your clients as you can, so that they can observe the story unfolding as it happens. But when you ask “What metric or easily understood information can my test team provide users, to show our contribution to the software we release?”, ask yourself this: “Am I a statistician or a journalist?”


Other resources for telling testing stories:

Thread-Based Test Management: Introducing Thread-Based Test Management, by James Bach; and A New Thread, by Jon Bach (as of this writing, this is brand new stuff)

Telling Your Exploratory Story: A presentation at Agile 2010, by Jonathan Bach (I was unable to download anything other than a damaged version of this, but maybe it’s working now; please let me know)

Constructing the Quality Story (from Better Software, November 2009): Knowledge doesn’t just exist; we build it. Sometimes we disagree on what we’ve got, and sometimes we disagree on how to get it. Hard as it may be to imagine, the experimental approach itself was once controversial. What can we learn from the disputes of the past? How do we manage skepticism and trust and tell the testing story?

On Metrics:

Three Kinds of Measurement (And Two Ways to Use Them) (from Better Software, July 2009): How do we know what’s going on? We measure. Are software development and testing sciences, subject to the same kind of quantitative measurement that we use in physics? If not, what kinds of measurements should we use? How could we think more usefully about measurement to get maximum value with a minimum of fuss? One thing is for sure: we waste time and effort when we try to obtain six-decimal-place answers to whole-number questions. Unquantifiable doesn’t mean unmeasurable. We measure constantly WITHOUT resorting to numbers. Goldilocks did it.

Issues About Metrics About Bugs (Better Software, May 2009): Managers often use metrics to help make decisions about the state of the product or the quality of the work done by the test group. Yet measurements derived from bug counts can be highly misleading because a “bug” isn’t a tangible, countable thing; it’s a label for some aspect of some relationship between some person and some product, and it’s influenced by when and how we count… and by who is doing the counting.

On Coverage:

Got You Covered (from Better Software, October 2008): Excellent testing starts by questioning the mission. So, the first step when we are seeking to evaluate or enhance the quality of our test coverage is to determine for whom we’re determining coverage, and why.

Cover or Discover (from Better Software, November 2008): Excellent testing isn’t just about covering the “map”—it’s also about exploring the territory, which is the process by which we discover things that the map doesn’t cover.

A Map By Any Other Name (from Better Software, December 2008): A mapping illustrates a relationship between two things. In testing, a map might look like a road map, but it might also look like a list, a chart, a table, or a pile of stories. We can use any of these to help us think about test coverage.

by Michael Bolton at August 27, 2010 03:11 PM

Google Testing Blog

An Ingredients List for Testing - Part Two

By James Whittaker

When are you finished testing? It’s the age old quality question and one that has never been adequately answered (other than the unhelpful answer of never). I argue it never will be answered until we have a definition of the size of the testing problem. How can you know you are finished if you don’t fully understand the task at hand?

Answers that deal with coverage of inputs or coverage of code are unhelpful. Testers can apply every input and cover every line of code in test cases and still the software can have very serious bugs. In fact, it’s actually likely to have serious bugs because inputs and code cannot be easily associated with what’s important in the software. What we need is a way to identify what parts of the product can be tested, a bill of materials if you will, and then map our actual testing back to each part so that we can measure progress against the overall testing goal.

This bill of materials represents everything that can be tested. We need it in a format that can be compared with actual testing so we know which parts have received enough testing and which parts are suspect.

We have a candidate format for this bill of materials we are experimenting with at Google and will be unveiling at GTAC this year.

by James Whittaker (noreply@blogger.com) at August 27, 2010 11:35 AM

August 26, 2010

James Bach

Introducing Thread-Based Test Management

Most of the testing world is managed around artifacts: test cases, test documents, bug reports. If you look at any “test management” tool, you’ll see that the artifact-based approach permeates it. “Test” for many people is a noun.

For me test is a verb. Testing is something that I do, not so much something that I create. Testing is the act of exploration of an unknown territory. It is casting questions, like Molotov cocktails, into the darkness, where they splatter and burst into bright revealing fire.

How to Manage Such a Process?

My brother Jon and I created a way to control highly exploratory testing 10 years ago, called session-based test management (SBTM). I recently returned from an intense testing project in Israel, where I used SBTM. But I also experimented with a new idea: thread-based test management (TBTM).

Like many of my new ideas, it’s not really new. It’s the christening (with words) and sharpening (with analysis) of something many of us already do. The idea is this: organize management around threads of activity rather than test sessions or artifacts.

Thread-based testing is a generalized form of session-based testing, in that sessions are a form of thread, but a thread is not necessarily a session. In SBTM, you test in uninterrupted blocks of time that each have a charter. A charter is a mission for that session; a light sort of commitment, or contract. A thread, on the other hand, may be interrupted, it may go on and on indefinitely, and does not imply a commitment. Session-based testing can be seen as an extension of thread-based testing for especially high accountability and more orderly situations.

I define a thread as a set of one or more activities intended to solve a problem or achieve an objective. You could think of a thread as a very small project within a project.

Why Thread-Based Test Management?

Because it can work under even the most chaotic and difficult conditions. The only formalism required for TBTM is a list of threads. I use this form of test management when I am dropped into a project with as little as a day or two to get it done.

What Does Thread-Based Test Management Look Like?

It’s simple. Thread-based test management looks like a todo list, except that we organize the todo items into an outline that matches the structure of the testing process. Here’s a mocked-up example:

Test Facilities

  • Power meter calibration method
  • Backup test jig validation
  • Create standard test images

Test Strategy

  • Accuracy Testing
    • Sampling strategy
    • Preliminary-testing
    • Log file analysis program
  • Transaction Flow Testing
  • Essential Performance Testing
  • Safety Testing
    • warnings and errors FRS review
    • tool for forcing errors
  • Compliance Testing
  • Test Protocol V1.0 doc.

Test Management

  • Change protocol definition
  • Build protocol definition
  • Test cycle protocol definition
  • Bug reporting protocol definition
  • Bug triage
  • Fix verifications

This outline describes the high level threads that comprise the test project. I typically use a mind-mapping program like MindManager to organize and present them.

So, you should be thinking, “Is that it? Todo lists?” right about now. Well, no. That’s not it. But that’s one face of it.

What Else Does Thread-Based Test Management Look Like?

It looks like testers gathered around a todo list, talking about what they are going to work on that afternoon. Then they split up and go to work. Several times a day they might come together like that. If the team is not co-located, then this meeting is done over instant messaging, email, or perhaps through a wiki.

Is That All it Looks Like?

Well, there is also the status report. Whether written or spoken, the thread-based test management version of a status report lists the threads, who is working on the threads, and the outlook for each thread. It typically also includes an issues list.

Other documentation may be produced, of course. TBTM doesn’t tell you what documents to create. It simply tells you that threads are the organizing principle we use for managing the project.

Where Do Threads Come From?

Threads are first spawned from our model of the testing problem. The Satisfice Heuristic Test Strategy Model is an example of such a model. By working through those lists, we get an idea of the kinds of testing we might want to do: those are the first of the threads. After that, threads might be created in many ways, including splitting off of existing threads as we gain a deeper understanding of what testing needs to be done. Of course, in an Agile environment, each user story kicks off a new testing thread.

Which Threads Do We Work On?

Think priority and progress. We might frequently drop threads, switch threads, and pick them up again. In general, we work on the highest priority threads, but we also work on lower priority threads many times, when we see the possibility for quick and inexpensive progress. If I’m trying to finish a sanity check on the new build, I might interrupt that to discuss the status of a particular known bug if the developer happens to wander by.

Major ongoing threads often become attached to specific people. For instance “client testing” or “performance testing” often become full-time jobs. Testing itself, after all, can be thought of as a thread so challenging to do well, and so different from programming, that most companies have seen fit to hire dedicated testers.

How Do Threads End?

A thread ends either in a cut or knot. Cutting a thread means to cancel that task. A knot, however, is a milestone; an achievement of some kind. This is exactly the meaning of the phrase “tying up the loose ends” and marks either the end of the thread (or group of threads) or a good place to drop it for a while.

How Do We Estimate Work?

In thread-based test management, there is no special provision or method for estimating work, except that this is done on a thread-by-thread basis. Session-based test management may be overlaid onto TBTM in order to estimate work in terms of sessions.

How Do We Evaluate Progress?

In thread-based test management, there is no special provision or method for evaluating progress, either, except that this is done on a thread-by-thread basis, and status reports may be provided frequently, perhaps at the end of each day. Session-based test management is also helpful for that.

So What?

This form of management is actually quite common. But, to my knowledge, no one has yet named and codified it. Without a convenient way to talk about it, we have a hard time explaining and justifying it. Then when the “process improvement” freaks come along, they act like there’s no management happening at all. This form of management has been “illegible” up to now (meaning that it’s there but no one notices it) and my brother and I are going to push to make it fully legible and respectable in the testing arena.

From now on, when asked about my approach to test management, I can say “I practice Rapid Testing methodology, which I track in either a thread-based or session-based manner, depending on the stage of the project, and in a risk-based manner at all times.”

How is TBTM Any Different From Using a TO-DO List?

Michel Kraaij questions the substance of TBTM by wondering how it’s different from the age-old idea of a “to-do” list. See his post here.

This is a good question. Yes, TBTM is different than just using a to-do list, but even so, I don’t think I’ve ever read an article about to-do list based test management (TDBTM?). Most textbooks focus on artifacts, not the activity of testing. Thread-based test management is trying to capture the essence of managing with to-do lists, plus some other things in addition to that.

The main additional element, beyond just making a to-do list, is that a traditional to-do list contains items that can be “done”, whereas many threads might not ever be “done.” They might be cut (abandoned) or knotted (temporarily parked at some level of completion). Some threads may be tied up with a bow and “done” like a normal task, but not the main ones that I’m thinking of. As I practice testing, for instance, I’m rarely “done” with test strategy. I tinker with the test strategy all the way through the project. That’s why it makes sense to call it a thread.

Another thing to recognize is that the main concern of TBTM is how to know what to put on your thread list. The answer to that invokes the entire framework of Rapid Software Testing. So, yeah, it’s more than having an outline of threads, which does look very much like a to-do list; it’s the activity (and skills) of making the list and managing it. If you want to talk about to-do list based test management, then you would have to invent that lore as well. You couldn’t just say “make a to-do list” and claim to have communicated the methodology.

[You can find Jonathan's take on TBTM here.]

[I credit Sagi Krupetski, the test lead on my recent project, for helping me get this idea. His clockwork status reporting and regular morning question "Where are we on the project and what do you think you need to work on today?" caused me to see the thread structure of our project clearly for the first time. He's back on the market now (Chicago area), in case you need a great tester or test manager.]

by James Bach at August 26, 2010 10:42 PM

Eric Jacobson

How Can I Tell Users What Testers Did?

Question: What metric or easily understood information can my test team provide users, to show our contribution to the software we release?

I just got back from vacation and am looking at a beautiful pie chart that shows the following per iteration:
  • # of features delivered
  • # of bugs found in test vs. prod
  • # of bugs fixed
  • # of test cases executed
After a series of buggy production releases, my team (or at least the BAs) have decided to provide users with colorful charts depicting how hard we’ve been working each iteration. My main gripe is providing my BAs with a # representing executed test cases.

First, I feel uncomfortable measuring tester value based on test case count, for obvious reasons.

Second, the pie chart looks like all we do is test. One slice lists 400 tests. Another lists 13 features...strange juxtaposition.

Third, I’m not even sure how to provide said count. I certainly don’t encourage my test team to exhaustively document their manual test cases, nor do I care how many artifacts they save distinct tests within. Do I include 900+ automated UI test executions? Do I include thousands more unit test executions? Does the final # speak to users about quality? Does it represent how effective testers are? Not to me. Maybe it does to users...

PR is important, especially when your reputation takes a dive. I, too, want to show the users how hard my QA team works. I want to show it in the easiest possible way. I could provide a long list of tests, but they don't want to read that. What am I missing? What metric or easily understood information can my test team provide users, to show our contribution to the software we release?

by noreply@blogger.com (Eric Jacobson) at August 26, 2010 10:38 PM

August 25, 2010

Michael Bolton

All Testing is (not) Confirmatory

In a recent blog post, Rahul Verma suggests that all testing is confirmatory.

First, I applaud his writing of an exploratory essay. I also welcome and appreciate critique of the testing vs. checking idea. I don’t agree with his conclusions, but maybe in the long run we can work something out.

In mythology, there was a fellow called Procrustes, an ironmonger. He had an iron bed which he claimed fit anyone perfectly. He accomplished a perfect fit by violently lengthening or shortening the guest. I think that, to some degree, Rahul is putting the idea of confirmation into Procrustes’ bed.

He cites the Oxford Online Dictionary definition of confirm: (verb) establish the truth or correctness of (something previously believed or suspected to be the case). (Rahul doesn’t cite the usage notes, which show some different senses of the word.)

When I describe a certain approach to testing as “confirmatory” in my discussion of testing vs. checking, I’m not trying to introduce another term. Instead, I’m using an ordinary English adjective to identify an approach or a mindset to testing. My emphasis is twofold: 1) not on the role of confirmation in test results, but rather on the role of confirmation in test design; and 2) on a key word in the definition Rahul cites, “previously“.

A confirmatory mindset would steer the tester towards designing a test based on a particular and specific hypothesis. A tester working in a confirmatory way would be oriented towards saying, “Someone or something has told me that the product should be able to do X. My test will demonstrate that it can do X.” Upon the execution of the (passing) test, the tester would say “See? The product can do X.” Such tests are aimed in the direction of showing that the product can work.

Someone working from an exploratory or investigative mindset would have a different, broader, more open-ended mission. “Someone or something has told me that the product does X. What are the extents and limitations of what we think of as X? What are the implications of doing X? What essential component of X might we have missed in our thinking about previous tests? What else happens when I ask the product to do X? Can I make the product do P, by asking it to do X in a slightly different way? What haven’t I noticed? What could I learn from the test that I’ve just executed?” Upon performing the test, the tester would report on whatever interesting information she might have discovered, which might include a pass or fail component, but might not. Exploratory tests are aimed at learning something about the product, how it can work, how it might work, and how it might not work; or if you like, on “if it will work”, rather than “that it can work”. To those who would reasonably object: yes, yes, no test ever shows that a product will work in all circumstances. But the focus here is on learning something novel, often focusing on robustness and adaptability. In this mindset, we’re typically seeking to find out how the program deals with whatever we throw at it, rather than on demonstrating that it can hit a pitch in the centre of the strike zone.

I believe that, in his post, Rahul is focused on the evaluation of the test, rather than on test design. That’s different from what I’m on about. He puts confirmation squarely into result interpretation, defining the confirmation step as “a decision (on) whether the test passed or failed or needs further investigation, based on observations made on the system as a result of the interaction. The observations are compared against the assumption(s).” I don’t think of that as confirmation (“establishing the truth or correctness of something previously believed or suspected to be the case”). I think of that as application of an oracle; as a comparison of the observed behaviour with a principle or mechanism that would allow us to recognize a problem. In the absence of any countervailing reason for it to be otherwise, we expect a product to be consistent with its history; with an image that someone wants to project; with comparable products; with specific claims; with reasonable user expectations; with the explicit or implicit purpose of the product; with itself in any set of observable aspects; and with relevant standards, statutes, regulations, or laws. (These heuristics, with an example of how they can be applied in an exploratory way, are listed as the HICCUPP heuristics here. It’s now “HICCUPPS”; we recognized the “Standards and Statutes” oracle after the article was written.)

At best, your starting hypothesis determines whether applying an oracle suggests confirmation. If your hypothesis is that the product works (that is, that the product behaves in a manner consistent with the oracle heuristics), then your approach might be described as confirmatory. Yet the confirmatory mindset has been identified in both general psychological literature and testing literature as highly problematic. Klayman and Ha point out in their 1987 paper Confirmation, Disconfirmation, and Information in Hypothesis Testing that “In rule discovery, the positive test strategy leads to the predominant use of positive hypothesis tests, in other words, a tendency to test cases you think will have the target property.” For software testing, this tendency (a form of confirmation bias) is dangerous because of the influence it has on your selection of tests. If you want to find problems, it’s important to take a disconfirmatory strategy, one that includes tests of conditions outside the space of the hypothesis that the program works. “For example, when dealing with a major communicable disease (or software bugs - MB), it is more serious to allow a true case to go undiagnosed and untreated than it is to mistakenly treat someone.” Here, Klayman and Ha point out, if we want to prevent disease, the emphasis should be on tests that are outside of those that would exemplify a desired attribute (like good health). In the medical case, they say that would involve “examining people who test negative for the disease, to find any missed cases, because they reveal potential false negatives.” In testing, the object would be to run tests that challenge the idea that the test should pass. This is consistent with Myers’ analysis in The Art of Software Testing (which, interestingly, as it was written in 1979, predates Klayman and Ha’s paper).

As I see it, if we’re testing the product (rather than, say, demonstrating it), we’re not looking for confirmation of the idea that it works; we’re seeking to disconfirm the idea that it works. Or, as James Bach might put it, we’re in the illusion demolition business.

One other point: Rahul suggests “Testing should be considered complete for a given interaction only when the result of confirmation in terms of pass or fail is available.” To me, that’s checking. A test should reveal information, but it does not have to pass or fail. For example, I might test a competitive product to discover the features that it offers; such tests don’t have a pass or fail component to them. A tester might be asked to compare a current product with a past version to look for differences between the two. A tester might be asked to use a product and describe her experience with it, such that there’s an evaluation without explicit, atomic pass or fail criteria. “Pass and fail” are highly limiting in terms of our view of the product: I’m sure that the arrival of yet another damned security message on Windows Vista was deemed as a pass in the suite of automated checks that got run on the system every night. But in terms of my happiness with the product, it’s a grinding and repeated failure. I think Rahul’s notion that a test must pass or fail is confused with the idea that a test should involve the application of a stopping heuristic. For a check, “pass or fail” is essential, since a check relies on the non-sapient application of a decision rule. For a test, pass-vs.-fail might be an example of the “mission accomplished” stopping heuristic, but there are plenty of other conditions that we might use to trigger the end of a test.

Since Rahul appears to be a performance tester, perhaps he’ll relate to this example (the framing of which I owe to the work of Cem Kaner). Imagine a system that has an explicit requirement to handle 100,000 transactions per minute. We have two performance testing questions that we’d like to address. One is the load testing question: “Can this system in fact handle 100,000 transactions per minute?” To me, that kind of question often gets addressed with a confirmatory mindset. The tester forms a hypothesis that the system does handle 100,000 transactions per minute; he sets up some automation to pump 100,000 transactions per minute through the system; and if the system stays up and exhibits no other problems, he asserts that the test passes.

The other performance question is a stress testing question: “In what circumstances will the system be unable to handle a given load, and fail?” For that we design a different kind of experiment. We have a hypothesis that the system will fail eventually as we ramp up the number of transactions. But we don’t know how many transactions will trigger the failure, nor do we know the part of the system in which the failure will occur, nor do we know the way in which the failure will manifest itself. We want to know those things, so we have a different information objective here than for the load test, and we have a mission that can’t be handled by a check.
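To make the contrast between the two questions concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `system_survives` is a hypothetical stand-in for the real system (with an invented capacity of 130,000 transactions per minute), and `load_check` and `stress_probe` are names made up here. The point is only the shape of the two experiments: the first returns pass or fail at one fixed load, the second ramps the load and reports what it found.

```python
# Hypothetical stand-in for the system under test: it copes with up to
# `capacity` transactions per minute and fails beyond that. A real system
# would be driven by load-generation tooling, not by a function call.
def system_survives(load_tpm, capacity=130_000):
    return load_tpm <= capacity


def load_check(required_tpm=100_000):
    """Confirmatory check: one fixed load, one pass/fail answer."""
    return system_survives(required_tpm)


def stress_probe(start_tpm=100_000, step_tpm=10_000, limit_tpm=500_000):
    """Exploratory ramp: raise the load until something breaks and
    report where it broke, rather than returning pass or fail."""
    load = start_tpm
    last_ok = None
    while load <= limit_tpm:
        if not system_survives(load):
            return {"last_ok_tpm": last_ok, "first_failure_tpm": load}
        last_ok = load
        load += step_tpm
    # No failure was observed within the range we were willing to try.
    return {"last_ok_tpm": last_ok, "first_failure_tpm": None}
```

Even this toy probe answers only one of the three unknowns (how much load triggers the failure). Finding out which part of the system failed, and how the failure manifested itself, still needs a person observing and investigating, which is why the mission can't be reduced to a check.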

In the latter test, there is a confirmatory dimension if you’re willing to look hard enough for it. We “confirm” our hypothesis that, given heavy enough stress, the system will exhibit some problem. When we apply an oracle that exposes a failure like a crash, maybe one could say that we “confirm” that the crash is a problem, or that behaviour we consider to be bad is bad. Even in the former test, we could flip the hypothesis, and suggest that we’re seeking to confirm the hypothesis that the program doesn’t support a load of 100,000 transactions per minute. If Rahul wants to do that, he’s welcome to do so. To me, though, labelling all that stuff as “confirmatory” testing reminds me of Procrustes.

by Michael Bolton at August 25, 2010 09:43 PM

August 20, 2010

Google Testing Blog

An Ingredients List for Testing - Part One

By James Whittaker

Each year, about this time, we say goodbye to our summer interns and bid them success in the upcoming school year. Every year they come knowing very little about testing and leave, hopefully, knowing much more. This is not yet-another-plea to universities to teach more testing; instead, it is a reflection on how we teach ourselves.

I like to experiment with metaphors that help people "get it." From attacks to tools to tours to the apocalypse, I've seen my fair share. This summer, I got a lot of aha moments from various interns and new hires likening testing to cooking. We're chefs with no recipes, just a list of ingredients. We may all end up making a different version of Testing Cake, but we better at least be using the same set of ingredients.

What are the ingredients? I'll list them here over the next couple of weeks. Please feel free to add your own and I'll hope you don't steal my thunder by getting them in faster than I. Right now I have a list of 7.

Ingredient 1: Product expertise

Developers grow trees, testers manage forests. The level of focus of an individual developer should be on the low level concerns of building reliable and secure components. Developers must maintain intellectual mastery from the UI to low level APIs and memory usage of the features they code. We don’t need them distracted and overwhelmed with system wide product expertise duties as well.

Testers manage system wide issues and rarely have deep component knowledge. As a manager of the forest, we can treat any individual tree abstractly. Testers should know the entire landscape, understanding the technologies and components involved but not actually taking part in their construction. This breadth of knowledge and independence of insight is a crucial complement to the developer’s low level insights because testers must work across components and tie together the work of many developers when they assess overall system quality.

Another way to think about this is that developers are the domain experts who understand the problem the software is solving and how it is being solved. Testers are the product experts who focus on the breadth of technologies used across the entire product.

Testers should develop this product expertise to the extent that they cannot be stumped when asked questions like "how would I do this?" with their product. If I asked one of my Chrome testers any question about how to do anything with Chrome concerning installation, configuration, extensions, performance, rendering ... anything at all ... I expect an answer right away. An immediate, authoritative and correct answer. I would not expect the same of a developer. If I can stump a tester with such a question then I have cause for concern. If there is a feature none of us know about or don't know completely then we have a feature that might escape testing scrutiny. No, not on our watch!

Product expertise is one ingredient that must be liberally used when mixing Testing Cake.

by James Whittaker (noreply@blogger.com) at August 20, 2010 09:54 AM

Object Mentor

The 4-contact points of software development

The three laws of TDD are:
  • Write no production code without a failing test
  • Write just enough of a test to fail
  • Write just enough production code to get the test to pass
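As a minimal sketch of one pass through these three rules, assuming Python's standard `unittest` module and a hypothetical `add` function invented for this illustration: the test is written first and run to see it fail (at that point `add` does not exist), and only then is the one-line production function added.

```python
import unittest


# Rule 3: just enough production code to make the test pass.
# Under the rules, this function is written only after the test below
# has been run once and seen to fail (here, with a NameError).
def add(a, b):
    return a + b


# Rules 1 and 2: a single small test, written before the code above.
class AddTest(unittest.TestCase):
    def test_adds_two_numbers(self):
        self.assertEqual(add(2, 3), 5)


def run_tests():
    """Run the test case and report whether everything passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AddTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print(run_tests())
```

The file shows the end state; the order in which the pieces were typed is what the rules govern, so the comments record it.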

This list doesn’t include refactoring, which is typically an assumed activity. In fact, some people refer to these rules as “red, green, refactor”. An even older version of this, from the Smalltalk community, is Red, Green, Blue. (Why Blue for refactor? I think someone was thinking RGB for a color space; luckily they didn’t try to use CMYK or LAB!)

In this simple model, there are two kinds of code: test & production. There are two kinds of activity: writing & refactoring. Interestingly, at one level it is all code. The thing that distinguishes both sets is intent.

The intent of a test is to demonstrate or maybe specify behavior. The intent of production code is to implement (hopefully) business-relevant functionality.

The intent of writing code is creation. The intent of refactoring code is to change (hopefully improve) its structure without changing its behavior (this is oversimplified but essentially correct).

If you mix those combinations you have the 4-limbs of development:
  • Writing a test
  • Writing production code
  • Refactoring a test
  • Refactoring production code

An important behavior to practice is doing only one of these at a time. That is, when you are writing tests, don’t also write production code. Sure, you might use tools to stub out missing methods and classes, but the heart of what you are doing is writing a test. Finish that train of thought before focusing on writing production code.

On the other hand, if you are refactoring production code, do just that. Don’t change tests at the same time, try to only do one refactoring at a time, etc.

WHY?

First an analogy that almost always misses since most developers don’t additionally rock climb.

When rock climbing, a good general bit of advice is to only move one contact point at a time. For this discussion, consider your two hands and two feet as your four contact points. Sure, you can use your face or knee, but neither is much fun. So just considering two hands and two feet, that suggests that if, for example, you move your right hand, then leave your left hand and both feet in place.

This gives you stability and a chance to easily recover by simply moving the most recent appendage back in place. And when the inevitable happens and another appendage slips, you have a better chance of not eating rock face. If you move more than one thing at a time, you are in more danger because you’ve taken a risky action and reduced the number of points of contact, or stability.

Will you sometimes move multiple appendages? Sure. But not as a habit. Sometimes you need to take risks. The rock face may not always offer up movement patterns that make applying this recommendation possible. Since you know the environment will occasionally work against you, you need to maintain some slack for the inevitable.

Practicing Test Driven Development is similar. If you change production code and tests at the same time, what happens if a test fails? What is wrong? The production code, the test, both, neither? An even more subtle problem is that tests pass but the test is fragile or heavily implementation-dependent. While not necessarily an immediate threat, it represents design debt that will eventually cause problems. (This also happens frequently when tests are written after the production code, as it’s seductively easy to write tests that exercise code, expressing the production code’s implementation but fundamentally hiding the intent.)

Notice, if you had only refactored code, then you know the problem is in one place. When you change both, the problem space actually goes from 1 to 3 (4 if you allow for neither). Furthermore, if you are changing both production and test code at the same time and you get to a point where you’ve entered a bottomless pit, you’ll end up throwing away more work if you choose to restore from the repository.
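The counting above can be spelled out with a small sketch. The function name `suspects` and its representation (a pair of booleans meaning "production code is broken" and "test is broken") are assumptions made for this illustration only.

```python
from itertools import product


def suspects(changed, include_neither=False):
    """List the places a fault could be, given which kinds of code changed.

    Each suspect is a (production_broken, test_broken) pair. Code that
    was not touched is assumed to be still correct.
    """
    options = [
        (False, True) if kind in changed else (False,)
        for kind in ("production", "test")
    ]
    return [
        combo for combo in product(*options)
        if include_neither or any(combo)
    ]


only_production = suspects({"production"})
both = suspects({"production", "test"})
both_or_neither = suspects({"production", "test"}, include_neither=True)
```

With only production code changed there is a single suspect. With both changed there are three (test only, production only, both), and four once "neither" is allowed, for example when the failure comes from the environment.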

Are there going to be times when you change both? Sure. Sometimes you may not see a clear path that gives you the option to do only one thing at a given time. Sometimes the tests and code will work against you. Often, you’ll be working in a legacy code base where there are no tests. Given that the environment will occasionally (or frequently) work against you, you need to maintain some slack.

Essentially, be focused on a single goal at any given time: write a test, then get it to pass, then clean up production code while keeping the tests first unchanged and then passing.

I find that this is a hard thing both to learn and to apply. I frequently jump ahead of myself. Unfortunately I’m “lucky” enough when I do jump ahead that when I fail, I thoroughly fall flat on my face.

This approach is contextual (aren’t they all?). Every time you start working on code, you’ll be faced with these four possibilities. Each time you are, you need to figure out what is the most important thing in the moment, and do that one thing. Once you’ve taken care of the most important thing, you may have just promoted the second most important thing to first place. Even so, reassess. What is the most important thing now? Do that.

Good luck trying to apply this idea to your development work. I’m interested in hearing about it.

by Brett Schuchert at August 20, 2010 04:11 AM