Binary Disease
Rikard Edgren

For a long time I have felt that something is wrong with the established software testing theories; in test design tutorials I recognize only a small part of the test design I actually live in.
So it felt like a revelation when I read Gerd Gigerenzer’s Adaptive Thinking, where he describes his tools-to-theories heuristic: the theories we build, and accept, are shaped by the tools we are using.
The implication is that many fundamental assumptions aren’t a good match for the challenges of testing; they are just a model of the way our tools look.
This doesn’t automatically mean that the theories are bad and useless, but my intuition says there is a great risk.

Software testing is suffering a binary disease.
We make software for computers, and use computers for planning, execution and reporting.
Our theories reflect this far more than the fact that each piece of software is unique, made for humans, by humans.

Take a look at these examples:
* Pass/Fail hysteria; ranging from the demand for expected results to the supposed necessity of oracles.
* Coverage obsession; percentage-wise covered/not-covered reports without any elaboration on what is important.
* Metrics tumor; quantitative numbers in a world of qualitative feelings.
* Sick test design techniques, all made to fit computers; algorithms and classifications that disregard what is important, common, risky, or error-prone.

When someone challenges authorities, you should ask: “Say you’re right; what can you do with this knowledge?”

I have no final solutions, but we should take advantage of what humans are good at: understanding what is important; judgment; dealing with the unknown; separating right from wrong.

We can attack the same problems in alternative ways:
* Testers can communicate noteworthy interpretations instead of Pass/Fail (a rough sketch follows this list).
* If we can do good testing and deem that no more testing has to be done, there is no need for coverage numbers.
* If the context around a metric is so important, we can remove the metric, and keep the vital information.
* We can do our test design without flourishes; focusing on having a good chance of finding what is important.
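
To make the first point concrete, here is a minimal sketch of what such reporting could look like in code; the names and structure are my invention for illustration, not taken from any real tool:

```python
from dataclasses import dataclass, field

# Hypothetical reporting structure: instead of reducing a test to one
# Pass/Fail bit, the tester collects whatever seemed noteworthy.
@dataclass
class SessionReport:
    test_idea: str                       # a charter, not a scripted step list
    observations: list[str] = field(default_factory=list)

    def note(self, observation: str) -> None:
        self.observations.append(observation)

report = SessionReport("investigate first impression of the start page")
report.note("the main page looks cluttered")
report.note("headings are not visible in High Contrast color scheme")
print("\n".join(report.observations))
```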

Do you think it is a coincidence that test cases contain a set of instructions?
These theories cripple testers, and treat us like cripples.

Now you know why people are saying testing hasn’t developed in recent years: we are at a dead end.

And the worst part is that if Gigerenzer is right, my statements have no chance of being accepted until we have a large battery of tools based on how people think and act…

17 Comments
Joe Strazzere May 18th, 2011

“Testers can communicate noteworthy interpretations instead of Pass/Fail.”

What kind of “noteworthy interpretations” would replace a declaration of Pass/Fail?

Rikard Edgren May 18th, 2011

Hi Joe

I don’t understand your question.
Can you give some more information, or perhaps an example?

Joe Strazzere May 18th, 2011

You wrote: “Testers can communicate noteworthy interpretations instead of Pass/Fail.”

I’m trying to understand what you meant by “noteworthy interpretations”, and why you prefer that over Pass/Fail.

Can you give me an example of a “noteworthy interpretation” that is better than a “Pass/Fail”?

Rikard Edgren May 18th, 2011

OK, “prefer” is answerable. “Replace” is not, since I wouldn’t want to replace something I see as just a bit better than useless.

With “noteworthy” I mean something that is interesting: it might be a bug or an enhancement, something that looks like a problem, or a risk, or something that works very well, etc.
With “interpretation” I try to emphasize that every observation, and every communication, is an interpretation. Even for straightforward things like a crash, the interpretation lies in how the crash is described, and in all the information that is not communicated.

Example: I execute a test idea: “investigate first impression when launching start page of web site”, and report that
* the display sequence should have main content before sidebars
* headings are not visible in High Contrast Color Scheme
* scroll bars cannot be used when using High DPI
* the main page looks cluttered

If I were suffering from Pass/Fail hysteria, I would not be able to execute vague test ideas like this. I would have separate tests for these things, e.g.
“verify start page display sequence”, “verify appearance with High Contrast”, “verify usage on High DPI”, “validate appearance of start page”.
This would be time-consuming, and would often have the very negative side effect that I wouldn’t look for anything other than what the test states. (Many tests might simply not be included.)
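
To make the contrast concrete, here is a rough sketch in made-up code of what the Pass/Fail style boils down to (the StartPage stand-in and its fields are invented for illustration):

```python
class StartPage:
    """Stand-in for the page under test; invented for illustration."""
    def __init__(self, element_order, headings_visible_in_high_contrast):
        self.element_order = element_order
        self.headings_visible_in_high_contrast = headings_visible_in_high_contrast

page = StartPage(element_order=["sidebar", "main-content"],
                 headings_visible_in_high_contrast=False)

# Each scripted check can only answer yes/no to the one thing it states.
checks = {
    "verify start page display sequence":
        page.element_order.index("main-content") < page.element_order.index("sidebar"),
    "verify appearance with High Contrast":
        page.headings_visible_in_high_contrast,
}
for name, passed in checks.items():
    print(f"{name}: {'Pass' if passed else 'Fail'}")
# "the main page looks cluttered" has nowhere to go in this format.
```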

Testing is a sampling activity, and I believe Pass/Fail thinking gives us a narrow way of looking at software. Human testers with a lot of knowledge, and the software at hand, are better judges of what might be important.

Pass/Fail thinking gives no guidance as to whether the results are important or not; communicating “noteworthy interpretations” does (perhaps “important information” is a better term?)

Personally I can’t think of a situation where I’d prefer Pass/Fail, but I can imagine it might be useful in situations where you are heavily into checking, or where there are contractual requirements and the user’s experience of the software isn’t important.

Do you have examples where Pass/Fail is good?

Joe Strazzere May 18th, 2011

(shrug)

Ok, I see what you mean.

I’m just picturing an argument among Product Management/Development/QA regarding what “looks cluttered” actually means, whether it’s more important than the “crashes” interpretation, how to go about fixing it, and then how to verify the fix.

If this sort of noteworthy interpretation works for you, then great. Hopefully what is noteworthy to you and your testers is noteworthy to the purchasers/users as well.

Rikard Edgren May 18th, 2011

Understanding what is important is one of the reasons software (testing) is so difficult and fascinating.
But I think well-informed testers have a better chance of discovering details than in-advance “Verify that…” tests.

Information that is difficult to interpret is best sorted out by talking.
And what’s the alternative, to not get the information at all?
And what does Fail mean?

David Greenlees May 19th, 2011

Good post Rikard.

In the comments above… “Do you have examples where Pass/Fail is good?”

Mission critical software? Like a missile launcher, auto-pilot, etc.? Just thinking that maybe this all depends on what type of software you’re testing, or what the intended use of that software will be/is.

So in my view a ‘happy medium’ is required. A mixture of ‘pass/fail’ and ‘noteworthy interpretations’…

Thoughts?

Rikard Edgren May 19th, 2011

Thanks, David.

For an auto-pilot, I would not only want to know whether the accuracy meets the minimum specified in the requirements.
I would like testing to show if it’s just at the minimum or a lot better, and whether there are any deviations at different temperatures, air pressures, etc.
I would be concerned about the usability, which is very difficult to put a valid Pass/Fail on, for different users:
Can you understand how the auto-pilot is doing?
Is it easy to know in which situations it should and shouldn’t be used?

Reality isn’t binary.

Martin Jansson May 19th, 2011

Excellent Rikard!

I think there are many checks on requirements that you would like a Pass/Fail criterion for, such as “this and that should not go above 2 seconds”.

Regarding the “things look cluttered”: you can always discuss when it does NOT look cluttered to make the distinction. Still, do we get close to Pass/Fail then?

Rikard Edgren May 19th, 2011

I know the feeling, Martin.
It is difficult to get totally free from the binary Pass/Fail addiction. You want just some, because it makes you feel good.

I doubt rational arguments will help, but I’ll try:
Wouldn’t there be a noteworthy difference between a Fail at 2.1 seconds and a Fail at 10 seconds?
Wouldn’t a Pass at 0.1 seconds indicate an implementation to be used as inspiration for other areas?
Wouldn’t a Pass at 1.9 seconds be better phrased “OK”?

I think you can live with the occasional Pass/Fail, and it won’t kill the product.
But a total freedom from the addiction is a very good feeling, and opens up to new solutions and development of the fundamental skill: communicating what’s important.
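
A rough sketch in made-up code of the difference (the names and the timed action are invented for illustration):

```python
import time

REQUIREMENT_SECONDS = 2.0  # the threshold from the check discussed above

def timed_report(name, action):
    """Report the measured time itself, not just which side of 2.0 s it fell on."""
    start = time.perf_counter()
    action()
    elapsed = time.perf_counter() - start
    verdict = "OK" if elapsed <= REQUIREMENT_SECONDS else "over requirement"
    # A Fail at 2.1 s and a Fail at 10 s now read very differently,
    # and a 0.1 s result stands out as inspiration for other areas.
    print(f"{name}: {elapsed:.1f} s ({verdict})")

timed_report("load start page", lambda: time.sleep(0.1))
```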

Henrik Emilsson May 23rd, 2011

Nice!

However, I don’t agree that the “necessity of oracles” is something binary. As you say, “humans are good at /…/ separating right from wrong”, which means that we do have a mechanism for judging whether something is right or wrong; or at least whether there might be a problem.

For all of your findings when investigating the start page of the web site (in your comment above), there is an oracle that tells you when there is a problem.
To say that something looks cluttered is to compare it with something that is not cluttered. This can be realized or discovered either by having a test that specifies looking for a cluttered start page, or a test that says to investigate the characteristics of the start page. The same goes for the other examples.

Rikard Edgren May 23rd, 2011

You’re right, Henrik; oracles in general are a good thing, something human.

I meant that they become binary when they are used in a Pass/Fail context; when it is said that an oracle is absolutely necessary in order to put Pass/Fail on a test, as if testing (i.e. verification) isn’t possible without an oracle.

I mean that testing is always possible, and that we might use an oracle to interpret the result, but we might just as well observe, suspend judgment, and report what we found interesting.

Rikard Edgren June 8th, 2011

Joe, a better explanation of “noteworthy information” is available in James Bach’s blog post What Testers Find (+ comments and spawned posts), http://www.satisfice.com/blog/archives/572
Testers find things like bugs, risks, issues, problems, artifacts, curios, tests, dependencies, questions, values, approaches, ideas, improvements, workarounds, frames, connections.

Rikard Edgren September 15th, 2011

I have elaborated the examples (pass/fail, coverage, metrics, techniques) in different formats:

Tea-time article Addicted to Pass/Fail? http://issuu.com/teatimewithtesters/docs/tea-time_with_testers_august_2011__year_1__issue_v

blog post Testing Clichés Part V: Testing needs a test coverage model http://thetesteye.com/blog/2011/01/testing-cliches-part-v-testing-needs-a-test-coverage-model/

blog post The Metrics Tumour http://thetesteye.com/blog/2011/08/the-metrics-tumour/

e-book The Little Black Book on Test Design http://www.thetesteye.com/papers/TheLittleBlackBookOnTestDesign.pdf

Anna Hayden January 21st, 2012

I think you have a very compelling argument for the push beyond a tester merely providing a yes or no answer in response to “how is it going?”

However, providing better quality information implies having an understanding (however big or small) of what quality looks and feels like. That implies that the skill sets and responsibilities understood as a tester’s domain have to be more than just thinking about how to break something. It moves from breaking and discovering problems to discovering, probing, and pondering better solutions.

When you move from verification to refinement, it’s no longer a classical tester that you require. That move pulls me towards perhaps suggesting a crossover/hybrid tester/BA or tester/UX or tester/IA (or any other combination). I suggest the hybrid simply because a comment is usually more useful when it comes from someone who knows what they are talking about. But even then, I am hesitant to say that, because the two mindsets are almost opposing in their intent and application.

Note, I am talking about comments that a user would give, such as “this feels awkward to use” or “the screen looks ugly”. If you want useful, constructive commentary, the language needs to provide a more definitive explanation of what could smooth out a particular rough edge.

So, I have two conclusions:

The first is that I believe the simple binary answer of yes/no is fundamental to the practice of verification, and thus a part of testing. I don’t believe it should be excluded or removed. I say that because at the core of verification is the constant questioning process, taken on by the tester, to remove ambiguity and ambivalence in what is presented to them as “this is the product”.

The second is that there is space for a wider net of probing and investigation, but it does require a wider range of skills and experiences than a normal tester would have, in order to identify potential changes, describe them adequately, and document them. I would rather lean towards having the role already involved in the creation of whatever quality aspect you are looking at be the one to review, test, and make the judgement.

Rikard Edgren January 21st, 2012

Thanks Anna for a thoughtful and good comment.
I agree that skill is essential for richer testing. A really good tester should quickly learn what’s needed for the specific situation.

Yes, testers provide value early on, and teams can create more value with hybrid skills for everyone; e.g. more testing skills for developers would be great.
If all involved people know at least a little about the crucial areas, you have a diversity that can create good things, faster.

Verification is a part of testing, but we shouldn’t continue acting as if it’s the only and most important thing.
We should get rid of tools, processes and mindsets that make us answer only yes or no (even though yes/no can be a part of our information).

I guess you’re right that the classic tester is a low-skilled yay/nay-sayer.
We should change this.
Let testing be difficult.

[…] did a presentation on Curing Our Binary Disease (slides, abstract), which was much better received than I hoped for (I thought it was a binary love/hate talk) Good […]