Sampling & Serendipity
Rikard Edgren
“Testing can’t be complete” might be the only statement all testers would agree upon.
This means that we will only run a few of all possible tests, and in many fields this is called sampling.
Not much has been said about qualitative sampling in software testing, so let’s look at what Grounded Theory says about theoretical sampling, and about the need for adjustments based on reality:
“To say that one samples theoretically means that sampling, rather than being predetermined before beginning the research, evolves during the process. It is based on concepts that emerged from analysis and that appear to have relevance to the evolving theory.” Strauss/Corbin p.202
“The general rule when building theory is to gather data until each category is saturated.” Strauss/Corbin p.212
“The analyst might be, and often is, surprised at what he or she finds out when in the field. Variations that the analyst expected to find might not be there. New ones might emerge from quite unexpected sources. The analyst also must be prepared for this. But this serendipity is what makes doing qualitative research and analysis so much fun. There always is that element of discovery.” Strauss/Corbin p.233
It has been said that “a good tester is often lucky”, and half of that is not true: a good tester has the ability to see and smell important things; there is no luck involved. The other half is semi-true: a good tester makes preparations and deviations to have bigger chances of luck, and this makes them stumble on important information more often (serendipity). A skilled tester thinks about many aspects at the same time, and sees things she wasn’t explicitly looking for.
But let’s follow the sampling thought a bit longer.
If we agree that sampling and serendipity are needed, doesn’t it make sense to use multiple approaches, in order to come closer to all the important information?
Doesn’t it follow that scripting all tests in advance and aiming for 100% Pass makes little sense?
If you agree that quality is multi-dimensional, do you want to spend at least some thought and time on all the orthogonal aspects?
All due respect to equivalence partitioning, but would you feel comfortable just because some part of the functionality has pretty good coverage?
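A toy illustration in Python (the “age” field and its limits are invented for this sketch, not taken from any real product): equivalence partitioning collapses a huge input space into one representative per class, which is exactly the kind of sample that can feel like more coverage than it is.

# Hypothetical input field accepting ages 0-120, split into equivalence classes.
# One representative per class: four samples stand in for thousands of values.
partitions = {
    "below range": -1,    # invalid: negative
    "lower edge": 0,      # valid boundary
    "typical": 35,        # valid interior value
    "above range": 121,   # invalid: too large
}

def is_valid_age(age):
    # Toy implementation under test (assumed behavior).
    return 0 <= age <= 120

for name, value in partitions.items():
    print(name, value, is_valid_age(value))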
Maybe we can get most areas and aspects partly covered by samples, and spread the samples well enough to have a good shot at the serendipity we need to find most of the important information.
Maybe the spread doesn’t need to be theoretically perfect; by relying on tester skill and serendipity, we might go for the fastest, and richest, tests, so we can execute many of them.
As in Grounded Theory (and exploratory testing), we can change the sampling strategy as we learn more about the product.
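To make that concrete, here is a minimal sketch (the area names and the surprise() oracle are made up; this is one way to code the idea, not a prescribed method): spread quick samples over all areas, and let surprising results pull more samples toward an area.

import random

# Invented product areas, all equally weighted to start with.
areas = {"install": 1.0, "editing": 1.0, "printing": 1.0, "import": 1.0}

def surprise(area):
    # Stand-in for running one quick test and judging if anything looks puzzling.
    return random.random() < 0.2

for _ in range(30):
    area = random.choices(list(areas), weights=list(areas.values()))[0]
    if surprise(area):
        areas[area] += 1.0  # learning changes the sampling strategy
print(areas)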
Exercise: For the next 10 bugs you find, note if the issue was something you expected to find, or if you were looking for something else.
Or do it as a team effort, but only count really important issues, and make it a game:
Prediction (manufacturing) vs. Serendipity (qualitative analysis)
First, I must point out that there are no statements on which all testers would agree, because, in point of fact, testers cannot even agree on who counts as a tester.
However, it is certainly a safe statement to make among people who call themselves testers. You are not likely to hear an uproar over “Testing can’t be complete.” Part of that is because everyone in the room will think they understand what that statement means, even though they might not all have the same understanding.
I love that you invoked Strauss. Wow! Educated testers! Not many people like you around…
A few days ago I was testing a device consisting of two boxes connected together. My initial goal was to run through basic functional tests in order to calibrate the log file. As I did so, I discovered a major blocking problem. This was strange because I had just seen the device working before I shut it down. So, I decided to vary the startup sequence and timing of startup between the two boxes. It turned out that the exact sequence of starting up actually affects the functionality of the system later on. Unexpected!
This is an example of how the inquisitive attitude of testing helps us recognize new parts of the testing space that we might have been aware of in an academic sense (I know there are infinitely many sequences and timings for starting up two boxes that are connected) but didn’t realize were important risks (I had no plan to test startup sequence, at first, because I assumed it made no difference).
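A rough sketch of that kind of sampling (start_box and link_ok are hypothetical stand-ins for whatever actually starts and probes the real boxes) picks a few points from that infinite space of orders and timings:

import itertools, time

def start_box(name):
    print("starting", name)  # hypothetical: power up one box

def link_ok():
    return True              # hypothetical: probe the connection between the boxes

# A handful of samples from the infinite space of startup orders and timings.
for first, second in itertools.permutations(["box A", "box B"]):
    for delay in (0.0, 0.5, 2.0):  # seconds between the two startups
        start_box(first)
        time.sleep(delay)
        start_box(second)
        print(first, "->", second, "delay", delay, "link ok:", link_ok())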
What causes us to notice these new sample-worthy regions? Puzzling behavior.
Keep your eyes open for puzzling behavior. It’s a sign that you may need to expand your sample space.
— James
I like this post!
When facing situations in software testing where there is a need for qualitative sampling, I also think that you need to be prepared in order to have “luck”.
I have often heard colleagues ask “How do you find those tricky bugs?” and I have sometimes answered that I was lucky. But the truth is that I have been prepared, so that my chance of finding something has increased. This can be done in several ways; one way is to try to maximize the possibility of catching an error and then, based on the result, tune in on areas where there seems to be a problem. E.g., if I go into a test area for the first time I try to use broad test data that is tricky; or an error-prone machine (http://thetesteye.com/blog/2008/04/an-error-prone-windows-machine/); or I change many settings at a time before running a test; etc.
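One rough sketch of that “many settings at a time” idea (the settings and the run_test() check are invented here; a real session would use the product’s actual options): randomize every setting before each run to raise the odds of hitting something, then tune in on the failing combinations afterwards.

import random

# Invented settings; in reality these would be the product's own options.
settings = {
    "autosave": [True, False],
    "locale": ["en-US", "sv-SE", "ja-JP"],
    "theme": ["light", "dark", "high-contrast"],
}

def run_test(config):
    # Stand-in for one test run against the product with these settings.
    return True

# Change many settings at once per run, instead of varying one at a time.
for _ in range(5):
    config = {name: random.choice(values) for name, values in settings.items()}
    print(config, "passed" if run_test(config) else "FAILED")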
As James points out, there are things that cause us to notice sample-worthy regions. Puzzling behavior is one. Messy UI is another. Complex and multifarious functionality is a third.
Thanks for the good comments!
Actually, I’m still surprised Grounded Theory hasn’t been picked up more for use inside software testing. (I have only seen Grounded Theory used when studying how software testing is performed, by James Bach and in a Finnish thesis.)
Professor Kaner has talked about software testing as a social science, http://www.kaner.com/pdfs/KanerSocialScienceSTEP.pdf, so the step isn’t big to look more closely at Grounded Theory, which in several ways is remarkably appropriate for (the way I look at) software testing.
The day after I wrote this post I was testing a specific functionality, and noticed a font size problem (bug 1) in a similar function.
So I compared with an old version of the product, used accelerators to do it, and noticed that they had changed (As Designed).
When I looked closely at the accelerators, I noticed a duplicate for an add-in (bug 2), and decided to deploy another one of those, which wasn’t possible (bug 3).
This is the everyday serendipity I like to see in testing.