Ignoratio elenchi

Henrik Emilsson

“Wouldn’t it be cool if we could come up with a Quality Value for our products?” said a colleague of mine. “Yes! That would be super!” a couple of colleagues and I answered.

We had a lot of categories and data in our bug system; and, perhaps most important, we had a good data mining tool that let us transform the data by performing calculations and aggregations on it.

We started out by analyzing the bug data and coming up with reasonable, weighted factors that would let us quantify the categories: Severity, Priority, Time to fix, Resolution, Bug Type, etc. Then we constructed an algorithm that would go through all the essential data and produce a numeric result, i.e. the Quality Value. We decided that the Quality Value should lie somewhere in the range 0-100, where a score of 100 was the top Quality Value.
When we discovered anomalies in the result, we tuned the algorithm and the quantifiers so that the result would make more sense; a lot of the discussion revolved around the quantifiers and their impact on the result. After several iterations we started to get some reasonable numbers.
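To make the construction concrete, here is a minimal sketch of what such a calculation could look like. It is entirely hypothetical: the factor names, the weights, and the 0-1 penalty scale are invented for illustration, since the actual algorithm and quantifiers are not given here.

```python
# Hypothetical sketch of a weighted "Quality Value" calculation.
# The weights below are invented; the real quantifiers were tuned over many iterations.
WEIGHTS = {
    "severity": 0.35,
    "priority": 0.25,
    "time_to_fix": 0.20,
    "resolution": 0.10,
    "bug_type": 0.10,
}

def quality_value(bugs):
    """Aggregate bug records into a single 0-100 score (100 = best)."""
    if not bugs:
        return 100.0  # no reported bugs yields a perfect, and misleading, score
    # Each factor is assumed to be pre-quantified onto a 0-1 scale,
    # where 1 is the worst case (e.g. highest severity, longest time to fix).
    penalties = [
        sum(WEIGHTS[factor] * bug[factor] for factor in WEIGHTS)
        for bug in bugs
    ]
    avg_penalty = sum(penalties) / len(penalties)  # 0 (best) .. 1 (worst)
    return round(100 * (1 - avg_penalty), 1)

bugs = [
    {"severity": 0.9, "priority": 0.8, "time_to_fix": 0.4, "resolution": 0.0, "bug_type": 0.5},
    {"severity": 0.2, "priority": 0.1, "time_to_fix": 0.1, "resolution": 0.0, "bug_type": 0.3},
]
print(quality_value(bugs))  # 60.5
```

Every constant in a sketch like this is a judgment call, and those were exactly the knobs we tuned and fought over.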

And by now you might be wondering how we noticed the anomalies, and how we could see that the numbers were reasonable. We could because we already had a perceived value of the products, so we were biased by subjectivity (huh!).

Anyway, when we were satisfied with the numbers we had a marvelous Quality Value for each and every one of our products! And I dare say that we strongly believed in this Quality Value. Of course there was a constant debate on the subject, and we fought over how much impact certain categories should have on the overall value.
“My product” scored in the top interval so I was very pleased.
But pretty soon, somewhere inside of me, I heard a little voice whisper: “Ignoratio elenchi, ignoratio elenchi, …”

We were of course very naïve to believe that this metric would represent the quality of the product. Of course it didn’t!
A couple of observations:

  • Bug data only deals with reported bugs
  • Bugs are subjective
  • Bug reporting is subjective
  • Bug handling (management) is subjective
  • Bug fixing is subjective
  • All other quality criteria not caught in bug reports are not included in bug data
  • Quality is value to many people who haven’t reported anything
  • Bug data is only data about bugs (+ subjectivity)
  • All of the above means that we really cannot compare bugs with each other

On the other hand, one conclusion we came up with that might be true was that the ability to care about the product and its bugs was reflected in the Quality Value. And in some way this meant that a high score indicated that the product was taken care of (see Broken window theory and quality). While this might have been true, we were ever so wrong in the idea of capturing the Product Quality in a single Value…

Read about Ignoratio elenchi

Also see The Quality Status Reporting Fallacy

6 Comments
Torbjörn Ryber August 17th, 2010

If quality could be measured, and measured with a number, and this with regard to bugs – then it would be based on the bugs that are NOT found in testing before production but later appear in real use. And this only takes into account the bugs found and reported so far, which means there could be really bad things we have not found yet.

It would be like a Zen riddle: what does the graph of bugs yet unknown look like?

Henrik Emilsson August 17th, 2010

Right!

Then you also need an honest support department that reports all bugs found by customers.

We discovered that the support department was measured by the number of solved/closed support cases – which ultimately ended up with them getting a bonus based on this. So all the annoying bugs with an easy work-around were never reported, since they became a cash cow. 🙂

Rikard Edgren August 18th, 2010

This is an interesting and difficult area.
I have one objection: the ‘subjective’ items in your list of concerns are only a problem if you believe that you are measuring THE ultimate objective quality value.
If you believe you are measuring one (of many) subjective-dependent quality values, there is one very valid concern from your list: that only reported bugs are included.

I was once involved in producing a similar metric, and it wasn’t too bad. We were aware of its problems and did answer Kaner’s questions in http://www.kaner.com/pdfs/pnsqc00.pdf
The interesting end of the story was that the metric was never implemented at the company, since it was deemed that the negative effects (people focusing on the metric, not the quality) outweighed the benefits of this collective-subjective number.
I think it was wise; product quality can’t be reduced to a number, metrics are dangerous.
I rather believe that the incentive for the metrics initiative was flawed: we saw it as a problem that quality status reports were based on gut feeling.

Titles are extremely important; if you call this “Product Quality” you give people other ideas than if you call it “Fixed vs. Found Bugs”.

Henrik Emilsson August 18th, 2010

@Rikard: Well, let me elaborate on the ‘subjective’ items.
1. Since I was handling the bugs in one project, I could easily set a lot of bugs to “Must Fix” and force them to be fixed. This would generate a better “quality value” since more “important” bugs were fixed.
2. A typo could have more impact than a crash bug, which wasn’t taken into consideration.
3. A bug fix could have been done in a way that it made it worse than before. It could also hide some not yet found bugs.
4. Even if you were aware of the problems with your metric, other people might not have been aware of it.
5. If one sees bugs as reports that contain information regarding issues in a piece of software, then the quality of the report matters. And the content matters. It is like treating all books in the library as if they were the same book, and then drawing some conclusion from that.
6. Our metric was based on metadata about the bugs. The metadata was manually added by several different people, was based on individual interpretations, and was driven by the team culture. Comparison of metadata between projects/products would therefore be difficult.
7. It is very easy to forget that the values are “measuring one (of many) subjective-dependent quality value”. The numbers concretize and simplify the abstract, which is something people tend to like and stick to. And sometimes it is the only thing they can grasp. (*)

Regarding the incentive for the metrics: I might be wrong, but I think that for our metric there was an underlying motive to shed some light on some projects that did sloppy work with their bug handling.

I agree, titles are very important. And they should never try to be majestic in order to draw attention.

(*) You and I wrote an article and did a presentation at EuroSTAR in 2005 where we talked about advanced analysis of the bug system. One of our key points was that you need to know the details in order to draw any conclusions from the data. By using a single quality value you obscure the details from everyone not already informed about them. That is also dangerous.

Rikard Edgren August 18th, 2010

I’m totally with you.
And the details you specified are better (for me) than “X is subjective” (which I see mostly as a good thing).
(*) I do believe in numbers as a complement to knowledge about details when analyzing bugs, customer incidents, etc.
I think it is common to search for the numbers that reinforce the things you already knew, but you had to have the numbers in order to get people to trust you.

Henrik Emilsson August 19th, 2010

Oh, now I see what you meant by the comment regarding ‘subjective’ things.
For me those are also positive things!
What I tried to describe (in the original post, but perhaps better explained by the examples in the comment) was that bugs and their handling are subjective, in the sense that neither the bugs nor their handling can be seen as predictable and predeterminable; and therefore they are not comparable (objectively speaking).