Implication of emphasis on automation in CI Martin Jansson

Introduction
I would believe, without any evidence, that a majority of the test community and product development companies have matured in their view on testing. At conferences you less frequently see the argumentation that testing is not needed. From my own experience and perceiving the local market, there is often new assignments for testers. Many companies try to hire testers or get in new consulting testers. At least looking back a few years and up until now.

At many companies there is an ever increasing focus and interest in Continuous Deployment. Sadly, I see troublesome strategies for testing in many organisations. Some companies intend to focus fully on automation, even letting go of their so called manual testers. Other companies focus on automation by not accepting testers to actually test and explore. This troubles me. Haven’t testers been involved in the test strategy? Here are few of my pointers, arguments and reasoning.

Automation Snake oil
In 1999 James Bach wrote the article Automation Snake Oil [see reference 1], where he brings up a thoughtful list of arguments and traps to be avoided. Close to 17 years later, we see the same problems. In many cases they have increased because of the Continuous Deployment ideas, but also because of those from Agile development. That is, if you ignore all the new ideas gained in the test domain as well as all research done.
The miracle status of automation is not a new phenomenon, together with the lure of saving time and cost it is seducing. In some cases it will probably be true, but it is not a replacement of thinking people. Instead it could be an enabler for speed and quality.

Testing vs. Checking
In 2009, Michael Bolton wrote an article that clarified a distinction between Testing and Checking. Since then the definition has evolved. The latest article Testing vs. Checking Refined [see reference 2] is the last in the series. Most of the testers I know and that I debate with are aware of this concept and agree with the difference or acknowledge the concept.

If you produce test strategies in a CI-environment that put an emphasis on automation, and if it means mostly doing checking and almost no testing (as in exploration), then you won’t find the unexpected. Good testing include both.

Furthermore when developing a new feature, are you focusing on automating checks fulfilling the acceptance criteria or do you try to find things that have not been considered by the team? If you define the acceptance criteria, then only check if that is fulfilled. It will only enable you to reach a small part of the way toward good quality. You might be really happy how fast it goes to develop and check (not test) the functionality. You might even be happy that you can repeat the same tests over and over. But I guess you failed to run that one little test that would have identified the most valuable thing.

Many years ago a tester came to me with a problem. He said, “We have 16000 automated tests, still our customers have problems and we do not find their problems”. I told him that he might need to change strategy and focus more on exploration. Several years later another tester came to me with the same problem, from the same product and projects. He said, “We have 24000 automated tests, still our customers have problems and we do not find their problems!”. I was a bit surprised that the persistence in following the same strategy for automation while at the same time expecting a different outcome.

In a recent argument with a development manager and Continuous Deployment enthusiast. They explained their strategy and emphasis on automation. They put little focus on testing and exploration. Mostly hiring developers who needed to automate tests (or rather checks). I asked how they do their test design? How do they know what they need to test? One of my arguments was that they limited their test effort based on what could be automated.

We know that there is an infinite amount of tests. If you have done some research, you have an idea what different stakeholders value and what they are afraid will happen. If that is so, then you have an idea what tests would be valuable to do or which areas you wish to explore. Out of all those tests, you probably only want to run part of these tests only once, where you want to investigate something that might be a problem, learn more about the systems behavior or try a specific, very difficult setup or configuration of the system. This is not something that you would want to automate because it is too costly and it is enough to learn about it just once, as far as you know. There are probably other tests that you want to repeat, but most probably with variation in new dimensions, and do more often. It could be tests that focus on specific risks or functionality that must work at all times. Out of all those that you actually want to test several times, a part of those you plan and want to automate. Out of those that you have planned to automate, only a fraction can be automated. Since automation takes a long time and is difficult, you have probably only automated a small part of those.

If you are a stakeholder, how can you consider this to be ok?

Rikard Edgren visualized the concept of what is important and what you should be in focus in a blog post called “In search of the potato” [see reference 3].

His main points are that the valuable and important is not only in the specification or requirements, you need to go beyond that.

Another explanation around the same concept of the potato is that of mapping the information space by knowns and unknowns.

The majority of test automation focus on checking an aspect of the system. You probably want to make repeatable tests on things that you know or think you know, thus the Known Knowns. In making this repeatable checking you will probably save time in finding things that you thought you knew, but that might change over time by evolving the system, thus evaluating the Unknown Knowns. In this area you can specify what you expect, would a correct result would be. With limitation on the Oracle problem, more on that below.

If you are looking beyond the specification and the explicit, you will identify things that you want to explore and want to learn more about. Areas for exploration, specific risks or just an idea you wish to understand. This is the Known Unknowns. You cannot clearly state your expectations before investigating here. You cannot, for the most part, automate the Known Unknowns.

While exploring/testing, while checking or while doing anything with the system, you will find new things that no one so far had thought of, thus things that fall into the Unknown Unknowns. Through serendipity you find something surprisingly valuable. You rarely automate serendipity.

You most probably dwell in the known areas for test automation. Would it be ok to ignore things that are valuable that you do not know of until you have spent enough time testing or exploring?

The Oracle Problem
A problem that is probably unsolvable, is that there are none (or at least very few) perfect or true oracles [see reference 4, 5, 6].

A “True oracle” faithfully reproduces all relevant results for a SUT using independent platform, algorithms, processes, compilers, code, etc. The same values are fed to the SUT and the Oracle for results comparison. The Oracle for an algorithm or subroutine can be straightforward enough for this type of oracle to be considered. The sin() function, for example, can be implemented separately using different algorithms and the results compared to exhaustively test the results (assuming the availability of sufficient machine cycles). For a given test case all values input to the SUT are verified to be “correct” using the Oracle’s separate algorithm. The less the SUT has in common with the Oracle, the more confidence in the correctness of the results (since common hardware, compilers, operating systems, algorithms, etc., may inject errors that effect both the SUT and Oracle the same way). Test cases employing a true oracle are usually limited by available machine time and system resources.
Quote from Douglas Hoffman in A taxonomy of Test Oracles [see reference 6].

Here is a the traditional view of a system under test is like the figure 1 below.

In reality, the situation is much more complex, see figure 2 below.

This means that we might have a rough idea about the initial state and the test inputs, but not full control of all surrounding states and inputs. We get a result of a test that can only give an indication that something is somewhat right or correct. The thing we check can be correct, but everything around it that we do not check or verify can be utterly wrong.

So when we are saying that we want to automate everything, we are also saying that we put our trust in something that is lacking perfect oracles.

With this in mind, do we want our end-users to get a system that could work sometimes?

Spec Checking and Bug Blindness

In an article from 2011, Ian McCowatt expresses his view on A Universe of behavior connected to Needed, Implemented and Specified based on the book Software Testing: A Craftsman’s Approach” by Paul Jorgensen.

For automation, I would expect that focus would be on area 5 and 6. But what about unimplemented specifications in area 2 and 3? Or unfullfilled needs in area 1 and 2? Or unexpected behaviors in area 4 and 7? Partly undesired behaviors will be covered in area 6 and 7, but enough?

As a stakeholders, do you think it is ok to limit the overall test effort to where automation is possible?

Concluding thoughts

It seems like we have been repeating the same things for a long time. This article is for those of you who are still fighting battles against strategies for testing which state automate everything.

References

  1. Test Automation Snake Oil, by James Bach – http://www.satisfice.com/articles/test_automation_snake_oil.pdf
  2. Testing and Checking Refined, by James Bach & Michael Bolton – http://www.satisfice.com/blog/archives/856
  3. In search of the potato, by Rikard Edgren – http://thetesteye.com/blog/2009/12/in-search-of-the-potato/
  4. The Oracle Problem and the Teaching of Software Testing, by Cem Kaner – http://kaner.com/?p=190
  5. On testing nontestable programs, by ELAINE J. WEYUKER – http://www.testingeducation.org/BBST/foundations/Weyuker_ontestingnontestable.pdf
  6. A Taxonomy for Test Oracles, by Douglass Hoffman – http://www.softwarequalitymethods.com/Papers/OracleTax.pdf
  7. Spec Checking and Bug Blindness, by Ian McCowatt – http://exploringuncertainty.com/blog/archives/253
Leave a Reply