Lightweight Reliability Testing
Rikard Edgren

The big drawback, and the big advantage, of reliability testing is that it is easiest and most effective to perform together with other testing. A separate automated reliability regression test suite could cost an awful lot to implement, but making reliability second nature during any type of manual test, and deviating from the planned steps as you go, is cheap, interesting, and powerful.

If you look at Reliability from a standards perspective, you will see a lot of measurement methods like Mean Time Between Failures. You don’t need to use these. You can test and find important information anyway.
The most lightweight method is to ask heavy users of the product these questions:

Reliability. Does the product work well all the time?

Stability. Are you experiencing (un)reproducible crashes?
Robustness. Are there any parts of the product that are fragile and have problems with mis-configurations or corner cases?
Recoverability. Is it possible/easy to recover after (provoked) fatal errors?
Resource Usage. What do CPU, RAM, and disk usage look like?
Data Integrity. Are all sorts of data kept intact in the system?
Safety. Is it possible to destroy something by (mis)usage of the system?
Disaster Recovery. What if something really, really bad happens?
Trustworthiness. Do you feel you can trust the system?

You may also want to perform some specific tests aiming at the different sub-categories.

Stability. Run the product for a long time, without restarts.
Automate a simplistic scenario and run it thousands of times in a sequence.
Count the number of non-reproducible crashes per day that happen to your team.
Try really hard to reproduce the non-reproducible issues.
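
As a minimal sketch of the "run it thousands of times" idea, the Python loop below repeats a scenario and tallies the failures instead of stopping at the first one. The names `run_repeatedly`, `summarize`, and `sample_scenario` are hypothetical; the scenario body is a stand-in for whatever simple action drives your product.

```python
import traceback
from collections import Counter


def run_repeatedly(scenario, iterations=1000):
    """Run scenario(i) many times; collect failures instead of stopping."""
    failures = []  # (iteration, exception type name, traceback text)
    for i in range(iterations):
        try:
            scenario(i)
        except Exception as exc:  # keep going: we want the failure pattern
            failures.append((i, type(exc).__name__, traceback.format_exc()))
    return failures


def summarize(failures, iterations):
    """Return a short text summary: total failures and counts per error type."""
    by_type = Counter(name for _, name, _ in failures)
    lines = [f"{len(failures)} failures in {iterations} runs"]
    lines += [f"  {name}: {count}" for name, count in sorted(by_type.items())]
    return "\n".join(lines)


def sample_scenario(i):
    """Hypothetical stand-in for a real scenario; fails on every 250th run."""
    if i % 250 == 249:
        raise RuntimeError(f"simulated intermittent failure on run {i}")


if __name__ == "__main__":
    results = run_repeatedly(sample_scenario, iterations=1000)
    print(summarize(results, 1000))
```

Keeping the iteration number and traceback for each failure gives you a starting point when you try to reproduce the issues that look non-reproducible.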

Robustness. Provoking error messages is fun; don’t forget to check the spelling and whether the error message actually helps.
When this is important: hit hard, hit many times.
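
One way to "hit many times" is to feed a batch of bad inputs to a single entry point and note how each one is handled. The sketch below assumes a hypothetical function under test, `parse_port`, and a hypothetical helper, `probe`; it only checks that an error message exists, so whether the message is well spelled and helpful still needs a human eye.

```python
def parse_port(text):
    """Hypothetical function under test: parse a TCP port number."""
    try:
        port = int(text)
    except (TypeError, ValueError):
        raise ValueError(f"port must be a whole number, got {text!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"port must be between 1 and 65535, got {port}")
    return port


def probe(func, bad_inputs, expected_error=ValueError):
    """Feed each bad input to func and classify what happens."""
    report = {}
    for value in bad_inputs:
        try:
            func(value)
            outcome = "accepted"  # bad input slipped through
        except expected_error as exc:
            # A helpful message should at least be non-empty.
            outcome = "clean error" if str(exc).strip() else "empty message"
        except Exception as exc:  # a failure mode nobody planned for
            outcome = f"unexpected {type(exc).__name__}"
        report[repr(value)] = outcome
    return report


if __name__ == "__main__":
    bad_inputs = ["", "abc", "-1", "0", "65536", None, "80.5"]
    for value, outcome in probe(parse_port, bad_inputs).items():
        print(f"{value:>10}  {outcome}")
```

Anything reported as "accepted" or "unexpected ..." is worth a closer look.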

Recoverability. Turn off the power for machines performing important things, restart and look at behavior.
Whenever an error occurs, try to recover, and consider if it is easy and intuitive.

Resource Usage. Look at system resources now and then.
Stress the system in various ways (but only spend time on this if the project is interested in the results).
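
Looking at resources "now and then" can be as simple as sampling them between rounds of a repeated action. The sketch below uses only Python's standard library (`time.process_time` and `tracemalloc`), so it sees the CPU time and Python-level memory of the current process only, not the whole machine. The helper names, the two example steps, and the 10,000-byte threshold are assumptions for illustration.

```python
import time
import tracemalloc


def sample_resources(step, rounds=5, steps_per_round=200):
    """Run step() repeatedly; sample CPU time and traced memory per round."""
    samples = []  # (round number, CPU seconds used, traced bytes in use)
    tracemalloc.start()
    try:
        for round_no in range(1, rounds + 1):
            cpu_before = time.process_time()
            for _ in range(steps_per_round):
                step()
            cpu_used = time.process_time() - cpu_before
            current_bytes, _peak = tracemalloc.get_traced_memory()
            samples.append((round_no, cpu_used, current_bytes))
    finally:
        tracemalloc.stop()
    return samples


def grows_every_round(samples, min_growth=10_000):
    """True if memory grew by at least min_growth bytes in every round."""
    memory = [mem for _, _, mem in samples]
    return all(b - a >= min_growth for a, b in zip(memory, memory[1:]))


_cache = []  # hypothetical leak: grows forever, never cleared


def leaky_step():
    _cache.append(bytearray(1024))


def steady_step():
    sum(range(100))


if __name__ == "__main__":
    for name, step in [("leaky", leaky_step), ("steady", steady_step)]:
        samples = sample_resources(step)
        print(name, "grows every round:", grows_every_round(samples))
```

Steady growth across rounds is only a hint, not proof of a leak, but it tells you where to look more closely.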

Data Integrity. Use all types of data (numeric, strings, out-of-range, invalid, empty, Unicode), in different sizes (small, medium, large) on different systems (localized OS, regional settings, different fonts) through all parts of the system.
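
A round-trip check is one cheap way to exercise this: push varied data in, read it back, and compare. In the sketch below, `json_round_trip` and `ascii_only_round_trip` are hypothetical stand-ins for "through the system"; in a real test they would save to and load from the product itself. The sample values are examples, not a complete matrix.

```python
import json


def sample_values():
    """A small matrix of data kinds and sizes, labeled for reporting."""
    return {
        "empty string": "",
        "small ascii": "abc",
        "large ascii": "x" * 100_000,
        "unicode": "Åäö 日本語 🙂",
        "zero": 0,
        "negative": -1,
        "big integer": 2**70,
        "float": 0.1,
        "nested": {"list": [1, "två", None]},
    }


def json_round_trip(value):
    """Stand-in for the system under test: save and load via JSON."""
    return json.loads(json.dumps(value))


def ascii_only_round_trip(value):
    """Deliberately lossy stand-in: strings pass through an ASCII-only layer."""
    if isinstance(value, str):
        return value.encode("ascii", "replace").decode("ascii")
    return value


def find_corruption(round_trip, values):
    """Return labels of values that come back different or fail outright."""
    damaged = []
    for label, value in values.items():
        try:
            if round_trip(value) != value:
                damaged.append(label)
        except Exception:
            damaged.append(label)
    return damaged


if __name__ == "__main__":
    values = sample_values()
    print("JSON:", find_corruption(json_round_trip, values))
    print("ASCII-only:", find_corruption(ascii_only_round_trip, values))
```

Labeling each value makes the report readable: you see "unicode" came back damaged rather than a wall of mismatched data.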

Safety. Do thorough brainstorming around scenarios where people can get hurt. Be aware that ambiguous or missing information can be very dangerous if it affects important decisions.

Disaster Recovery. You probably don’t want to test this for real, but you can ask developers or others whether it is possible to keep using the software after a crucial machine has disappeared. This is one of those characteristics that is either irrelevant or very important.

Trustworthiness. Note down all inconsistencies in behavior, or moments when you are unsure what the product is up to.
Tell the project how you feel about the product’s reliability.

The best way to start building reliable software is to have really solid code, and properly customized code checker tools can help you with this.

I guess the list could be a bit longer and still be lightweight; feel free to help me out!
