Scripting Your Test Data
Rikard Edgren

Sometimes I wonder if testers know how easy it is to script their own variations of test data.
I prefer Ruby, and you can download the example I describe here.

I was testing healthcare data and wanted to see what the performance was for larger quantities of data. We had a mock service, and the data to put there was easy to create with a script.
For the “diagnosis” area, I had an Excel sheet with the 12441 possible diagnosis codes according to ICD-10-SE. I couldn’t resist creating a test patient that had all of these diagnoses.
This will never, never happen in reality, and does not add value to the performance tests, but I did it anyway; it was fun and fast.
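
To give a feel for how little code this takes, here is a minimal sketch of the idea, not the downloadable script itself. It assumes the diagnosis sheet has been saved from Excel as tab-separated text with one code and one name per line, and that the mock service accepts a patient as JSON. The three sample rows, the field names (patientId, diagnoses, code, name) and the function build_patient are made up for illustration.

```ruby
require "json"

# Illustrative rows only, standing in for the full list of codes.
# Assumed layout: code, a tab, then the diagnosis name.
SAMPLE_TSV = <<~TSV
  A00.0\tKolera
  J45.9\tAstma, ospecificerad
  E11.9\tDiabetes mellitus typ 2
TSV

# Builds one test patient that carries every diagnosis in the given text.
# The patient structure is hypothetical; adapt it to what your service expects.
def build_patient(tsv_text, patient_id: "TEST-ALL-DIAGNOSES")
  diagnoses = tsv_text.each_line.map(&:chomp).reject(&:empty?).map do |line|
    code, name = line.split("\t", 2)
    { "code" => code, "name" => name }
  end
  { "patientId" => patient_id, "diagnoses" => diagnoses }
end

patient = build_patient(SAMPLE_TSV)
puts JSON.pretty_generate(patient)
```

With a real export you would pass in the file contents instead of the sample, for example build_patient(File.read("diagnoses.tsv", encoding: "UTF-8")), where the file name is again only an assumption.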

After the performance tests were completed, I continued using the test data I had created.
It is a kind of background complexity that isn’t really necessary, but doesn’t cost a lot, and might help you discover new things. And of course it did this time too (hey, I chose the example).
When testing search functionality I saw behaviors I hadn’t seen with the more simplistic data I had elsewhere. The large variety of diagnosis names gave the search function more ways to go wrong.

If you aren’t already doing stuff like this, feel free to edit the Ruby script to match your needs (most data files are text of some kind and can be scripted in this way) to create more variety in your test data.

Quite often, your tests aren’t a lot better than your test data.

3 Comments
Shahin March 11th, 2017

Interesting topic of discussion. I actually take a different approach to the test data that I use. I tend to create a test data method which creates random data for my test (assuming the data can be consumed in a correct format). This way, every time I run my tests, they run using different data, which increases the chance of catching possible issues using the same test, covering the same functional area. What do you think, would you prefer a large pool of data for a test or a test with data that changes every time it runs?

Rikard Edgren March 11th, 2017

Hi Shahin

In general for automation, I prefer to randomise the data as you do (unless specific data is needed for your test).
It increases coverage over time, which of course is welcome.
In this case, however, I knew what the data could/should contain, so I created all possibilities.
This Brute Force Heuristic is sometimes underrated; it might be faster to test all options than to spend time picking out the seemingly best.

Shahin March 20th, 2017

Hi Rikard,

Fair point, I see the value in covering permutable areas of a test through a data set.

I do also see both methods yielding results depending on how you use them.

Best,
Shahin
(http://www.thetestroom.com)
