If you want to have great tests, you need great test data. That’s why it’s so important to know about the test data management tools at your disposal and how they can help you. That’s what this post is all about.
We’ll start with the basics by defining test data management. Then, we’ll explain why test data management is important and why you need it. Finally, we’ll walk you through our list of five test data management tools. For each tool, you’ll read its description along with its main pros and cons. Let’s get to it.
Test Data Management Tools 101
This post features a spin on the popular what-why-how structure. Call it a “what-why-list,” if you will. We start with the “what,” defining test data management. Then, we follow that with the “why,” covering the reasons why you should care about test data management. After that, we walk you through our list of test data management tools.
What Is Test Data Management In Software Testing?
If you have tests, you need test data. When testing in production—something you should be doing, if you aren't already—you have access to the best possible type of test data: real data. However, testing in production doesn't preclude testing before production, and that's where test data management comes in handy.
Test Data Management (TDM) is the process of providing automated tests the data they need. The TDM process has to ensure the availability of test data, making sure test cases have access to the data in the right amounts, formats, and timing.
Why Is Test Data Management Important?
Data has to meet several criteria if it is to be used effectively as test data. The TDM process is responsible for providing the test cases with data that meet those requirements. So, what are those qualities?
Properties of Good Test Data
Test data has to be of high quality, available, timely, realistic, and compliant.
Let's start with quality. If you feed your test cases poor-quality data, don't be surprised when you get less-than-stellar results. Keep in mind that "quality" in this context means the data has to meet the expectations of the tests. For instance, while test cases usually need valid data, sometimes you need invalid data. A classic example is testing how the system reacts to invalid or unwanted input—i.e., negative testing. Test data also needs to be available: it's no use having great test data if the tests can't access it for whatever reason—e.g., authentication issues. Also, test data needs to be timely. That is to say, it has to be accessible to the tests as soon as they require it, without delays.
Test data also needs to be realistic. By that I mean the data sets need to mimic as closely as possible what real production data looks like. Realistic test data faithfully resembles real data in quantity, formats, and more.
Finally, test data has to be compliant. Since some techniques to obtain test data—such as production cloning—rely on production data, organizations need to be very cautious not to violate relevant privacy regulations, such as GDPR. This adds yet another layer of complexity to the responsibilities of the TDM process.
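The quality requirement above can be made concrete with a small sketch. The `validate_age` function below is a hypothetical system under test, not from any specific tool; the point is that a good test data set deliberately includes both data the system should accept and data it should reject (negative testing).

```python
# Illustrative sketch: exercising a validator with both valid and
# invalid test data. validate_age is a hypothetical example.

def validate_age(value):
    """Accept integer ages between 0 and 120; reject everything else."""
    return isinstance(value, int) and 0 <= value <= 120

valid_data = [0, 35, 120]               # data the system should accept
invalid_data = [-1, 121, "abc", None]   # data it should reject

for item in valid_data:
    assert validate_age(item), f"expected {item!r} to be valid"

for item in invalid_data:
    assert not validate_age(item), f"expected {item!r} to be invalid"
```

Notice that the invalid list is just as deliberate as the valid one: both are test data the TDM process must supply.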
Automating With TDM
As you’ve just seen, test data needs to meet many requirements to be used effectively and safely in a test strategy. On a small enough scale, you might be able to get away with doing all of this manually. However, as your organization’s testing needs start to grow, it quickly becomes overwhelming to manage the required data without help.
That’s where test data management can make a difference. With the right tools, you can ensure the test cases in your organization get access to the data they need, enabling your testing strategy to go smoothly.
What Are The Three Types of Test Data?
Test data can exist in an array of shapes and sizes. It’s possible to categorize test data according to several different factors. For instance:
- Origin. You can obtain test data by copying it from production (and, of course, masking it), synthetically generating it, or some combination of the two.
- Manual vs. automatic creation. Synthetic data can be generated automatically or created by hand.
- Obfuscated vs. non-obfuscated data. Test data may or may not be obfuscated in order to protect user privacy (you must, of course, obfuscate data you obtain via production cloning).
However, a perhaps more useful approach to categorize test data is to look at the values of the data themselves and analyze their properties. That way, we can categorize test data into three buckets: valid data, invalid data, and extreme data.
As the name suggests, valid data refers to test data that adheres to the formats, values, and quantities expected by the system under test. Valid data is useful when testing the happy path—i.e. what happens when no errors or unexpected incidents occur.
Unfortunately, we can't just test the happy path and call it a day. Testing the unhappy path is even more important: if you want a robust application, it's crucial to understand how it reacts to unexpected scenarios. That's where invalid data really comes in handy. Using techniques such as fuzz testing, you can deliberately throw bad data at your application and verify how gracefully it copes.
Extreme data is also called boundary data. It refers to data that lives at the boundaries of what's considered valid. For instance, let's say your web application contains a field that should only accept values ranging from 0 to 1,000, inclusive. Boundary values for this field would be 0 and 1,000, and you'd want to make sure those boundaries are tested, because this is a common spot for errors in web applications.
Is extreme data just a variation of valid data, though? Actually, it can be both valid and invalid. It’s also important to test outside the boundaries, to verify whether the application can handle the unhappy scenarios just as properly. In this context, extreme data is a form of invalid data.
Test Data Management: Challenges and Pitfalls
Test data management is beneficial for software organizations, but that doesn’t mean it’s easy. Quite the contrary: there are some relevant obstacles and challenges you might face, so it’s vital to be aware of some of them.
Production Cloning Might Be Slow Or Prohibitively Expensive
One of the most popular ways to obtain test data is by getting data from production servers and then applying some kind of obfuscation to it, for security and privacy concerns. However, this technique has some pitfalls associated with it. For starters, cloning production data can be super slow, particularly if you try to clone all of the data.
The best practice is to not do that, and instead grab a portion of the data, in a process called data slicing.
In the same spirit, production cloning can be a source of high spending, due to storage and infrastructure costs. Once again, the answer may lie in data slicing.
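The idea behind data slicing can be illustrated with a toy example. Instead of cloning an entire table, you copy only a consistent subset of it. The table, column names, and slicing criteria below are hypothetical, and SQLite stands in for a real production database.

```python
# Minimal sketch of data slicing: copy only a small, consistent
# subset of a (simulated) production table instead of cloning it all.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
src.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, f"customer-{i % 10}", i * 1.5) for i in range(1000)],
)

# The slice: one customer segment only, capped at 50 rows.
slice_rows = src.execute(
    "SELECT * FROM orders WHERE customer = ? LIMIT 50", ("customer-3",)
).fetchall()

print(len(slice_rows))  # prints 50 -- far fewer rows than the full table
```

In a real TDM pipeline the hard part is keeping the slice referentially consistent (pulling the related rows from every child table), which is exactly what dedicated tools automate.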
Using Unmasked or Unobfuscated Data
Using production cloning without masking or another form of data obfuscation is highly risky. Due to GDPR and other similar regulations, failure to safeguard user data can result in dire financial and legal consequences, not to mention the stain on the organization’s brand.
Masking Data Might Add Additional Overhead
As you’ve just seen, the use of unobfuscated production data carries a lot of risk for organizations. That doesn’t mean that applying masking or anonymization is free. Besides the financial cost that you may incur—when using paid masking tools—there are other costs associated with it, such as the learning curve for masking tools and the additional overhead caused by the masking process itself.
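To make the masking overhead concrete, here is one deliberately simple technique: deterministic pseudonymization with a keyed hash. Real masking tools offer far richer options (including format-preserving masking); this sketch only illustrates the idea, and the key and helper names are made up for the example.

```python
# Hedged sketch of deterministic pseudonymization via a keyed hash.
# Not production-grade masking; for illustration only.
import hashlib
import hmac

MASKING_KEY = b"placeholder-key-rotate-outside-source-control"

def mask_email(email: str) -> str:
    digest = hmac.new(MASKING_KEY, email.encode(), hashlib.sha256).hexdigest()
    return f"user-{digest[:12]}@example.test"

original = "jane.doe@example.com"
masked = mask_email(original)

assert masked != original              # the PII is gone...
assert masked == mask_email(original)  # ...but masking is repeatable,
                                       # so joins across tables still work
```

The determinism is the overhead trade-off in miniature: it preserves referential integrity across data sets, but every masking pass costs compute time and the key becomes one more secret to manage.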
Ensuring Test Data Availability
Last but not least, one of the biggest challenges when it comes to test data is ensuring its availability. As you've seen, test data must be available, timely, valid, and realistic. Meeting all of those requirements at once is often a big challenge.
The need for huge quantities of data might affect its availability. Costly processes such as data masking and data slicing might keep the data from being as timely as one would expect. Test data also gets out of date, which means the organization must invest constant effort in refreshing it.
Test Data Management Tools: The 5 That Should Be on Your Radar
Now that you know more about test data management in general, it’s time to cover our list of test data management tools.
As we said earlier, each tool gets a brief description, followed by some of its pros and cons.
GenRocket
GenRocket's synthetic data platform represents a new category of test data management that the vendor calls Synthetic Test Data Automation (TDA). This approach automates and accelerates many of the cumbersome aspects of traditional TDM. It also avoids a limitation of other synthetic data platforms, which provision test data by producing a synthetic replica of a production database.
DATPROF
Rather than a single product, DATPROF is a suite of products, each catering to a different need in the overall TDM process. Among its main pros, we can mention:
- its centralized dashboard where users can manage test data in many environments from a single place
- the security and compliance capabilities
- a self-service portal where people who perform testing can request the test data they need
- support for a large number of database providers and many available integrations
- free trial available
What about the cons? DATPROF's approach of splitting its features across several offerings gives customers flexibility, but it can become somewhat overwhelming, especially for newcomers.
Delphix
Delphix provides a test data management platform that helps teams populate test environments with realistic and compliant test data. Among its advantages, we can cite its powerful masking capabilities, virtual test data provisioning, and version control for data sets. Delphix also integrates with several tools and data sources, and its interface is considered user-friendly and easy to use.
When it comes to the cons, complaints about pricing are somewhat common, since Delphix licenses its platform annually based on usage. Additionally, its data creation capabilities aren't considered as strong as its other features.
Informatica
Informatica's TDM offering focuses on data quality and privacy. With strong masking and synthetic data generation capabilities, this tool provides organizations with automatic provisioning of test data to efficiently meet their testing needs. Informatica's pros include its automatic provisioning capabilities, monitoring and reporting features, and comprehensive masking techniques. The tool also offers free trials, which makes it easier for organizations to evaluate.
Informatica's price range has been cited as a con, as has the need for a friendlier interface.
Testim
Testim's offering is an AI-powered tool that helps teams author stable end-to-end tests. Even though it's not a TDM tool in the traditional sense, Testim has capabilities that make it a viable option for data-driven testing.
Testim lets you define data sets in various ways: through its visual editor, through a configuration file, and even from external sources, such as a database or files (CSV, Excel, and so on).
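The general pattern of feeding tests from an external data file can be sketched in plain Python. To be clear, this is a generic illustration of CSV-driven testing, not Testim's own API; the `login` function and the credentials in the CSV are hypothetical.

```python
# Generic illustration of driving a test from a CSV data set.
# login() is a hypothetical system under test.
import csv
import io

# In practice this would be a file on disk; it's inlined here to
# keep the example self-contained.
CSV_DATA = """username,password,should_login
alice,correct-horse,true
bob,wrong-pass,false
"""

def login(username, password):
    # Pretend only one credential pair is valid.
    return (username, password) == ("alice", "correct-horse")

for row in csv.DictReader(io.StringIO(CSV_DATA)):
    expected = row["should_login"] == "true"
    assert login(row["username"], row["password"]) == expected
```

The appeal of this style is that adding a test case means adding a row to the data file, not writing new test code.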
Test Data Management Tools: Know Them so You Can Leverage Them
Despite recognizing the benefits of test automation, many organizations still struggle to do it right. And, to be fair, it’s not entirely their fault. With each passing year, new terms and buzzwords related to testing appear. Similarly, new types of testing keep being invented. As a newcomer to this field, it’s hard not to feel overwhelmed.
This post was yet another contribution to your automated testing learning journey. We defined test data management, explained why it matters for an effective software testing strategy, and walked through the main test data management tools you should be aware of.