Application testing is designed to identify bugs before they reach production, where they are much more costly (and embarrassing) to fix. One of the main challenges in application testing is balancing the need for good testing data against the need to keep sensitive company data from being breached within a testing environment. In a test environment, the very vulnerabilities that developers are looking for might allow an attacker who gains access to steal sensitive data that is not adequately protected.
The need for good testing data that does not jeopardize company data security has led to many different efforts designed to create anonymous data that is usable for testing purposes. One of these, data masking, is a promising option for enabling realistic testing without endangering the organization’s data.
The Need for Good Testing Data
Application testing falls into two main categories. The first, often called negative or robustness testing (and sometimes loosely referred to as stress testing), is designed to identify potential failure cases. It deliberately uses unrealistic data, since it is intended to find places where the developer made an assumption about the input and did not write the error handling code needed to check that assumption. Commonly, these tests are designed to ensure that the software is secure and that an attacker can’t find an “invalid” input that would break the application.
The other type of testing is intended to ensure that the application does its job properly under normal circumstances. If a user provides “valid” input, the application should handle it correctly even if the input is unusual and was not anticipated by the developer. An example of a valid but unexpected input is a name that legitimately contains special characters (like O’Connor).
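The two categories can be sketched as two sets of test inputs for the same function. The example below is a minimal illustration in Python, not a recommended validation rule: the function is_valid_name, its pattern, and the 100-character limit are all assumptions made for this sketch.

```python
import re

# Hypothetical rule (an assumption for this sketch): runs of letters,
# including accented ones, separated by a single space, apostrophe, or hyphen.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '\u2019\-][^\W\d_]+)*")


def is_valid_name(name):
    """Return True if `name` looks like a plausible personal name."""
    if not isinstance(name, str) or not (1 <= len(name) <= 100):
        return False
    return NAME_PATTERN.fullmatch(name) is not None


# Valid but easy-to-forget inputs that the application must accept.
VALID_NAMES = ["O'Connor", "Anne-Marie", "Jos\u00e9", "de la Cruz"]

# Deliberately unrealistic inputs that the application must reject gracefully.
INVALID_NAMES = ["", "A" * 10_000, "Robert'); DROP TABLE users;--", "\x00\x01"]

if __name__ == "__main__":
    for name in VALID_NAMES:
        print(repr(name), is_valid_name(name))
    for name in INVALID_NAMES:
        print(repr(name[:30]), is_valid_name(name))
```

The second list can be invented by hand or generated by a fuzzer. The first list is the hard one, because it has to reflect what real users actually type.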
This second type of testing requires data that matches the expected inputs to the application as closely as possible. Logically, the best way to ensure that testing data closely resembles real-world data is to actually use real-world data for testing. However, this approach to testing can run afoul of data protection requirements.
The Data Protection Landscape
In the last few years, the number and scope of data protection regulations have grown rapidly. The EU’s General Data Protection Regulation (GDPR) is the most famous example, but many countries and US states have followed suit, enacting their own regulations to control how organizations can use their customers’ personal data.
These regulations dramatically expand the scope of what counts as protected data. Previously, many data protection regulations focused primarily on certain types of “high value” data, like credit card numbers and healthcare information. Under the GDPR, any data that could be used to identify an individual, directly or indirectly, is protected.
One significant impact of these regulations is that companies must be much more careful and transparent about how they use the data entrusted to them. In the past, companies effectively “owned” any data given to them and could use it for whatever they chose. The main limitation was that certain types of data had to be protected at all times. The use of personal data for software testing was generally considered fine as long as the data was appropriately protected from being breached.
Under regulations like the GDPR, organizations need a lawful basis, such as the customer’s clear and explicit consent, for each purpose for which they use customer data, and product testing is one such purpose. However, the regulation’s protections only apply to data that can be used to identify an individual. If data can be sufficiently anonymized, an organization is not limited in how it uses that data, including for application testing.
Data anonymization is not as easy as it sounds. Research has found that many anonymization techniques in common use can be reversed by someone who knows enough details about the data subjects, for example by combining a few attributes that each look harmless on their own. However, for application testing, it is not necessary to have “real” data so much as “realistic” data.
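One way to see why removing names alone is rarely enough is to count how many records remain unique on the attributes that are left. The sketch below uses a handful of fabricated records. The choice of fields, and the function name unique_fraction, are assumptions for illustration and are not taken from any particular study.

```python
from collections import Counter

# Fabricated records with names already removed. The remaining fields are
# an illustrative choice of "quasi-identifiers", not real data.
RECORDS = [
    {"zip": "02139", "birth_year": 1984, "gender": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1984, "gender": "M", "diagnosis": "flu"},
    {"zip": "02139", "birth_year": 1990, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "02140", "birth_year": 1990, "gender": "F", "diagnosis": "flu"},
    {"zip": "02140", "birth_year": 1990, "gender": "F", "diagnosis": "asthma"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def unique_fraction(rows, keys):
    """Fraction of rows whose combination of `keys` appears exactly once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    unique = sum(1 for row in rows if counts[tuple(row[k] for k in keys)] == 1)
    return unique / len(rows)


if __name__ == "__main__":
    # Three of the five records are unique on (zip, birth_year, gender), so
    # anyone who knows those three facts about a person can find their row.
    print(unique_fraction(RECORDS, QUASI_IDENTIFIERS))
    # No record is unique on zip code alone.
    print(unique_fraction(RECORDS, ("zip",)))
```

A high unique fraction suggests that the data set is still identifying even though it contains no names.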
Balancing Security and Functionality
For application testing, the test data needs to mimic real-world data as closely as possible. However, real consumer data is often protected by data privacy regulations. Organizations need realistic but fake data for use in test environments.
Data masking is a promising solution to this type of problem. Sitting between the company’s database and its test environment, a data masking solution applies a non-reversible transformation to the data. Depending on the transformation algorithm applied, the resulting masked data can be more or less realistic. This adjustable realism is what makes data masking well suited to application testing. There is a tradeoff: the more realistic the masked data becomes, the more closely it may resemble the actual data that it is based upon, and the more it needs to be protected. However, as developers become more confident in the security and functionality of their software, they may be more willing to entrust sensitive data to it.
Using data masking, developers can implement a test environment where the data becomes increasingly realistic as testing continues. This serves the dual purpose of keeping the organization’s sensitive data appropriately protected while providing robust and realistic test data to the application. With data masking, a quality assurance team can have the best of both worlds: data realism and security.
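The idea of a one-way transformation with adjustable realism can be sketched as three masking functions of increasing realism. This is a simplified illustration under stated assumptions, not how any particular masking product works: the key, the pool of substitute surnames, and all function names are hypothetical, and a keyed hash (HMAC-SHA-256 from the Python standard library) stands in for whatever algorithm a real tool would use.

```python
import hashlib
import hmac

# Hypothetical secret. It would be stored outside the test environment so
# that masked values cannot be recomputed from guessed inputs.
MASKING_KEY = b"replace-with-a-secret-key"

# Small illustrative pool of substitute surnames (an assumption).
FAKE_SURNAMES = ["Smith", "O'Brien", "Garcia", "Nguyen", "Mueller", "Okafor"]


def _digest(value):
    """Keyed one-way hash of the input value."""
    return hmac.new(MASKING_KEY, value.encode("utf-8"), hashlib.sha256).digest()


def mask_redact(value):
    """Least realistic: keep only the length of the original value."""
    return "X" * len(value)


def mask_format_preserving(value):
    """Keep length, case, and punctuation; replace every letter and digit."""
    digest = _digest(value)
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            out.append(str(b % 10))
        elif ch.isalpha():
            letter = chr(ord("a") + b % 26)
            out.append(letter.upper() if ch.isupper() else letter)
        else:
            out.append(ch)  # punctuation such as ' or - is left in place
    return "".join(out)


def mask_substitute(value):
    """Most realistic: deterministically pick a plausible replacement."""
    index = int.from_bytes(_digest(value)[:4], "big") % len(FAKE_SURNAMES)
    return FAKE_SURNAMES[index]


if __name__ == "__main__":
    for mask in (mask_redact, mask_format_preserving, mask_substitute):
        print(mask.__name__, mask("O'Connor"))
```

Because each function is deterministic for a given key, the same input always maps to the same masked value, which keeps joins between masked tables consistent. A team could start with mask_redact in early test stages and move toward mask_substitute as confidence in the application grows. Note that a substitute drawn from a small pool can occasionally coincide with the original value, which a real tool would need to handle.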