Data Masking — What It Is And How It Works

August 23, 2022

With data privacy regulations set to cover around 65% of the global population by 2023, many enterprises are prioritizing data security measures to keep sensitive data safe and avoid the financial and reputational penalties associated with a data breach. At the same time, businesses need to stay competitive by making the best use of data in product development, testing, and analytics. They need data privacy and data usability, but can they really have both?

With the right data masking solution in place, the answer is most definitely yes.

What Is Data Masking?

Data masking is a way of creating a realistic, structurally similar, and usable version of organizational data so that the actual data is not exposed or breached. The authentic data is ‘masked’ by inauthentic data. This is also known as data obfuscation.

With data masking, the format of the data remains unchanged, whilst the true values of sensitive information are hidden and replaced with plausible alternatives. There are a number of ways to mask sensitive data, from replacement through to randomization.  

It is important to note that, however the data is masked, the changes must be made in such a way that they cannot be reverse engineered or easily identified. Otherwise the data is not safe. At the same time, the data needs to be masked consistently across the organization.
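
As a rough illustration of masking that is consistent, hard to reverse, and keeps the original format, the sketch below replaces each digit with one derived from a keyed hash (HMAC-SHA256) of the whole value. The names `SECRET_KEY` and `mask_digits` are hypothetical, and this is a simplified stand-in rather than a vetted format-preserving encryption scheme.

```python
import hashlib
import hmac

# Hypothetical demo key; a real deployment would load this from a secrets store.
SECRET_KEY = b"demo-key-change-me"


def mask_digits(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace every ASCII digit in `value`, leaving all other characters in place."""
    # The keyed digest depends on the whole input, so the same input always
    # yields the same output, and the output cannot be recomputed without the key.
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0
    for ch in value:
        if ch in "0123456789":
            # Map one digest byte to a digit (slightly biased, fine for a sketch).
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            masked.append(ch)  # separators and letters keep their position
    return "".join(masked)


print(mask_digits("555-12-3456"))  # same layout, different digits
```

Because the output depends only on the key and the input, the same value masks to the same result wherever the function is used, which is one way to keep masked data consistent across systems.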

Substituting the real data in this way not only keeps it safe but also provides organizations with a functional and usable alternative. Data masking is vital for any organization requiring realistic data for non-production environments such as training programs and software testing.

Why Is Data Masking Important?

As part of a wider package of data privacy measures, data masking is important because it mitigates the risks associated with accidental exposure, a data breach, or cyberattack. If the inauthentic information is leaked, the real data remains hidden and protected. In addition, data masking renders the data meaningless and of no value to a cyberattacker. At the same time, the data is still consistent and usable for the organization itself. 

By deploying data masking, organizations greatly reduce the risk of accidental exposure through sharing information, both internally and with third parties. This is particularly important for collaborative or outsourced projects, where there is potential for sensitive data to be stolen and misused. Data masking plays a significant role in alleviating concerns about data privacy and enables organizations to trust that their sensitive data is safe.

Data masking is also a valuable tool for organizations looking to maintain a competitive advantage. Robust data privacy helps build trust with consumers and it’s also a key piece in ensuring organizations maintain compliance with ever more stringent data privacy laws and regulations, such as the General Data Protection Regulation (GDPR).  

Which Data Needs Masking?

The data you need to mask partly depends on your industry, the type of data you hold, and how you plan to use it. Broadly speaking, the kinds of data that typically require masking fall into one of four categories: 

  • Intellectual Property (IP): From business plans to a new invention, it’s essential that valuable creations of the mind are kept safe with strict access controls and protection from theft.
  • Payment Card Industry Data Security Standard (PCI DSS): Any organization handling debit and credit card transactions needs to ensure cardholder data is protected and secure from theft and exposure.
  • Personally Identifiable Information (PII): Data which can be used to identify a specific individual, whether in isolation or in combination with other data, must be securely safeguarded. 
  • Protected Health Information (PHI): PHI refers to all the data in a person’s healthcare record, including health conditions, medical history, and test results.

Types Of Data Masking

There are three main types of data masking: static, on-the-fly, and dynamic.

Static data masking produces a masked copy of a dataset. The problem with this approach is that it relies on a duplicated dataset, which needs to be extracted, masked, and then made available in its revised form. Duplicating a database not only uses up valuable storage space, it also means there is no single source of truth. This increases the likelihood that users might end up working with an out-of-date dataset, accidentally overwrite real values, or gain unauthorized access to sensitive information.

On-the-fly data masking works a bit differently and is well-suited to organizations that need to keep sensitive information secure on its journey between different environments. Sensitive data is modified and masked as it migrates between systems. As this approach typically masks only a smaller subset of data, it avoids the need to duplicate an entire database. For this reason, it’s commonly used in Agile software development.
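
One way to picture on-the-fly masking is as a filter sitting between a source and a target system. The sketch below is a minimal illustration under that assumption: the field names in `SENSITIVE_FIELDS` and the function `mask_in_transit` are hypothetical, and the asterisk mask is a placeholder for whichever masking technique is actually chosen.

```python
from typing import Any, Dict, Iterable, Iterator, Set

# Hypothetical field names; a real pipeline would take these from its own schema.
SENSITIVE_FIELDS: Set[str] = {"email", "card_number"}


def mask_in_transit(
    records: Iterable[Dict[str, Any]],
    sensitive: Set[str] = SENSITIVE_FIELDS,
) -> Iterator[Dict[str, Any]]:
    """Yield a masked copy of each record as it streams from source to target."""
    for record in records:
        # Records are handled one at a time, so no full masked copy of the
        # source dataset is ever built up.
        yield {
            field: "*" * len(str(value)) if field in sensitive else value
            for field, value in record.items()
        }


source = [{"id": 1, "email": "a@b.co"}, {"id": 2, "email": "c@d.org"}]
for row in mask_in_transit(source):
    print(row)  # each row would be written to the target environment here
```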

With dynamic data masking, sensitive information is masked upon access based on pre-defined authorization levels. This is beneficial for three reasons: 

1. Masked data is not copied back to the source, as it works on a read-only basis.

2. There’s no need for a duplicate dataset to be created.

3. Sensitive data is dynamically masked based on a user’s access rights. 
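
The three points above can be sketched as a read path that masks values according to the caller's role and never writes anything back. This is an illustration only: the role set `CLEAR_ROLES`, the `card_number` field, and the function `read_record` are assumptions, and the sketch assumes card numbers of at least four characters.

```python
from typing import Dict, Set

# Hypothetical roles that are authorized to see unmasked values.
CLEAR_ROLES: Set[str] = {"admin"}


def read_record(record: Dict[str, str], role: str) -> Dict[str, str]:
    """Return a view of `record` masked according to the caller's role."""
    view = dict(record)  # work on a copy so the stored record is never modified
    if role not in CLEAR_ROLES:
        card = view["card_number"]
        # Show only the last four characters to unauthorized roles.
        view["card_number"] = "*" * (len(card) - 4) + card[-4:]
    return view


stored = {"name": "Ada", "card_number": "4111111111111111"}
print(read_record(stored, "analyst"))  # masked view
print(read_record(stored, "admin"))    # clear view
```

The stored record is the single source of truth in both calls; only the view returned to the caller differs.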

Techniques For Masking Data

Depending on the data masking software you deploy, there is a range of techniques available to protect your data. Some techniques are more effective than others:

  • Encryption: An encryption key is required to unmask masked data. In its encrypted form, the data is unreadable. If the key is compromised, the data is at risk.
  • Scrambling: A basic and less secure technique for randomizing real data by changing the order of numbers and letters to conceal the original content.  
  • Substitution: By substituting original values with alternatives, the original data remains hidden, and the replacement values retain the same look and feel.
  • Variance: This masking technique changes dates and values within a dataset by the same amount, so it’s only effective until a user discovers the variance.  
  • Pseudonymization: Switching out real information about a person’s identity for an alias is another way to keep it safe.
  • Redaction: Sensitive data which is not necessary for development, testing, or training purposes can simply be removed but will no longer be usable. 
  • Averaging: Numerical information, such as financial data, can be replaced by an average. For example, each figure in a list might be replaced by the mean of the list, masking individual values but preserving the total.
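
Several of the techniques above are simple enough to sketch in a few lines. The functions below are illustrative only, with hypothetical names, and are not how any particular masking product implements them. Encryption and pseudonymization are left out because they depend on key management and lookup storage.

```python
import random
from datetime import date, timedelta
from statistics import fmean
from typing import Any, Dict, List, Set


def scramble(value: str, rng: random.Random) -> str:
    """Scrambling: reorder the characters of the original value."""
    chars = list(value)
    rng.shuffle(chars)
    return "".join(chars)


def substitute(value: str, lookup: Dict[str, str]) -> str:
    """Substitution: swap a real value for a plausible replacement."""
    return lookup[value]


def shift_dates(dates: List[date], days: int) -> List[date]:
    """Variance: move every date by the same number of days."""
    return [d + timedelta(days=days) for d in dates]


def redact(record: Dict[str, Any], fields: Set[str]) -> Dict[str, Any]:
    """Redaction: drop sensitive fields from the record entirely."""
    return {k: v for k, v in record.items() if k not in fields}


def average(values: List[float]) -> List[float]:
    """Averaging: replace each figure with the mean, which keeps the total."""
    mean = fmean(values)
    return [mean] * len(values)


print(scramble("12345", random.Random()))
print(substitute("Alice", {"Alice": "Carol"}))
print(shift_dates([date(2022, 8, 23)], 3))
print(redact({"id": 1, "ssn": "000-00-0000"}, {"ssn"}))
print(average([10, 20, 30]))
```

The variance sketch also shows the weakness noted in the list: anyone who learns the single offset can undo the shift for every value.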

Masking Unstructured Data

One area where organizations typically struggle to protect sensitive data is when it’s held in unstructured formats, such as MS Office documents, PDF files, and even image file types such as JPEGs. That’s a huge concern because around 80% of sensitive organizational data is actually unstructured, and in many cases unprotected. 

Created by ABMartin, UMask was the first software solution to mask unstructured data, protecting sensitive data whilst complementing existing structured data privacy solutions. As a leading data masking tool, UMask gives enterprises peace of mind and the security they need to continue using important data for business-critical operations such as development, testing, and analytics.

To find out how ABMartin can help secure your enterprise’s structured and unstructured data, get in touch today.

