Data governance tutorial

Create rules for data masking and obfuscation

Sharath Kumar RK, Arpit Nanavati

Data governance and security are bigger challenges when data spans an on-premises/cloud infrastructure. In this tutorial, you learn how to protect sensitive data using data masking rules -- specifically, how to create rules and policies for masking email addresses and last names, and how to attach rules to a data asset.

Prerequisites

To complete this tutorial, you need to have the following installed on your system:

IBM Cloud Pak for Data
IBM Watson Knowledge Catalog

Estimated time

It should take you about 30 minutes to complete this tutorial.

Steps

The following steps show you how to create rules for data masking and obfuscation, and how to apply them to a data asset.

Click on the navigation menu, then expand Governance and click Rules.
Click Add rule, then New rule.

Select Data protection rule.
Add a rule for masking email addresses. Specify Email Address under Data Class and redact the data, as shown in the image below.

After the rule has been created, you should see it on the Rules page.
Create a rule for last name obfuscation.

Select Data protection rule.

Specify Last Name under Data Class and obfuscate the data, as shown in the image below.

After the rule has been created, you should see it on the Rules page as well.
In the navigation menu, expand Governance and then click Categories to create a category.

Click Add category.

Give the category a name (such as "PII_CATEGORY").

After the category has been created, you should be redirected to the newly created category page.
In the navigation menu, expand Governance and then click Business terms to create a new business term.

Click Add business term.

Give the business term a name (such as "Pii_Business_Term") and attach the category that you created to it.

Click Publish to publish the business term.
In the navigation menu, expand Governance and then click Policies to create a new policy.

Create the policy and specify PII_CATEGORY as the primary category.
Add both rules in the Data protection rules section of the policy, add the business terms, and then publish the policy.
Go back to the catalog, open the reshaped asset, and click the plus (+) icon in the Governance artifacts -> Business terms section, as shown in the image below.

Select Pii_Business_Term and add it to the asset.

Verify that the business term has been added to the asset.
Now, return to the catalog that you created and click the Access control tab, then click Add collaborators +.
Add the datascientist user as a collaborator with the Viewer role.

With the above steps, you created data masking and obfuscation rules to protect sensitive data. With the last step, you also gave the data scientist access to the governed data. One more step confirms that the data scientist actually receives masked data.
Log in using the datascientist credentials or your developer credentials to see the masked email addresses and obfuscated last names.
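
To make the difference between the two masking methods used in this tutorial more concrete, the following Python sketch imitates what a viewer such as the data scientist might observe: a redacted email address carries no usable information, while an obfuscated last name is replaced by a similarly formatted stand-in. This is an illustration only, not the algorithm that Watson Knowledge Catalog uses. The function names, the fixed ten-character string of X characters, the `demo-secret` value, and the sample row are all assumptions made for this example.

```python
import hashlib
import re

# Loose pattern used only to check that a value no longer looks like an email.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Assumption: redaction is modeled as a fixed string of X characters,
# so neither the content nor the length of the original value leaks.
REDACTED = "X" * 10


def redact(value: str) -> str:
    """Return a fixed placeholder regardless of the input value."""
    return REDACTED


def obfuscate_last_name(value: str, secret: str = "demo-secret") -> str:
    """Return a deterministic, name-shaped stand-in of the same length.

    The same input always maps to the same output, so the column still
    behaves consistently in joins, but the real name is not shown.
    """
    # shake_256 can produce a digest of any requested length in bytes.
    digest = hashlib.shake_256((secret + value).encode("utf-8")).digest(len(value))
    letters = "".join(chr(ord("a") + byte % 26) for byte in digest)
    return letters.capitalize()


def mask_row(row: dict) -> dict:
    """Apply both hypothetical rules to a copy of the row."""
    masked = dict(row)
    masked["email"] = redact(row["email"])
    masked["last_name"] = obfuscate_last_name(row["last_name"])
    return masked


if __name__ == "__main__":
    # Hypothetical sample row; the column names are assumptions.
    sample = {"first_name": "Ada", "last_name": "Smith",
              "email": "ada.smith@example.com"}
    print(mask_row(sample))
```

In this sketch, columns that are not covered by a rule (here `first_name`) pass through unchanged, which mirrors how the rules in this tutorial target only the Email Address and Last Name data classes.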

Summary

In this tutorial, you learned how to create data protection rules and apply them to assets using IBM Watson Knowledge Catalog.

In this section of the learning path, which included multiple tutorials, you learned how to create a connection between IBM Cloud Pak for Data and external data sources and how to ingest data from those data sources. You also learned different strategies for integrating data using Data Virtualization and IBM DataStage, as well as how to clean and reshape the data using IBM Data Refinery Flow and how to mask data using IBM Watson Knowledge Catalog.

In the next section, we show you how to build predictive models in Amazon SageMaker and port the notebooks into IBM Cloud Pak for Data.