Data governance tutorial

Create rules for data masking and obfuscation

By Sharath Kumar RK, Arpit Nanavati

Data governance and security become harder when data spans both on-premises and cloud infrastructure. In this tutorial, you learn how to protect sensitive data using data masking rules -- specifically, how to create rules and policies for masking email addresses and last names, and how to attach those rules to a data asset.

Prerequisites

To complete this tutorial, you need to have the following installed on your system:

  • IBM Cloud Pak for Data
  • IBM Watson Knowledge Catalog

Estimated time

It should take you about 30 minutes to complete this tutorial.

Steps

The following steps show you how to create rules for masking and obfuscating data.

  1. Click the navigation menu, then expand Governance and click Rules.

    Menu - Governance and Rules

  2. Click Add rule, then New rule.

    Select "New rule"

    Select Data protection rule.

    Select "Data protection rule"

  3. Add a rule for masking email addresses. Specify Email Address under Data Class and redact the data, as shown in the image below.

    Edit email masking rule

    After the rule has been created, you should see it on the Rules page.

    New rule created

  4. Create a rule for last name obfuscation.

    Create rule

    Select Data protection rule.

    Select "Data protection rule"

    Specify Last Name under Data Class and obfuscate the data, as shown in the image below.

    Obfuscate last name data

    After the rule has been created, you should see it on the Rules page as well.

    Obfuscate rule created

  5. In the navigation menu, expand Governance and then click Categories to create a category.

    Click "Categories"

    Click Add category.

    Click "Add category"

    Give the category a name (such as "PII_CATEGORY").

    Name the category

    After the category has been created, you should be redirected to the newly created category page.

    New category page

  6. In the navigation menu, expand Governance and then click Business terms to create a new business term.

    Create a new business term

    Click Add business term.

    Click "Add business term"

    Give the business term a name (step 9 refers to it as Pii_Business_Term) and attach the category you created to it.

    Name the business term

    Click Publish to publish the business term.

    Publish the business term

  7. In the navigation menu, expand Governance and then click Policies to create a new policy.

    Select "Policies"

    Create the policy and specify PII_CATEGORY as the primary category.

    Specify primary category

  8. Add both rules in the Data protection rules section of the policy, add the business terms, and then publish the policy.

    Add data protection rules and publish

  9. Go back to the catalog, open the reshaped asset, and click the plus (+) icon in the Governance artifacts -> Business terms section, as shown in the image below.

    Click plus icon to add business term

    Select Pii_Business_Term and add it to the catalog.

    Add business term to the catalog

    Verify that the business term has been added to the catalog.

    Verify addition to catalog

  10. Now, return to the catalog that you created and click the Access control tab, then click Add collaborators +.

    Add collaborators

  11. Add the datascientist user as a collaborator with the Viewer role.

    Assign viewer role to collaborator

    With the above steps, you have created data masking and obfuscation rules to protect sensitive data, and with the last step you gave the data scientist access to the governed data. The next step checks whether the data scientist actually sees the masked data.

  12. Log in as the datascientist user, or with your developer credentials, to see the masked email address and obfuscated last name.

    View results
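
To make the difference between the two rule actions concrete, the following Python sketch mimics what a viewer might see once the rules from steps 3 and 4 apply. It is only an illustration and not how Watson Knowledge Catalog implements masking: the function names, the column names (EMAIL, LAST_NAME, CITY), the choice of "X" as the redaction character, and the hash-based substitution are all assumptions made for this example. Redaction here replaces every character so nothing of the original survives, while obfuscation substitutes characters but keeps the length and letter case so the value still looks like a name.

```python
import hashlib
import string


def redact(value: str) -> str:
    """Replace every character with 'X' (assumed redaction character)."""
    return "X" * len(value)


def obfuscate(value: str, salt: str = "demo-salt") -> str:
    """Swap letters and digits for hash-derived ones, keeping length and case.

    The salt and the SHA-256 substitution are illustrative assumptions.
    """
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isupper():
            out.append(string.ascii_uppercase[b % 26])
        elif ch.islower():
            out.append(string.ascii_lowercase[b % 26])
        elif ch.isdigit():
            out.append(string.digits[b % 10])
        else:
            out.append(ch)  # keep punctuation and spaces as they are
    return "".join(out)


# Data class -> action, mirroring the two rules created in the tutorial.
RULES = {
    "Email Address": redact,
    "Last Name": obfuscate,
}


def apply_rules(row: dict, column_classes: dict) -> dict:
    """Return a copy of row with each classified column masked by its rule."""
    masked = {}
    for column, value in row.items():
        action = RULES.get(column_classes.get(column))
        masked[column] = action(value) if action else value
    return masked


if __name__ == "__main__":
    # Hypothetical row and column classification, for illustration only.
    row = {"EMAIL": "jane@example.com", "LAST_NAME": "Doe", "CITY": "Pune"}
    classes = {"EMAIL": "Email Address", "LAST_NAME": "Last Name"}
    print(apply_rules(row, classes))
```

In this sketch, columns without a matching data class (CITY here) pass through unchanged, which matches what you would expect to see in step 12: only the email address and last name columns differ from the original data.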

Summary

In this tutorial, you learned how to create data protection rules and apply them to assets using IBM Watson Knowledge Catalog.

In this section of the learning path, which included multiple tutorials, you learned how to create a connection between IBM Cloud Pak for Data and external data sources and how to ingest data from those data sources. You also learned different strategies for integrating data using Data Virtualization and IBM DataStage, how to clean and reshape the data using IBM Data Refinery Flow, and how to mask data using IBM Watson Knowledge Catalog.

In the next section, we show you how to build predictive models in Amazon SageMaker and port the notebooks into IBM Cloud Pak for Data.