Protect your data using data privacy features

In recent years, there has been a resurgence of privacy regulations worldwide. Enterprises are struggling to comply with complex regulatory requirements regarding data privacy, as well as individuals’ demand for the privacy of their data.

Improper protection of data has serious consequences. Data breaches and non-compliance with regulatory requirements can lead to fines and penalties for the enterprise. They can also result in loss of customer loyalty, loss of revenue, and lawsuits, and can damage the enterprise's brand.

In this tutorial, you will learn to protect enterprise data using data privacy features, such as data protection rules within the Watson Knowledge Catalog on IBM Cloud Pak for Data.

Learning objectives

In this tutorial, you will:

  • Add data protection rules that enforce the restrictions specified in the policies and governance rules you previously imported.
  • Publish data to a governed catalog.
  • Log in as various users to verify that the data protection rules are enforced.

Prerequisites

Estimated time

This tutorial will take approximately 60 minutes to complete.

Step 1. Add data protection rule to deny access

Start by creating a data protection rule that prevents the users specified in the rule from viewing data.

You will create a data protection rule to enforce the “Restrict access for Passport and Driver’s License number” governance rule, which says to restrict visibility of assets with these fields to certain users (in this case, restricted_user).

  • Log in to your IBM Cloud Pak for Data instance.

  • Go to the hamburger (☰) menu in the upper-left corner, expand Governance, and click on Rules.

  • Click Add rule > New rule.

  • Click on Data protection rule.

  • Provide a name for the rule (Restrict access – Passport and Drivers License), select the type of the rule as Access, and provide an optional description for the rule. On the right side, create the first portion of Condition 1 by selecting and typing out the values to form the condition “If Data class contains any Passport.” Click on the + sign to add an additional section within the condition. Choose OR to join the two sections, and update the second section by typing and selecting the values to form “Business term contains any Patient Driver's License.” This condition states that the rule should be run if the data class of an object is Passport or if the business term of the object is Patient Driver’s License.

  • Click on Add new condition +.

  • A new condition (Condition 2) is added. Choose AND to join the two conditions and update Condition 2 by typing and selecting the values to form “If User name contains any restricted_user.” This condition states that the rule should be run if the user who is trying to access the data is restricted_user. Finally, choose deny access to data as the Action to be taken and click Create rule at the top of the screen.

NOTE: restricted_user represents a user that exists in your system, whom you wish to deny access to any data that has the Passport data class or the Patient Driver’s License business term.

  • The rule is saved and is displayed on the screen.
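The logic of the rule built in this step can be sketched in a few lines of Python. This is only an illustration of how the two conditions combine, not how Watson Knowledge Catalog evaluates rules internally. The Asset class and the deny_access function are hypothetical names introduced for the example, and "contains any" is modeled as simple set membership.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical stand-in for the governance metadata of a catalog asset."""
    name: str
    data_classes: set = field(default_factory=set)
    business_terms: set = field(default_factory=set)


def deny_access(asset: Asset, user_name: str) -> bool:
    """Return True when the Step 1 rule would deny access (illustrative only)."""
    # Condition 1: two sections joined with OR.
    condition_1 = (
        "Passport" in asset.data_classes
        or "Patient Driver's License" in asset.business_terms
    )
    # Condition 2: the requesting user is one of the users named in the rule.
    condition_2 = user_name in {"restricted_user"}
    # The two conditions are joined with AND.
    return condition_1 and condition_2


patients = Asset("PATIENTS", {"Passport"}, {"Patient Driver's License"})
encounters = Asset("ENCOUNTERS")

print(deny_access(patients, "restricted_user"))    # True
print(deny_access(encounters, "restricted_user"))  # False
print(deny_access(patients, "regular_user"))       # False
```

Because the conditions are joined with AND, a user who is not named in the rule is never denied, even for assets that contain Passport or Driver's License fields.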

Step 2. Add data protection rule to redact data

Next, you will see how data can be redacted using data protection rules. This method replaces the data in the column with a string of exactly 10 X’s. While this method helps in hiding the data, it does not retain the original format of the data. Since all values are replaced with 10 X’s, it also loses referential integrity of the data, so if the column was used as a foreign key reference to some other table, that foreign key reference will be lost when the data is redacted.
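As a rough illustration of why redaction loses both format and referential integrity, the sketch below replaces every value with 10 X's. The redact function and the sample birthdates are made up for the example and are not part of the product.

```python
def redact(value: str) -> str:
    """Replace any value with a fixed string of 10 X's (illustrative only)."""
    return "X" * 10


# Made-up sample values; the first and third are the same date.
birthdates = ["1984-03-12", "1990-11-02", "1984-03-12"]
redacted = [redact(v) for v in birthdates]

print(redacted)  # ['XXXXXXXXXX', 'XXXXXXXXXX', 'XXXXXXXXXX']

# Two distinct originals collapse into one masked value, so the column
# can no longer be used to match rows in another table.
print(len(set(birthdates)), len(set(redacted)))  # 2 1
```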

You will create a data protection rule to enforce the Redact Patient Birthdate governance rule, which says that patient birthdate should be kept hidden.

  • Like before, go to the hamburger (☰) menu in the upper-left corner, expand Governance, and click on Rules.

  • Click on Add rule > New rule.

  • Click on Data protection rule.

  • Provide a name for the rule (Redact the Birthdate of Patient), select the type of the rule as Access, and provide an optional description for the rule. On the right side, create Condition 1 by selecting and typing out the values to form the condition “If Business term contains any Patient Birth Date.” Under Action, click on deny access to data to open the drop-down menu and click mask data.

  • New fields are displayed on the screen. Update the Action by typing and selecting the values to form “then mask data in columns containing Business term Patient Birth Date.” Under Select how to mask data, choose Redact. Hovering over the value within the Redact box will show you what the value would look like after masking. Click Create rule at the top of the screen.

  • The rule is saved and is displayed on the screen.

Step 3. Add data protection rule to substitute data

See how data can be substituted using data protection rules. This method replaces data with values that do not match the original format. However, if a value is used several times in a column with substituted data, it is replaced with the same substitution value. Thus, this method of masking data does not keep the data in the original format, but preserves referential integrity of the data.
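The behavior described above can be approximated with a keyed hash: the same input always yields the same token, and the token looks nothing like the original. This is a minimal sketch under that assumption. The source does not say how Watson Knowledge Catalog generates substitution values, and SECRET_KEY, substitute, and the sample values are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key for the demo; a real system would manage its own secret.
SECRET_KEY = b"demo-only-key"


def substitute(value: str) -> str:
    """Map a value to a fixed-length token that is stable for equal inputs."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:10]


# Made-up sample values; "white" appears twice.
races = ["white", "asian", "white", "black"]
masked = [substitute(v) for v in races]

# Equal originals get equal tokens, so grouping and joins still line up.
print(masked[0] == masked[2])  # True
# The number of distinct values in the column is unchanged.
print(len(set(races)) == len(set(masked)))  # True
```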

You will create a data protection rule to enforce the “Mask Sensitive personal information” governance rule, which says to mask sensitive personal information such as race, ethnicity, or gender.

  • Like before, go to the hamburger (☰) menu in the upper-left corner, expand Governance, and click on Rules.

  • Click on Add rule > New rule.

  • Click on Data protection rule.

  • Provide a name for the rule (Hide Sensitive Personal Information), select the type of the rule as Access, and provide an optional description for the rule. On the right side, create Condition 1 by selecting and typing out the values to form the condition “If Business term contains any Patient Race Patient Ethnicity Patient Gender.” Under Action, click on deny access to data to open the drop-down menu and click mask data.

  • New fields are displayed on the screen. Update the Action by typing and selecting the values to form “then mask data in columns containing Business term Patient Race Patient Ethnicity Patient Gender.” Under Select how to mask data, choose Substitute. Hovering over the value within the Substitute box will show you what the value would look like after masking. Click Create rule at the top of the screen.

  • The rule is saved and is displayed on the screen.

Step 4. Add data protection rule to obfuscate data

Next, see how data can be obfuscated using data protection rules. This method replaces data with similarly formatted values. However, it does not preserve referential integrity or data distribution. Because the format is retained, this is a good method for masking data such as financial account information (credit card numbers and bank account numbers), government identity documents (passport numbers and Social Security Numbers), or contact details (phone numbers and email addresses).
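One simple way to picture format-preserving obfuscation is to swap each digit for a random digit and each letter for a random letter while leaving separators in place. The sketch below does exactly that. It is an assumption made for illustration, not the algorithm the product uses, and obfuscate, shape, and the sample SSN are hypothetical.

```python
import random
import string


def obfuscate(value: str, rng: random.Random) -> str:
    """Replace digits and letters with random ones; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append(rng.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            out.append(rng.choice(pool))
        else:
            out.append(ch)  # separators such as "-" or "@" stay where they are
    return "".join(out)


def shape(value: str) -> str:
    """Describe a value's format: 9 for a digit, a for a letter."""
    return "".join("9" if c.isdigit() else "a" if c.isalpha() else c for c in value)


rng = random.Random()
ssn = "999-12-3456"  # made-up sample value
masked = obfuscate(ssn, rng)

# The masked value has the same layout as the original.
print(shape(ssn) == shape(masked))  # True
```

Since each call draws fresh random characters, the same original value usually maps to different masked values, which is why this style of masking does not preserve referential integrity.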

You will create a data protection rule to enforce the “Mask Social Security Number” governance rule, which says that the Social Security Number should be replaced with dummy values.

  • Like before, go to the hamburger (☰) menu in the upper-left corner, expand Governance, and click on Rules.

  • Click on Add rule > New rule.

  • Click on Data protection rule.

  • Provide a name for the rule (Hide Social Security Number), select the type of the rule as Access, and provide an optional description for the rule. On the right side, create Condition 1 by selecting and typing out the values to form the condition “If Data class contains any US Social Security Number.” Under Action, click on deny access to data to open the drop-down menu and click on mask data.

  • New fields are displayed on the screen. Update the Action by typing and selecting the values to form “then mask data in columns containing Data class US Social Security Number.” Under Select how to mask data, choose Obfuscate. If you hover over the value within the Obfuscate box and then move away from it, you can see that the Before and After values have the same data format. Click Create rule at the top of the screen.

  • The rule is saved and is displayed on the screen.

Step 5. Publish assets to the default catalog

You have now completed the steps to discover and analyze your assets, and you have also incorporated rules to protect your data. Now, you can publish the assets to a catalog in order to make these assets available to other users.

  • Go to the hamburger (☰) menu in the upper-left corner, expand Governance and click on Data quality. Click on the tile for your project (HealthcareAnalysis).

  • Select all the assets and click on Publish +.

  • In the pop-up window, click Publish.

  • The assets will be published to the default catalog. You can click on the Refresh icon to refresh the table, and you should see that the last published date for the assets is updated.

  • Go to the hamburger (☰) menu in the upper-left corner, expand Catalogs and click on All catalogs.

  • Click on the tile for Default Catalog.

  • The assets should now be available within the catalog.

Step 6. Add collaborators to the default catalog

By default, the default catalog can only be accessed by the admin user. You will need to add other users as collaborators to the catalog to provide them with access to the catalog and the assets within the catalog.

  • Go to the Access control tab and click on Add collaborator +.

  • In the pop-up window, choose the role you wish to provide the new users (Viewer should suffice for this tutorial). Under Collaborators, search for and select the users you want to add as collaborators to the default catalog, then click Add.

NOTE: You will need to add a minimum of two non-admin users to the catalog, one of whom must be the user specified in the rule defined in Step 1: Add data protection rule to deny access.

  • The newly added users should now be displayed in the list of collaborators on the Access control tab of the default catalog.

Step 7. Verify that the data protection rules are enforced

You can now log in as the non-admin users who were given access to the default catalog and verify that the data protection rules are enforced.

  • Log out of IBM Cloud Pak for Data and log back in as restricted_user, the user specified in the rule defined in Step 1: Add data protection rule to deny access.

  • Go to the hamburger (☰) menu in the upper-left corner, expand Catalogs and click on All catalogs.

  • Click on the tile for Default Catalog.

  • Scroll to the bottom of the page to see the catalog assets. Click on the PATIENTS asset.

  • You should see an error stating that the user cannot view the PATIENTS asset because it is blocked by the Restrict access – Passport and Drivers License data protection rule.

  • If you go back to the Default Catalog using the breadcrumbs at the top and try to access some other asset, say ENCOUNTERS, you should be able to see the contents of that asset. This is because the user is only denied access to the assets containing Driver’s License or Passport fields, and none of them exist in the ENCOUNTERS asset.

  • Log out of IBM Cloud Pak for Data again and log back in as regular_user, the other user that was added to the default catalog as a collaborator.

  • As before, go to the hamburger (☰) menu in the upper-left corner, expand Catalogs and click on All catalogs.

  • Click on the tile for Default Catalog.

  • Scroll to the bottom of the page to see the catalog assets. Click on the PATIENTS asset.

  • This time, the PATIENTS asset should load because this user was not denied access to the data. Go to the Asset tab to load the asset preview. Masking the data may take a while, in which case you will see a notification saying so. You can see on the screen that five columns have been masked. Clicking the Lock icon will provide more details about the masked columns: one column is obfuscated, three columns are substituted, and one column is redacted.

  • Look at the BIRTHDATE and SSN columns. Both have a lock icon next to the column name, which indicates that these columns have been masked. As before, clicking on the lock icons will give you more details about the masked column. For example, the lock icon near BIRTHDATE says that the values and format in this column are redacted by the “Redact the Birthdate of Patient” data enforcement rule. You can see that all the BIRTHDATE values are replaced by a string of 10 X’s. For SSN, the values are obfuscated; that is, they are replaced with other values of the same format. Thus, the real SSN values are hidden, but the formatting of the field is preserved.

  • Scroll to the right, and look at the RACE, ETHNICITY, and GENDER columns. The data in these columns has been substituted with values that do not match the original format of the fields. However, you can see that within each of these columns, the same values occur multiple times. This is because all occurrences of a value in the column are replaced with the same text. This preserves the referential integrity of the column.

Summary

In this tutorial, you have learned to use data protection rules within Watson Knowledge Catalog on IBM Cloud Pak for Data. You added data protection rules to your healthcare data to limit the availability of the data to certain users. You added other data protection rules to hide some data by replacing it with other data. As a result, while the real data is not visible to users, they can still see which columns exist and, depending on the type of data masking used, get an idea of the formats or table references of those columns. Finally, you verified that the data protection rules are enforced by publishing the data assets to the default catalog and then logging in as other users, who are not owners of the data assets, to check their access to the assets and the data.

This tutorial is part of the An introduction to the DataOps discipline series. In this series, you have seen how you can set up governance artifacts for your data, discover the data in your data sources, analyze the quality of your data and protect your data using data privacy features.