Overview

Skill Level: Intermediate

Automated discovery streamlines the process of importing, analyzing, and classifying data from new data connections. This step-by-step guide shows you how to run automated discovery using Information Governance Catalog and Information Analyzer.

Ingredients

Prerequisites for Setting up automated discovery


The figure below shows the preparatory steps that an enterprise should take before running automated discovery for its critical data elements.

Prerequisite for Auto Discovery

Step-by-step

  1. Connect to the sample database

    Import Sample Database: 

    After installing IBM Information Server with Unified Governance (UG) Stack, do the following:

    1. Create a connection to the IADB repository.
    2. Set IADB Parameters.
    3. Set the engine credentials that are needed for discovery and analysis.
    4. Import a sample database.

    Step 1: Create a connection to the IADB repository (IADB is the analysis database).

    1. Click Metadata Asset Manager from the Launchpad.
    2. Click the Import tab and then click New Import Area.
    3. Specify iadb for Import Area Name.
    4. Assuming that you are using Db2 as your Information Analyzer (IA) database, click IBM InfoSphere DB2 Connector under IBM and then click the Next button.
    5. Click the Select data connection icon and then click New Data Connection. Specify iadb for Name, and enter the username and password of the IA user for the selected IADB database.
    6. Click Test Connection to validate the information. If the validation is successful, select Save Password and click OK.
    7. Clear all seven import parameters that are selected and then click Next.
    8. For the Host system name field, click Select existing asset and select the host. Click OK and then click Next.
    9. Click Express Import, then click Import, then click OK.
    10. Click Close.

    Step 2: Set analysis database (IADB) parameters.

    The IADB is required to store the analysis results. Ensure that the IIUSER has data admin privilege.

    $IISINSTALLDIR/ASBServer/bin/IAAdmin.sh -setIADBParams -iaDBHost IADBHOST -iaDBDataConnection iadb -iaDataSource jdbc/IADB -url https://$IISHOSTNAME:$IISPORT -user $IISUSER -password $IISPASSWORD
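The command above relies on shell variables that must be defined before it runs. As a minimal sketch, the wrapper below sets them with placeholder values and prints the command (with the password masked) for review; it only executes IAAdmin.sh when DRY_RUN is set to 0. The default values, the IADBHOST variable (the original command shows IADBHOST as a literal placeholder), and the function name are assumptions for illustration, not values from your environment.

```shell
# Sketch: wrap the IAAdmin.sh call from this step.
# Every default below is a placeholder (assumption); replace it with your own value.
IISINSTALLDIR="${IISINSTALLDIR:-/opt/IBM/InformationServer}"
IISHOSTNAME="${IISHOSTNAME:-iis.example.com}"
IISPORT="${IISPORT:-9443}"
IISUSER="${IISUSER:-isadmin}"
IISPASSWORD="${IISPASSWORD:-changeme}"
IADBHOST="${IADBHOST:-iadb.example.com}"   # host of the analysis database

set_iadb_params() {
  # Same flags as the command above, collected as positional parameters.
  set -- "$IISINSTALLDIR/ASBServer/bin/IAAdmin.sh" \
    -setIADBParams \
    -iaDBHost "$IADBHOST" \
    -iaDBDataConnection iadb \
    -iaDataSource jdbc/IADB \
    -url "https://$IISHOSTNAME:$IISPORT" \
    -user "$IISUSER"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show the command with the password masked.
    echo "$* -password ********"
  else
    "$@" -password "$IISPASSWORD"
  fi
}

# Defaults to a dry run, so this only prints the command.
set_iadb_params
```

Once the printed command looks right, run the script again with DRY_RUN=0 to apply the parameters.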
     
     

    Step 3: Set the engine credentials that are needed for automated discovery and analysis.

    1. Click Administration Console from the Launchpad.
    2. Click the Administration tab and then click Engine Credentials under Domain Management.
    3. Select Information Server Engine, click Open Configuration, and then provide the credentials for the InfoSphere DataStage® Administrator.
       

    Step 4: Import a sample database.

    Create a Db2 sample database as the user db2inst1.

    This tutorial uses the sample database that comes with an IBM Information Server (IIS) installation that uses a Db2 repository. Run the following command to create the sample database.
     
    sudo su - db2inst1 -s /bin/bash -c "db2sampl"
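To confirm that the database was created, you can look for SAMPLE in the Db2 database directory. The sketch below assumes that it runs as the instance owner (db2inst1) with the Db2 environment loaded, and that the directory listing contains a "Database name = SAMPLE" line. The function name is hypothetical.

```shell
# Sketch: check that db2sampl created the SAMPLE database.
# Assumption: run as the Db2 instance owner so that the db2 command is on the PATH.
check_sample_db() {
  if ! command -v db2 >/dev/null 2>&1; then
    echo "db2 command not found; run this as the Db2 instance owner" >&2
    return 2
  fi
  # List the cataloged databases and look for SAMPLE.
  if db2 list db directory | grep -qi 'Database name *= *SAMPLE'; then
    echo "SAMPLE database is cataloged"
  else
    echo "SAMPLE database not found" >&2
    return 1
  fi
}
```

Call check_sample_db after db2sampl completes. A non-zero return code means that the database was not found or that the db2 command is unavailable.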

     

    Create Connection to the above database:

    1. Log in to the IGC New UI using the UG launchpad (https://<UGServerName>/ibm/iis/igcui/).
    2. Go to the Connections tab. From this pane you can create JDBC connections to any source (such as DB2, Oracle, SQL, and Teradata), as well as HDFS connections. This tutorial creates a connection to the sample database.
    3. Click "Create connections" at the top right and enter the details:

      Create Connection

       

      * Name: SampleDBConn
      * Description: Some description
      * Connection: Db2
      * JDBC URL: jdbc:db2://<IISHOSTNAME>:50000/Sample
      * Username: db2inst1
      * Password: <password of the db user>

    4. Click Test connection and confirm that the message "Connection tested successfully" appears.
    5. Click 'Save connection' to save the connection.
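A typo in the JDBC URL is a common reason for a failed connection test. As a small sketch, the helper below composes the URL in the same form as the one entered above. The default port 50000 and the database name Sample come from this tutorial; the function name and the example host name are hypothetical.

```shell
# Sketch: compose the Db2 JDBC URL for the connection form.
# Usage: db2_jdbc_url <host> [port] [database]
db2_jdbc_url() {
  host="$1"
  port="${2:-50000}"       # default port used in this tutorial
  database="${3:-Sample}"  # database created by db2sampl
  if [ -z "$host" ]; then
    echo "usage: db2_jdbc_url <host> [port] [database]" >&2
    return 1
  fi
  # Reject ports that contain anything other than digits.
  case "$port" in
    ''|*[!0-9]*) echo "port must be numeric: $port" >&2; return 1 ;;
  esac
  echo "jdbc:db2://$host:$port/$database"
}

# Prints jdbc:db2://iis.example.com:50000/Sample
db2_jdbc_url iis.example.com
```

Paste the printed value into the JDBC URL field, replacing the example host name with your own.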
  2. Triggering Auto Discovery

    a.     To run automated discovery, go to the Connections tab in the IGC New UI and look for the connection that you created in the previous step (SampleDBConn).

    b.     Click the dotted menu on the connection tile, and then click 'Discover'.

    c.     Click the Browse button to select the asset to be discovered. You can select the root asset (db2) to run discovery on all the schemas and tables in the repository. In our example we are selecting the schema DB2INST1:

     

    Schema / Table Selection 

     

    d.     Select the discovery options (the tasks that you want to run):

      * Analyze columns: runs column analysis on the data
      * Analyze data quality: runs data quality analysis on the data
      * Assign terms: assigns terms that exist in IGC

     

    The host is populated automatically. Select the host to which you want to import the assets.

    In the Workspace option, use the default workspace (UGDefaultWorkspace). All the selected assets are added to this workspace, and the analysis runs on them there.

    Click Discover to see the following screen:

     Discovery Running

     

     

  3. Wait for Discovery to run

    Wait for the discovery results. This might take a few moments.

    First, InfoSphere Metadata Asset Manager (IMAM) imports the metadata. After some time, refresh the page to see that the import has finished and the analysis is now running.

    IMAM Finished

    Click refresh again; in time, the analysis also shows a state of Finished.


  4. Check Results

    Click the "Eye" icon on the right to review the discovery results. If you have navigated away from this page, go to Connections > Discovery Results in the top tab and look for the discovery result of your last run to return to this page.


    Below is the discovery result:

    Discovery Result

    Keep this screen handy, because you will return to it in Step 6.

  5. Check IGC for the term called Department

    In the screen above, the asset name Department is mapped to the business term Department. (If you do not have a matching business term, the mapping may not show up. You can create a business term using the IGC New UI.)
    Now let's have a look at the term Department in the IGC New UI.
    Select Catalog > Assets > All asset types (Glossary and Governance > Term) and type Department.
    IGC

    Now look at the details of the Term Department:
    Term Details

    *Note: No asset is assigned to the term Department yet.

  6. Publish the Results

    Now return to the screen from Step 4 and select Department, as shown below:

    Discovery Result Screen

     

    Click the Publish button in the bottom-right corner. The following dialog appears:
    Publish Dialog

    See the following screen and notice that Department is now published:

    Department Published

    The next step verifies that the Department details are shared with IGC.

  7. Check IGC for the term called Department with the new asset

    Repeat Step 5 to view the details of the term Department and notice that it now contains the new asset that was published.

    Business Term

    Click the assigned asset DEPARTMENT to see the newly available details:

    Term Detail Page
