Overview

Skill Level: Beginner


Ingredients

Basic knowledge of XML

Basic knowledge of Information Governance Catalog

Step-by-step

  1. A brief introduction

    In the latest version of InfoSphere Information Server (11.5), and in the latest rollups of the previous version, the Open IGC functionality was introduced. It lets you create and describe custom assets in Information Governance Catalog and view the data lineage between them. This functionality is useful when you want to extend data governance to abstract concepts or relationships (such as ownership configuration), or to proprietary applications and processes such as third-party ETL tools. This article covers the ETL case.
    In customer environments it is often necessary to integrate a third-party ETL tool with the data management platform. Information Governance Catalog offers two ways to do so:
    • extension mappings, which map sources to targets
    • Open IGC, in which you can describe the complete flow of the data.
    Extension mappings are quick to create and can be defined through a simple GUI or a CSV file. If you need to follow the data through every stage of its flow, however, Open IGC is the better choice, even though it is somewhat more complex.

  2. The 3 main steps

    Creating the objects of a given type and their lineage takes three steps:

    • defining a bundle that contains the internal structure, the properties, the custom attributes and the icons of your object types
    • describing the physical assets of those types
    • drawing the lineage, that is, the arrows that connect the parts of the assets
  3. A short analysis of the case

    This article describes one possible bundle for representing Oracle Data Integrator (ODI) ETL jobs.
    In the case I analyzed, a project contained both packages that call interfaces and standalone interfaces. In addition, some interfaces called other interfaces.

  4. Step 1: Definition and import of the bundle

    When describing the structure, I needed to allow three possible parents for the Interface asset: another Interface, a Package or a Project. Starting with release 19 of Information Server 11.3.1.2 and with Rollup 1.1 of IS 11.5, an asset can have more than one parent. The parents can be assigned either explicitly or implicitly through inheritance.
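    For comparison, the explicit alternative would list every admissible parent class directly on the child class. The sketch below assumes that containerClassRefs accepts a space-separated list of class local IDs. Check this against the asset type descriptor schema of your Information Server release before relying on it.

```xml
<!-- Explicit alternative (sketch): all admissible parents listed on the class itself. -->
<!-- Assumption: containerClassRefs takes a space-separated list of class local IDs. -->
<class localId="Interface" containerClassRefs="Project Package Interface" canHaveImage="true" expandableInLineage="true">
  <label key="class.Interface" inDefaultLocale="ODI Interface" />
  <pluralLabel key="class-plural.Interface" inDefaultLocale="ODI Interfaces" />
</class>
```

    With this approach, every new parent type means editing each child class. A shared Super Class avoids that.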

    I chose the implicit approach, based on the concept of a Super Class.

    I defined a virtual Super Class named Container, whose subclasses are Project, Package and Interface. I call Container virtual because it never appears in the navigation bar that shows assets. Its only role is to be the Super Class that the asset classes I wrote refer to.

    <class localId="Container" canHaveImage="false" expandableInLineage="true">
      <label key="class.Container" inDefaultLocale="Container" />
      <pluralLabel key="class-plural.Container" inDefaultLocale="Containers" />
    </class>

    The Project class is the root of my logical tree, so it was defined only with a reference to the Super Class:

    <class localId="Project" canHaveImage="true" superClassRef="Container">
      <label key="class.Project" inDefaultLocale="ODI Project" />
      <pluralLabel key="class-plural.Project" inDefaultLocale="ODI Projects" />
      []
    </class>

    where [] stands for the custom definition of some attributes shown in the header section.

    Package and Interface always have a parent. They were therefore defined both with a reference to Container as their parent class (containerClassRefs) and as subclasses of the Super Class Container (superClassRef):

    <class localId="Package" containerClassRefs="Container" canHaveImage="true" expandableInLineage="true" superClassRef="Container">
      <label key="class.Package" inDefaultLocale="ODI Package" />
      <pluralLabel key="class-plural.Package" inDefaultLocale="ODI Packages" />
      []
    </class>

    <class localId="Interface" containerClassRefs="Container" canHaveImage="true" expandableInLineage="true" superClassRef="Container">
      <label key="class.Interface" inDefaultLocale="ODI Interface" />
      <pluralLabel key="class-plural.Interface" inDefaultLocale="ODI Interfaces" />
      []
    </class>

    where, again, [] stands for the omitted definition of custom attributes. It has this meaning throughout the document.

    Directly under Interface there is another Super Class, named Stage, which represents stages. I defined it as a Super Class because an ETL tool has many different types of stage, such as Table, Filter, Join, Lookup and Transformer.
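    To illustrate what the [] placeholder can contain, here is a hypothetical header section for Interface. It declares the author attribute that the asset file in Step 2 sets as $author. The element names headerSection and attribute, the type value and the label key are assumptions about the Open IGC descriptor format. Verify them against the schema shipped with your release.

```xml
<class localId="Interface" containerClassRefs="Container" canHaveImage="true" expandableInLineage="true" superClassRef="Container">
  <label key="class.Interface" inDefaultLocale="ODI Interface" />
  <pluralLabel key="class-plural.Interface" inDefaultLocale="ODI Interfaces" />
  <!-- Hypothetical content of []: one custom string attribute shown in the header section. -->
  <!-- Assumption: asset files then set this attribute with a "$" prefix, as in $author. -->
  <headerSection>
    <attribute localId="author" type="string" multiValued="false">
      <label key="attribute.Interface.author" inDefaultLocale="Author" />
    </attribute>
  </headerSection>
</class>
```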

    <class localId="Stage" containerClassRefs="Interface" canHaveImage="false">
      <label key="class.Stage" inDefaultLocale="ODI Stage" />
      <pluralLabel key="class-plural.Stage" inDefaultLocale="ODI Stages" />
      []
    </class>

    <class localId="Stage_Transformer" superClassRef="Stage">
      <label key="class.Stage_Transformer" inDefaultLocale="ODI Transformer Stage" />
      <pluralLabel key="class-plural.Stage_Transformer" inDefaultLocale="ODI Transformer Stages" />
    </class>

    Which stage types to describe depends on the customer's needs. The bundle is fully customizable, so new types can be added. Unlike Container, this Super Class is meant to appear in search and in the navigation bar. There it groups all the stage types in a single place: they differ in behavior but not in their internal structure.

    The children of a stage are data fields, the basic unit of work. They represent the columns that flow from one stage to another.
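    For instance, the asset file in Step 2 uses a stage of class Funnel. Here is a sketch of its definition, following the same pattern as Stage_Transformer. The localId Funnel is inferred from the class name $GenODIInherit-Funnel used there, and the labels are illustrative.

```xml
<!-- A further stage type. Its parent class (Interface) comes from Stage via superClassRef, -->
<!-- and DataFields can belong to it because DataField declares Stage as its container. -->
<class localId="Funnel" superClassRef="Stage">
  <label key="class.Funnel" inDefaultLocale="ODI Funnel Stage" />
  <pluralLabel key="class-plural.Funnel" inDefaultLocale="ODI Funnel Stages" />
</class>
```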

    <class localId="DataField" containerClassRefs="Stage" canHaveImage="false">
      <label key="class.DataField" inDefaultLocale="ODI Interface Field" />
      <pluralLabel key="class-plural.DataField" inDefaultLocale="ODI Interface Fields" />
      []
    </class>

    The bundle is completed by custom icons. In this case, for convenience, I used the DataStage ones.

    You can also write properties files for internationalization.
    Once the XML file with the structure (asset_type_descriptor.xml), the icons and, optionally, the properties files are ready, import them together as a single zip file through the REST API. (Note: the zip must contain, at its top level, only the icons folder, the optional i18n folder and the asset_type_descriptor.xml file. If they are placed inside another folder, the import fails.)
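    The class snippets above are fragments of asset_type_descriptor.xml. Here is a sketch of how they could be wrapped. The descriptor root element and its namespace URI are assumptions. The bundleId value GenODIInherit is inferred from the class names used in Step 2 ($GenODIInherit-Project and so on).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Zip layout expected by the import (see the note above), with no enclosing folder:
       asset_type_descriptor.xml
       icons/   custom icons
       i18n/    optional properties files -->
<!-- Assumptions: root element name, namespace URI and bundleId value. -->
<descriptor bundleId="GenODIInherit" xmlns="http://www.ibm.com/iis/igc/asset-type-descriptor">
  <class localId="Container" canHaveImage="false" expandableInLineage="true">
    <label key="class.Container" inDefaultLocale="Container" />
    <pluralLabel key="class-plural.Container" inDefaultLocale="Containers" />
  </class>
  <!-- Project, Package, Interface, Stage and its subclasses, and DataField follow here. -->
</descriptor>
```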

    [Figure: Structure created in Information Governance Catalog after importing the bundle]

  5. Step 2: Physical description of the assets

    After the bundle is imported, assets that use its structure can be created.

    Here I focus on how the Super Class Container is used.

    Suppose you want to describe a project that contains a package. The package contains one or more Interfaces, and at least one of these Interfaces calls another Interface. Build the file as follows:
    • Define the project with its own local ID.
    • Define the package. In its <reference> tag, set the name field to $Container and the assetIDs field to the project's local ID.
    • Give the package its own local ID for the Interfaces to refer to. In each Interface's reference tag, set name to $Container and assetIDs to the package's local ID.
    • Sub Interfaces refer to their parent Interfaces in the same way.

    The example below shows a Project, a main Interface and two sub Interfaces, without any Package. The assets can still be defined because Package is not a direct parent of Interface. The parent of Interface is Container, so any class defined as a subclass of Container (here, Project) can be the parent of an Interface.

    If Interface were defined as a direct child of Package, this structure would cause an error during import.

    <asset class="$GenODIInherit-Project" repr="Prova ODI" ID="a1">
      <attribute name="name" value="Prova ODI"/>
      <attribute name="$phase" value="DEV"/>
    </asset>

    <asset class="$GenODIInherit-Interface" repr="IF Principale" ID="a2">
      <attribute name="$author" value="Ale"/>
      <attribute name="name" value="IF Principale"/>
      <attribute name="short_description" value="Interface test"/>
      <attribute name="long_description" value="Main interface containing 2 sub-interfaces and 2 stages"/>
      <reference name="$Container" assetIDs="a1"/>
    </asset>

    <asset class="$GenODIInherit-Interface" repr="IF Sub1" ID="a3">
      <attribute name="$author" value="Ale"/>
      <attribute name="name" value="IF Sub1"/>
      <attribute name="short_description" value="First sub-interface"/>
      <attribute name="long_description" value="First sub-interface"/>
      <reference name="$Container" assetIDs="a2"/>
    </asset>

    <asset class="$GenODIInherit-Interface" repr="IF Sub2" ID="a4">
      <attribute name="$author" value="Ale"/>
      <attribute name="name" value="IF Sub2"/>
      <attribute name="short_description" value="Second sub-interface"/>
      <attribute name="long_description" value="Second sub-interface"/>
      <reference name="$Container" assetIDs="a2"/>
    </asset>
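    For comparison, here is a sketch of the case described at the beginning of this step, with a Package between the Project and the Interface. Only the references change: the package points to the project through $Container, and the Interface points to the package in the same way. The IDs p1 to p3 and all names are hypothetical.

```xml
<asset class="$GenODIInherit-Project" repr="Project With Package" ID="p1">
  <attribute name="name" value="Project With Package"/>
</asset>

<!-- The package's parent is the project, reached through the Super Class Container. -->
<asset class="$GenODIInherit-Package" repr="PKG Main" ID="p2">
  <attribute name="name" value="PKG Main"/>
  <reference name="$Container" assetIDs="p1"/>
</asset>

<!-- The interface's parent is the package, again through $Container. -->
<asset class="$GenODIInherit-Interface" repr="IF Called By Package" ID="p3">
  <attribute name="name" value="IF Called By Package"/>
  <reference name="$Container" assetIDs="p2"/>
</asset>
```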

    Here is an example of a stage. The reference name is $Interface, and the assetIDs field must contain the local ID of the Interface that the stage belongs to.

    <asset class="$GenODIInherit-Funnel" repr="Union All Tables" ID="a5">
      <attribute name="name" value="Union All Tables"/>
      <reference name="$Interface" assetIDs="a2"/>
    </asset>

    (Local IDs can be chosen freely. I used increasing alphanumeric IDs only for convenience.)

    DataFields refer generically to $Stage, with assetIDs set to the local ID of the corresponding stage.

    <asset class="$GenODIInherit-DataField" repr="COD_AGENZIA" ID="a17">
      <attribute name="name" value="COD_AGENZIA"/>
      <attribute name="$datatype" value="String Varchar (18)"/>
      <reference name="$Stage" assetIDs="a6"/>
    </asset>
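    The asset snippets in this step are fragments of a single import file. Here is a sketch of the enclosing structure: a doc root with an assets section, followed by an importAction element that lists the assets defined completely in the file. The element names, the namespace URI and the completeAssetIDs attribute are assumptions to check against the Open IGC documentation for your release.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumptions: root element, namespace URI and importAction syntax. -->
<doc xmlns="http://www.ibm.com/iis/flow-doc">
  <assets>
    <asset class="$GenODIInherit-Project" repr="Prova ODI" ID="a1">
      <attribute name="name" value="Prova ODI"/>
    </asset>
    <!-- Interfaces a2 to a4, the stages and the DataFields shown above go here. -->
  </assets>
  <!-- Assumed: space-separated IDs of completely defined assets; add the elided IDs too. -->
  <importAction completeAssetIDs="a1"/>
</doc>
```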

    The following pair of images shows the result of the asset import in the Catalog.

  6. Step 3: Creation of lineage

    Creating the flow document is fairly simple. The first part lists all the assets already described in the previous file; their attributes do not need to be repeated. It also lists the physical assets already present in the Catalog, such as database tables and columns. To get the information about these assets, the GET /flows/util/snippet/{id} API on the igc-rest-explorer page can be useful.

    The second part establishes the relationships between assets: inside the subFlows tag, specify the local IDs of the source and the target. Lineage can also connect assets of different types. For example, in the case I analyzed, arrows were drawn from a sub Interface (type Interface) to a Union stage (type Stage). This shows the direct involvement of the sub Interfaces in the main one. Column-to-column lineage was also drawn from the target columns of the sub Interfaces to the columns of the Union stage.
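    As an illustration, a database column already in the Catalog might be described together with its full containment path, as in the sketch below. The class names (host, database, database_schema, database_table, database_column), the unprefixed reference names, the IDs and all values are assumptions. The reliable source for the exact form is the output of GET /flows/util/snippet/{id} for the real asset.

```xml
<!-- Hypothetical physical assets; copy the real structure from /flows/util/snippet/{id}. -->
<asset class="host" repr="DBHOST" ID="d1">
  <attribute name="name" value="DBHOST"/>
</asset>
<asset class="database" repr="DWH" ID="d2">
  <attribute name="name" value="DWH"/>
  <reference name="host" assetIDs="d1"/>
</asset>
<asset class="database_schema" repr="STAGING" ID="d3">
  <attribute name="name" value="STAGING"/>
  <reference name="database" assetIDs="d2"/>
</asset>
<asset class="database_table" repr="AGENCY" ID="d4">
  <attribute name="name" value="AGENCY"/>
  <reference name="database_schema" assetIDs="d3"/>
</asset>
<asset class="database_column" repr="COD_AGENZIA" ID="d5">
  <attribute name="name" value="COD_AGENZIA"/>
  <reference name="database_table" assetIDs="d4"/>
</asset>
```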

    Here is an example:

    <subFlows flowType="SYSTEM" comment="From sub-interface 1 to the Union">
      <flow sourceIDs="a3" targetIDs="a5"/>
    </subFlows>
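    In the complete flow document, subFlows elements sit inside a flow unit that belongs to an asset. Here is a sketch of the second part, assuming flowUnits/flowUnit elements with an assetID attribute; these names should be checked against the Open IGC flow document schema. It contains the arrow from the first sub Interface to the Union stage and a hypothetical column-to-column arrow. In that arrow, a20 and a21 stand for a target column of the sub Interface and a column of the Union stage.

```xml
<!-- Assumption: flows are grouped in flowUnits, one flowUnit per owning asset. -->
<flowUnits>
  <!-- The flows below are attached to the main Interface (a2). -->
  <flowUnit assetID="a2">
    <!-- Arrow between assets of different types: sub Interface to Union stage. -->
    <subFlows flowType="SYSTEM" comment="From sub-interface 1 to the Union">
      <flow sourceIDs="a3" targetIDs="a5"/>
    </subFlows>
    <!-- Column-level arrow; a20 and a21 are hypothetical DataField IDs. -->
    <subFlows flowType="SYSTEM" comment="Sub-interface 1 target column to Union column">
      <flow sourceIDs="a20" targetIDs="a21"/>
    </subFlows>
  </flowUnit>
</flowUnits>
```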

    The screenshots below show the different levels of lineage.
