Contents


Overview

Skill Level: Intermediate

IGC, Q Replication, DB2, SQL, SQL/XML, XML, REST API, cURL

Data Replication is often used to feed data into a DWH. Governance Catalog has evolved to be the governance foundation in analytics architectures. This recipe combines the two and explains how to extend IGC data lineage with Q Replication subscriptions.

Ingredients

This recipe introduces a parser to automatically transform Q Replication metadata (stored in relational tables at the Q Capture and Q Apply database) into the Open IGC XML format which can be used to define custom assets and flows in Information Governance Catalog. This list of ingerdients has everything needed:

InfoSphere Information Governance Catalog

InfoSphere Information Governance Catalog (IGC) is a module of IBM's InfoSphere Information Server suite. It provides comprehensive capabilities for the management of business, technical, and operational metadata. This includes an authoring workflow for business metadata, import and management of technical metadata (e.g., for DBMS, BI, ETL, and modelling tools), and a framework to define own metadata categories. One of the key features is the automatic analysis and presentation of data lineage graphs and lineage reports so that the user can easily understand or report the relationships between metadata objects.

InfoSphere Data Replication - Q Replication

InfoSphere Data Replication is a data replication platform to replicate information between cross-platform heterogeneous data stores in near real time. Q Replication is one data replication discipline within this platform. Q Replication includes log based change capture for DB2 z/OS, DB2 LUW, and Oracle databases. Common usage scenarios are the delta provisioning of data warehouse and data lake systems including historization, the real-time maintenance of read-mirrors, zero-downtime data migrations, data synchronization between multiple active databases (continuous availability, active/actice), and others. All Q Replication specific metadata is stored in database tables at the replication source and replication target databases.

Data Lineage

Data Lineage provides a visual representation of data flows and movements from source to destination via various changes and hops on the way through the enterprise environment. Information Governance Catalog automatically analyzes metadata in the catalog to detect and represent dependencies between metadata objects. IGC can interactively display lineage graphs for metadata objects in the catalog, or can store lineage relationships in reports. Usually, IGC end-to-end lineage graphs span from data sources to reports and KPIs.

Open IGC

Open IGC is a framework to extend the standard IGC metadata categories (databases, tables, files, ETL processes, XML schemas, BI models, data models, and many more) with own categories. The API includes the definition of own 'asset bundles', the upload of 'bundle assets', and the upload of 'flows' to link own assets with standard or other own assets in data lineage. In Open IGC custom asset and flow metadata is delivered as XML objects. In this recipe we are going to use the Open IGC framework to load Q Replication specific metadata (subscription metadata) into the catalog.

IGC REST API

IGC REST API allows client applications to access and create content in IGC. IGC REST API enforces security roles, workflow, and viewing permissions just like the IBM InfoSphere Information Governance Catalog user interface. To test REST API calls IGC provides a REST Exporer interface: https://<igc_server>:<port>/ibm/iis/igc-rest-explorer. In this recipe we are going to use the IGC REST API to load generated XML asset and flow metadata.

SQL/XML

SQL/XML is part of the Structured Query Language (SQL) specification. It introduces the XML data type as well as publishing functions which can be used for document composition and publishing. In this recipe we are going to use SQL/XML to transform the Q Replication metadata which is stored within the Q Replication source and target database to Open IGC XML objects.

Database Federation

To generate Open IGC XML objects for Q Replication the Q Replication metadata at the source database has to be joined with the Q Replication metadata at the target database. As DB2 (LUW) includes database federation capabilities, this technique is used to federate source metadata into the target database (or vice versa or to federate source and target metadata into a 3rd database).

cURL

curl is an open source command line tool and library for transferring data with URL syntax. curl supports SSL certificates, HTTP GET, HTTP POST, HTTP PUT, and much more. In this recipe curl is used to call the IGC REST API from scripts.

All together

Puh, needs quite a bit of ingredients, but they are only taken from the "DBA" shelf and the "Governance" shelf - no programming or custom programs required. All is open, easy adoptable, easy to maintain.

Step-by-step

  1. Verification of the environment

    This recipe explains the integration of Q Replication and Information Governance metadata. It is expected that both IBM InfoSphere Data Replication and IBM InfoSphere Information Governance Catalog are installed and configured. Check for adjacent tutorials otherwise.

    Verification of the Q Replication Setup

    The metadata integration is explained along a sample use case. The use case assumes that Q Replication has been set up between two DB2 databases. Database “SOURCEDB” is the Q Capture server and database “SAMPLE” is the Q Apply server. Five examplary Q Subscriptions have been defined and are replicating successfully. Different queue maps and target table types have been used.

    db2rc_new_demo_de

    Verification of the Information Governance Catalog Setup

    Information Governance Catalog has also been set up. Technical metadata has been imported from various databases – among those the databases used as replication source and target. In the following pricture we see details of table “ADRESSEN” in database “SAMPLE” (at server “CHRISISVM1”) which is the target table of one of the above Q Subscriptions.

    igc_Prep_

    In the data lineage graph we can easily identify the table “ADRESSEN”. The lineage graph visualizes that the table is part of a data flow. The table is read by the DataStage ETL job “Kunde_Saldo_Clustered” (which also reads two other tables). Looking at the lineage graph one would so far believe that the tables “ADRESSEN” and “PLZ_KPIS” are the leftmost tables of this flow, although we already know from the review of the replication setup that the tables are target tables of two Q Replication Subscriptions which bring the data in from database SOURCEDB.Lineage graph without Q Replication metadata

    The following steps explain how to extend the lineage graph (and the underlying metadata in the catalog) with Q Replication Subscriptions by using the Open IGC framework. This will give us the full understanding, documentation and reporting capabilities of the complete data flow.

  2. Definition of an Open IGC asset bundle for Q Replication metadata

    Information Governance Catalog Version 11.3 introduced a framework to extend the standard metadata categories (such as databases, tables, BI and data models, business terms, policies and rules, etc.) with custom categories. This framework is called “Open IGC”. Open IGC allows to specify metadata categories as detailed as required plus allows to link custom metadata objects with other metadata objects (custom or standard objects) in data lineage.

    The custom metadata bundle has to be defined in a documented XML schema and furthermore consists of a collection of icons. Open IGC supports nationalization to express the metadata attributes in the languages that IGC supports. IGC REST calls are available to upload a new bundle specification, bundle assets, and flows into the catalog.

    In this step a bundle for Q Replication is specified. The below figure shows that the standard IGC metadata categories have already been complemented by custom metadata categories. “Q Replication” is one of them. For Q Replication, the metadata objects Q Replication Topologies, Queue Maps, Subscriptions, Replication Tasks, and Subscription Columns have beed defined.

    QRepBundle_neu

    More on Open IGC can be found in the IBM knowledge center or here:

    https://dsrealtime.wordpress.com/2015/07/29/open-igc-is-here/

    https://developer.ibm.com/recipes/tutorials/creation-of-new-bundle-on-infosphere-information-governance-catalog/

    Next this recipe focusses on how Q Replication metadata can be parsed automatically out of the control structures of InfoSphere Data Replication Q Replication.

  3. Federation of replication source and target control tables

    Q Replication consists of two runtime processes, called Q Capture and Q Apply. The relationship between a replication source table and a replication target table is called Q Subscription. Metadata of Q Subscriptions is stored in relational tables within the Q Capture database and the Q Apply database. These tables are called Replication Control Tables.

    The XML generator used here to transform Q Replication metadata into IGC asset and flow XML needs to join Q Capture and Q Apply control tables which are usually distributed accross two database, namely the replication source and the replication target database. To be able to join the Q Capture and Q Apply control tables, Nicknames for either the Q Capture or Q Apply control tables (or both) have to be created.

    The Nicknames for Q Capture control tables can be created in the Q Apply database or vice versa.

    Exception: DB2 for z/OS does not support the concept of federation. If both the Q Capture and the Q Apply databases are on z/OS, it is required to create a 3rd DB2 LUW database to federate both the Q Capture and Q Apply control tables into that database and run the parser against this DB2 LUW database.

    Here's a sample to create Nicknames for the required Q Capture control tables of database “SOURCEDB” in the Q Apply database “SAMPLE”:

    connect to SAMPLE;
    
    -- [set up of Wrappers and Servers ...]
    -- create required Nicknames
    create nickname SOURCEDB_LSN.IBMQREP_CAPPARMS
    for SOURCEDB.LSN.IBMQREP_CAPPARMS;
    create nickname SOURCEDB_LSN.IBMQREP_SENDQUEUES
    for SOURCEDB.LSN.IBMQREP_SENDQUEUES;
    create nickname SOURCEDB_LSN.IBMQREP_SUBS
    for SOURCEDB.LSN.IBMQREP_SUBS;
    create nickname SOURCEDB_LSN.IBMQREP_SRC_COLS
    for SOURCEDB.LSN.IBMQREP_SRC_COLS;

    Scripts are available to federate both the Q Capture and Q Apply control tables:

    • federation_of_source_database.sql: Federation of the control tables of the Q Capture server
    • federation_of_target_database.sql: Federation of the control tables of the Q Apply server

    After setting the contained variables to the correct values the scripts can be invoked by standard DB2 means:

    db2 -tvf <scriptname>

    The next step of the recipe explains how the replication control tables and nicknames are joined and selected to transform the relational metadata into Open IGC XML format.

  4. Transforming Q Replication metadata into Open IGC XML with SQL/XML

    The XML generator used here to transform the relational Q Replication metadata into the Open IGC asset and flow format leverages SQL/XML functions of DB2. Two SQL statements are required (which are using the same base structure and join criterions defined as a common table expression):

    • SQL/XML script to create the Open IGC XML file for asset publishing
    • SQL/XML script to create the Open IGC XML file for flows

    The generator uses the following techniques to achieve the result:

    1. Federation: as explained in the previous step, the XML generator needs to join Q Capture and Q Apply control tables. If the Q Capture server and the Q Apply server are not in the same database, DB2's federation capabilities are used to create Nicknames for required control tables of either or both control server locations
    2. ID generation: The Open IGC XML notation defines metadata hierarchies by referencing upper level objects in the <reference> element of a lower level object. The XML generator creates the IDs by using the DB2 dense_rank() function with appropriate partitioning and ordering
    3. SQL/XML: The XML generator uses various SQL/XML functions like XMLELEMENT, XMLAGG, XMLCONCAT to convert the relational content of the Q Replication control tables to the XML format required by Open IGC
    4. UNION: The distinct objects and sections of the Open IGC XML structures (like <assets>, <flows>, <importAction>, etc.) are selected one after the other. The distinct sections are concatenated to the final XML file using UNIONs. Each section selects the XML chunk (data type: CLOB) and a unique ascending sequence number (integer).
    5. ORDER BY: To sort the XML chunks correctly, an “ORDER BY sequence_number” phrase guarantees that all sections are concatenated in the correct sequence.

    Parsing by “Topology”

    The generator parses Q Replication metadata by “topology”. A topology is determined by the Q Capture server, the Q Capture schema, the Q Apply server, and the Q Apply schema. Example: Server: DBA / Schema: ASN to Server: DBB / Schema: ASN.
    If you are replicating in multiple “topologies”, run the XML generator multiple times to store all your topologies in Information Governance Catalog. In case of multi-directional Q Replication, run the XML generator twice, first for Server A / Schema A to Server B / Schema B and second for B/B to A/A.

    The subsequent steps of this recipe are going to show how the Q Replication topology “SOURCEDB/LSN to SAMPLE/LSN” was parsed and loaded.

    Testing the XML generator 'relationally'

    In addition to the two SQL scripts of the XML generator a third script is available to test the generator logic or to preview the result of the control table parsing as relational result set. The name of the tester is select_qsubs_relational.sql.

    The script selects all details relevant for the generation of the Open IGC XML content for the specified topology. All replication objects including source and target tables / columns are represented by name and by the generated unique key. Filter criterions can be optionally defined to parse only a subset or the available metadata (e.g., just all subscriptions for a queue map or only a subset of subscriptions).

    The below output only lists exemplary columns of the result set for a subset of the available metadata:

    TOPID TOPOLOGY_NAME                 PUBQMAPNAME                MAPID     CAPTURE_SERVER CAPTURE_SCHEMA APPLY_SERVER APPLY_SCHEMA SUBNAME       SUBID      SOURCE_OWNER SOURCE_NAME TARGET_OWNER TARGET_NAME SRC_COLNAME              CCOLID        TARGET_COLNAME           ACOLID       
    ----- ----------------------------- -------------------------- --------- -------------- -------------- ------------ ------------ ------------- ---------- ------------ ----------- ------------ ----------- ------------------------ ------------- ------------------------ -------------
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    ADRNR                    CCol_1_1_1_1  ADRNR                    ACol_1_1_1_1 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    KDNR                     CCol_1_1_1_2  KDNR                     ACol_1_1_1_2 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    ADRESSTYP                CCol_1_1_1_3  ADRESSTYP                ACol_1_1_1_3 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    PLZ                      CCol_1_1_1_4  PLZ                      ACol_1_1_1_4 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    STRASSE                  CCol_1_1_1_5  STRASSE                  ACol_1_1_1_5 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          ADRESSEN0001  Sub_1_1_1  GFK          ADRESSEN    GFK          ADRESSEN    STADT                    CCol_1_1_1_6  STADT                    ACol_1_1_1_6 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          PLZ_KPIS0001  Sub_1_1_4  GFK          PLZ_KPIS    GFK          PLZ_KPIS    PLZ                      CCol_1_1_4_1  PLZ                      ACol_1_1_4_1 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          PLZ_KPIS0001  Sub_1_1_4  GFK          PLZ_KPIS    GFK          PLZ_KPIS    ANZ_EINWOHNER            CCol_1_1_4_2  ANZ_EINWOHNER            ACol_1_1_4_2 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          PLZ_KPIS0001  Sub_1_1_4  GFK          PLZ_KPIS    GFK          PLZ_KPIS    VERSCHULDUNGS_INDIKATOR  CCol_1_1_4_3  VERSCHULDUNGS_INDIKATOR  ACol_1_1_4_3 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          PLZ_KPIS0001  Sub_1_1_4  GFK          PLZ_KPIS    GFK          PLZ_KPIS    IPHONES_PRO_EINWOHNER    CCol_1_1_4_4  IPHONES_PRO_EINWOHNER    ACol_1_1_4_4 
    Top_1 SOURCEDB (LSN), SAMPLE (LSN)  SOURCEDB_LSN_TO_SAMPLE_LSN Map_1_1   SOURCEDB       LSN            SAMPLE       LSN          PLZ_KPIS0001  Sub_1_1_4  GFK          PLZ_KPIS    GFK          PLZ_KPIS    ANZ_EINBRUECHE_PRO_JAHR  CCol_1_1_4_5  ANZ_EINBRUECHE_PRO_JAHR  ACol_1_1_4_5 
    

    How the asset IDs are generated

    As mentioned before the Open IGC XML notation defines metadata hierarchies by referencing upper level objects in the <reference> element of a lower level object. The above result set shows that the generator for better traceability builds up the IDs by concatenating the ID of the upper level asset.

    Example: Sub_1_1_4 is the fourth subscription in the first replication map in topology one.

    The following code snippet shows how the unique Subsciption ID (unique per replication map) is selected:

    SELECT
    ...
    dense_rank() over(PARTITION BY MAPS.RECVQ ORDER BY TRG.SUBNAME)
    as SUB_ID,
    ...

    The next code snippet shows how the unique Subsciption ID is concatenated with the IDs of the upper level assets in the asset hierarchie: 

    SELECT 
    ...
    'Sub_' concat trim(char(core.TOP_ID))
    concat '_' concat trim(char(core.MAP_ID))
    concat '_' concat trim(char(core.SUB_ID)) as SUBID,
    ...

    How SQL/XML is used: FlowUnits

    The XML Generator SQL takes the relational result set of the joined Q Capture and Q Apply control tables  as input (see above) and converts it to XML. The following code snippet shows the most complex operation to output all <subFlows> grouped by subscription ID:

    ...

    SELECT
    16 as SEQCOL,
    XMLSERIALIZE(
    XMLELEMENT (name "flowUnit", XMLATTRIBUTES(qrep.SUBID as "assetID"),
    XMLCONCAT (
    XMLCONCAT (
    XMLELEMENT (name "subFlows", XMLATTRIBUTES('DESIGN' as "flowType", 'CAPTURE' as "comment"),
    XMLAGG ( XMLELEMENT ( name "flow", XMLATTRIBUTES(qrep.SOURCECOLID as "sourceIDs", qrep.CCOLID as "targetIDs"))
    ORDER BY qrep.SOURCECOLID) )
    ,
    XMLELEMENT (name "subFlows", XMLATTRIBUTES('SYSTEM' as "flowType", 'Replication Processing' as "comment"),
    XMLAGG ( XMLELEMENT ( name "flow", XMLATTRIBUTES(qrep.CCOLID as "sourceIDs", qrep.ACOLID_NULLABLE as "targetIDs"))
    ORDER BY qrep.ACOLID) )
    )
    ,
    XMLELEMENT (name "subFlows", XMLATTRIBUTES('DESIGN' as "flowType", 'APPLY' as "comment"),
    XMLAGG ( XMLELEMENT ( name "flow", XMLATTRIBUTES(qrep.ACOLID as "sourceIDs", qrep.TARGETCOLID as "targetIDs"))
    ORDER BY qrep.TARGETCOLID) )
    ) ) AS CLOB(1M) ) as MYXML from subdetails qrep
    group by subid

    UNION

    ...

    The XML result of the above code snippet for the sample scenario looks like this:

    flowUnitsXML

    The two required SQL statements to generate both IGC assets and flows are available as scripts:

    • select_QRep_xmlForAssetPublishing.sql: generating XML for IGC bundle assets
    • select_QRep_xmlForFlows.sql: to generating XML for IGC flows (lineage)

    After setting the contained variables to the correct values the scripts can be invoked by standard DB2 means:

    db2 -txf <scriptname>  >  <xmlfilename>

    The option -x means that no column headers but only the selected column values (a XML CLOB in this case) are written to the output. The output is piped to a file.

    Note:

    The generated files can become big because of the CLOB output data type and the concatenation of multiple CLOBs due to the chosen UNION approach. The following command can be used to shrink the files by eliminating consequtive blanks:

    sed -i -e 's/ //g'  <xmlfilename>
  5. Loading the generated Open IGC XML assets and flows into the catalog using the IGC REST API

    After having transformed the Q Replication metadata into Open IGC XML for assets and flows in the previous step the generated XML files can be loaded into IGC. This is done using the IGC REST API.

    The Q Replication XML generator documented in this recipe calls the IGC REST API from a script leveraging cURL. With cURL REST APIs can be called by command (and therefore can be easily used in scripts). Refer to this developerWorks recipe to learn more about interacting with governance metadata in IGC using REST APIs with cURL: https://developer.ibm.com/recipes/tutorials/interact-with-your-governance-metadata-in-igc-using-rest-apis-with-curl/

    Alternatively the generated XML can be uploaded using the IGC REST Explorer. This option is recommended to check detailed error messages if the cURL command returns an unsuccessful return code.

    Uploading the generated XML with cURL

    The following two cURL commands show how the Q Replication XML generator leverages the IGC REST API for uploading assets and flows. Note that the API payload (XML in this case) is read from a file:

    1) Command to upload the bundle assets parsed from a Q Replication topology:

    curl -v --cacert certdata -H "Content-Type:application/xml" 
    -H "Accept-Encoding:identity" --data "@select_QRep_xmlForAssetPublishing.xml" 
    -u igcuid:pwd  https://<igc_server>:<port>/ibm/iis/igc-rest/v1/bundles/assets

    Output with return code 200 (success):

    ...
    * We are completely uploaded and fine
    < HTTP/1.1 200 OK
    ...
    

    2) Command to upload the flows parsed from a Q Replication topology: 

    curl -v --cacert certdata -H "Content-Type:application/xml" 
    -H "Accept-Encoding:identity" --data "@select_QRep_xmlForFlows.xml" 
    -u igcuid:pwd https://<igc_server>:<port>/ibm/iis/igc-rest/v1/flows/upload

    Output with return code 200 (success):

    ...
    * We are completely uploaded and fine
    < HTTP/1.1 200 OK
    ...
    

    Uploading the generated XML with the IGC REST Explorer

    The IGC Rest Explorer (https://<igc_server>:<port>/ibm/iis/igc-rest-explorer) offers an UI to call the IGC REST API. For uploading Open IGC assets and flows the XML payload is simply pasted in as shown in the following figure. The result is the same compared to calling the APIs with cURL as above (or with any other application).

    restExplorer

    Output with return code 200 (success):

    Response200

  6. Verification of the results

    In step four the Q Replication metadata was transformed into Open IGC XML. In step five the XML was uploaded using IGC REST API. Now it's time to look at the results.

    New Q Replication hierarchie

    First we take a look at the custom asset type “Q Replication”. Refer to step two to see how the Information Governance Catalog was expanded with this asset type. The asset type “Q Replication” contains all hierachies (“topologies”) which have been uploaded previously. The exemplary topology Q Capture database “SOURCEDB”, Q Capture schema “LSN” to Q Apply database “SAMPLE”, Q Apply schema “LSN” is just one of them.

    Subscription details

    Expanding the hierarchie for the exemplary topology we see that it contains two Q Replication Maps. The first Map contains four Q Subscriptions, the second map just one. Highlighting a node in the hierarchie displays the details of that node. The below figure displays the details of Q Subscription “SALES0002”. It is a unidirectional subscription with target type “CCD”. All node and node details were generated and loaded using the SQL/XML script to create the Open IGC XML file for asset publishing.

    SalesSubscription

    Next lets see what the SQL/XML script to create the Open IGC XML file for flows got out of the Q Replication metadata.

    Usage Information

    Although the XML file to load the flows contained subflows on column level, IGC aggregates the lineage information up along the hierarchie automatically. The usage information in the Q Subscription details for “ADRESSEN0001” displays the metadata details of the tables the subscriptions reads from or writes to.

    AdressenUsage

    Data Lineage graph

    With that usage information available, Information Governance Catalog can now display the data flow including the Q Subscriptions. Compare the below IGC lineage graph with the lineage graph displayed in step one (Verification of the environment) to see that the objects with icon “star” (Q Subscriptions) are now part of the graph.

    lineage_with_Qsubs

    The lineage graph including Open IGC assets (like Q Replication subscriptions in this case) has the same capabilities compared to a standard lineage graph:

    • Lineage can be displayed on column level
    • Subscription objects can be expanded to see the mapping between Q Capture and Q Apply tasks (interesting in case columns were renamed within the Q Subscription)

    Lineage details

    The final figure shows the expanded details of a Q Subscription within the lineage graph.

    lineageDetails

  7. Wrapping it up

    IBM Q Replication is a data replication technology to replicate information between databases in near real time. Among the many Q Replication uses cases is data provisioning for analytics and data warehouse systems.

    Information Governance Catalog provides comprehensive capabilities for the management of business, technical, and operational metadata, including data lineage visualization and reporting. With the Open IGC framework, IGC's standard lineage focussing on DataStage ETL processing can be extended to include any relevant process and metadata.

    This recipe explained the techniques to leverage the Open IGC framework to load Q Replication metadata into the IGC repository. First a new metadata category was defined for Q Replication. Thereafter SQL (especially SQL/XML) was used to transform the relational Q Replication metadata into the XML format required by Open IGC. IGC REST API was used to load the generated XML. This parsing approach is easy to automate and allows to load big replication scenaios into IGC without manual interaction.

    All together the explained techniques extend the standard IGC data lineage with Q Replication subscriptions to be able to fully understand, document, and report end-to-end data integration and provisioning.

    If you want to achieve the same with SQL Replication instead of Q Replication, look at the recipe

    Good old DataPropagator goes BigData with Information Governance
    https://developer.ibm.com/recipes/tutorials/good-old-datapropagator-goes-modern-with-information-governance/

    If you are interested in more Q Replication related recipes, have a look here:

    Continuous high performance DB2 Q Replication plus repeatable query results (stable targets)
    https://developer.ibm.com/recipes/tutorials/continuous-high-performance-db2-q-replication-plus-repeatable-query-results-stable-targets/

    IBM Data Replication: A Q Subscription Generator
    https://developer.ibm.com/recipes/tutorials/ibm-data-replication-a-q-subscription-generator/

Join The Discussion