Skill Level: Beginner

This recipe shows you how to create, configure, compile, and run a DataStage job that writes data from DB2 to Google BigQuery.


Prerequisites:

1. IBM InfoSphere Information Server DataStage 11.7.1 or later

2. DB2 database

3. Google BigQuery Account


  1. Description

    BigQuery is Google’s fully managed, petabyte-scale, low-cost analytics data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure.

    Information Server provides a native BigQuery Connector to read data from and write data to tables on BigQuery, and to integrate them into an ETL job design.

    We demonstrate a sample use case here that performs a write operation on BigQuery using the BigQuery Connector. The DataStage job includes a DB2 Connector as the source stage and a BigQuery Connector as the target stage: the data from DB2 is written to a table stored on Google BigQuery, moving data from an on-premises environment to the cloud.


    In this recipe, I will show you how to configure the BigQuery Connector properties to write data from DB2 to Google BigQuery.

  2. Configure DB2 connector as source

    1. Provide the Database, Username, and Password details for DB2 in the connection properties of the DB2 Connector as follows:



    2. Select the Generate SQL option to auto-generate the SELECT statement.

    3. Provide the DB2 Table name that contains the data to be read.
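When Generate SQL is selected, the connector builds a simple SELECT over the configured table from the link's column definitions. A minimal sketch of that behavior is below; the table and column names are illustrative placeholders, not values from this recipe:

```python
def generate_select(table, columns):
    """Mimic the connector's auto-generated SELECT statement:
    one column list, one table, no WHERE clause."""
    return f"SELECT {', '.join(columns)} FROM {table}"

# Placeholder table and columns for illustration only.
sql = generate_select("MYSCHEMA.EMPLOYEES", ["EMPNO", "ENAME", "SALARY"])
print(sql)  # SELECT EMPNO, ENAME, SALARY FROM MYSCHEMA.EMPLOYEES
```

If you need a filtered or joined read instead, you would clear Generate SQL and supply a user-defined SQL statement in the stage.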


  3. Configure BigQuery Connector Properties to write to Google BigQuery

    1. Download the Google service account credentials JSON file and copy it to a location on the engine tier.

    2. Provide the fully qualified path to this JSON file in the Credentials file property under the connection properties, as follows:


    3. Provide the Schema name property, that is, the BigQuery dataset in which the table is to reside.

    4. Provide the Table name property for the table to which the data from DB2 has to be written. If the table doesn’t already exist, it is created during the job run and the data is then written to it. If the table already exists, the data from the job is appended to it.


    5. Under the Input tab, provide the column names and types of the data that needs to be written to Google BigQuery, as follows:


    6. Compile and run the job. The data from the DB2 table is written to the BigQuery table.

    When DataStage is configured to run on multiple nodes, each node writes a chunk of the data in parallel to the BigQuery table.
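The create-or-append behavior described in step 4 can be illustrated with a small in-memory stand-in for a BigQuery dataset. This is a sketch of the observable behavior only, not the connector's actual implementation:

```python
def write_rows(dataset, table, rows):
    """Mimic the connector's table handling: create the target table
    if it does not exist, otherwise append the new rows to it."""
    if table not in dataset:
        dataset[table] = []          # table is created during the job run
        action = "created"
    else:
        action = "appended"
    dataset[table].extend(rows)      # rows are written in either case
    return action

dataset = {}                         # stand-in for the BigQuery dataset (schema)
print(write_rows(dataset, "EMPLOYEES", [{"EMPNO": 1}]))  # created
print(write_rows(dataset, "EMPLOYEES", [{"EMPNO": 2}]))  # appended
print(len(dataset["EMPLOYEES"]))                         # 2
```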
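The multi-node behavior can be sketched as partitioning the source rows into one chunk per player node, with each node writing its chunk in parallel. Round-robin partitioning is assumed here purely for illustration; the actual partitioning depends on the job's configuration:

```python
def partition_round_robin(rows, num_nodes):
    """Split rows into one chunk per node; in the job, each node
    would write its chunk to the BigQuery table in parallel."""
    chunks = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        chunks[i % num_nodes].append(row)
    return chunks

# 10 rows spread over a 4-node configuration.
chunks = partition_round_robin(list(range(10)), 4)
print([len(c) for c in chunks])  # [3, 3, 2, 2]
```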



2 comments on "IBM DataStage BigQuery Connector to write data to Google BigQuery"

  1. Sumanth117 June 02, 2020

    When we read data in parallel using the BigQuery Connector it duplicates data. For example, let’s say I have id ‘123’ in my Google BigQuery table; when I use a 2-node configuration, DataStage loads ‘123’ 2 times. To avoid this I’m reading data using Sequential mode. But is there any setting that I need to change to read data in parallel?

  2. Alekhya Telekicherla June 03, 2020

    Hi Sumanth, I am not sure which IS version you are using, but as far as I remember this was never an issue. The connector can support reading in parallel. Can you specify whether it is a regular BigQuery table or a partitioned one?
