BigQuery is Google’s fully managed, petabyte-scale, low-cost analytics data warehouse that enables super-fast SQL queries using the processing power of Google’s infrastructure.
Information Server provides a native BigQuery Connector to read data from and write data to BigQuery tables, and to integrate them into ETL job designs.
We demonstrate a sample use case here that performs a read operation on a BigQuery table using the BigQuery Connector. The DataStage job includes a BigQuery Connector as the source stage and a DB2 Connector as the target, where the data from BigQuery is written to a table on DB2, moving data from the cloud to an on-premise system in a hybrid cloud scenario.
In this recipe, I will show you how to configure the BigQuery Connector properties to read data from Google BigQuery and move it to a DB2 table.
Configure BigQuery Connector Properties to read from Google BigQuery
1. Download the Google service account credentials JSON file and copy it to any location on the Engine tier.
2. Provide the fully qualified path to this JSON file in the Credentials file field under the Connection Properties.
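The connector authenticates with whatever service-account key file the path points to, so it is worth confirming the file is a valid key before running the job. As a rough illustrative sketch (not part of the connector itself; the key names below are the ones Google typically includes in a service-account file):

```python
import json

# Keys typically present in a Google service-account credentials file.
# This validation helper is purely illustrative, not part of the connector.
REQUIRED_KEYS = {"type", "project_id", "private_key_id", "private_key", "client_email"}

def validate_credentials_file(path):
    """Return True if the JSON file at `path` looks like a service-account key."""
    with open(path) as f:
        data = json.load(f)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"credentials file is missing keys: {sorted(missing)}")
    return data.get("type") == "service_account"
```

If the job fails with an authentication error, a quick check like this can rule out a truncated or wrong file.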
3. In the Schema name property, provide the schema in which the table resides.
4. In the Table name property, provide the table from which the data is to be read.
5. Optionally, you can provide a value for the Row limit property, which limits the number of rows read by each node. For example, if the row limit is set to 10 and the job runs on a two-node configuration, a total of 20 rows will be read from the BigQuery table.
6. Under the Output tab, provide the column names and data types of the data to be read from Google BigQuery.
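The row-limit semantics described in step 5 can be sketched as follows. This is only an illustration of the per-node behavior, not the connector's internal query generation; the schema and table names are placeholders:

```python
def node_query(schema, table, row_limit=None):
    """Build the SELECT a single node might issue (illustrative only)."""
    sql = f"SELECT * FROM `{schema}.{table}`"
    if row_limit is not None:
        # The row limit applies per node, not to the job as a whole.
        sql += f" LIMIT {row_limit}"
    return sql

def total_rows_read(row_limit_per_node, node_count):
    """Total rows fetched scales with the number of nodes."""
    return row_limit_per_node * node_count
```

With a row limit of 10 on a two-node configuration, `total_rows_read(10, 2)` gives 20, matching the behavior described above.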
Configure DB2 connector as target
1. Provide the Database, Username, and Password details in the connection properties of the DB2 Connector.
2. Set Write mode to Insert and the Generate SQL option to Yes to auto-generate the INSERT statement.
3. Set Table action to Create and provide the DB2 table name to which the data is to be written.
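With Table action set to Create and Generate SQL set to Yes, the connector derives the DDL and the parameterized INSERT from the column definitions on the link. A minimal sketch of what such generated statements look like (illustrative only; the actual SQL the connector emits may differ, and the table and column names here are placeholders):

```python
def generate_create(table, columns):
    """Build a CREATE TABLE statement from (name, type) column pairs."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({cols})"

def generate_insert(table, columns):
    """Build a parameterized INSERT statement matching the same columns."""
    names = ", ".join(name for name, _ in columns)
    params = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({names}) VALUES ({params})"

# Example column definitions, as they might appear on the Output tab.
cols = [("ID", "INTEGER"), ("NAME", "VARCHAR(50)")]
```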
Compile and run the job. The data from the BigQuery table is written to the DB2 table.
When DataStage is configured to run on multiple nodes, each node reads a chunk of the data from the BigQuery table in parallel.