Overview

Skill Level: Any Skill Level

This recipe helps you create, configure, compile, and run DataStage jobs that use the Cloud Object Storage Connector to read data from files on IBM Cloud Object Storage.

Ingredients

1. IBM InfoSphere Information Server DataStage 11.7 Fix Pack 2 or later

2. IBM Cloud Object Storage account

Step-by-step

  1. Description

    IBM Cloud Object Storage is an IBM-managed, highly scalable cloud storage service designed for high durability, resiliency, and security.

    Information Server provides a native Cloud Object Storage Connector to read data from the files on IBM Cloud Object Storage and integrate it into the ETL job design.

    The sample use case here performs a read operation on IBM Cloud Object Storage using the Cloud Object Storage Connector. The DataStage job uses a Cloud Object Storage Connector as the source stage and a DB2 Connector as the target stage, so the file data from IBM Cloud Object Storage is written to a DB2 table.

    COS1-1

    This recipe shows how to configure the Cloud Object Storage Connector properties to read data from IBM Cloud Object Storage.

  2. Configure Cloud Object Storage Connection properties

    1. In the Connection properties, provide the Login URL, Access Key, and Secret Key from your IBM Cloud Object Storage account, as follows:

    cos2

    2. Alternatively, set Use Resource Instance ID to Yes and provide the Resource Instance ID, API Key, Region, and IAM URL to connect to IBM Cloud Object Storage.
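The two connection options above can be summarized as a small pre-flight check. This is only an illustrative sketch in Python, not part of DataStage: the `CosConnection` class, its field names, and the placeholder values are hypothetical, and it assumes each option needs exactly the properties listed in the steps above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CosConnection:
    """Hypothetical container mirroring the connection properties named in this recipe."""
    login_url: Optional[str] = None
    access_key: Optional[str] = None
    secret_key: Optional[str] = None
    use_resource_instance_id: bool = False  # corresponds to "Use Resource Instance ID"
    resource_instance_id: Optional[str] = None
    api_key: Optional[str] = None
    region: Optional[str] = None
    iam_url: Optional[str] = None

    def missing_properties(self) -> List[str]:
        """Return the names of required properties that are still empty."""
        if self.use_resource_instance_id:
            # Assumption: this option needs only the four properties listed in step 2.
            required = ["resource_instance_id", "api_key", "region", "iam_url"]
        else:
            # Assumption: this option needs only the three properties listed in step 1.
            required = ["login_url", "access_key", "secret_key"]
        return [name for name in required if not getattr(self, name)]


# Placeholder values only; substitute the details of your own account.
key_based = CosConnection(
    login_url="https://cos.example.invalid",
    access_key="MY_ACCESS_KEY",
    secret_key="MY_SECRET_KEY",
)
instance_based = CosConnection(use_resource_instance_id=True, api_key="MY_API_KEY")

print(key_based.missing_properties())
print(instance_based.missing_properties())
```

Running a check like this before opening the job design makes it easy to see which values still need to be collected from the IBM Cloud Object Storage account.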

  3. Configure Cloud Object Storage Connector Properties to read multiple files from IBM Cloud Object Storage

    1. Set the Read Mode to “Read Multiple files using Wildcards” and provide the name of the Bucket from which the files are to be read.

    2. In the File Name property, specify the names of the files to read from IBM Cloud Object Storage, using wildcards. The supported wildcards are * and ?. If the file names need more filtering, “Read multiple files using Regex Expression” can be used instead.

    3. When reading multiple files, all files that match the wildcard or regex must have the same schema.

    4. Choose CSV as the File format. Six file formats are currently supported: Delimited, CSV, Parquet, Avro, JSON, and Excel. Select whichever format fits your requirement.

    5. After selecting the file format, optionally provide formatting properties such as header, delimiters, and quotation mark, as required.

    COS3

    6. On the Output tab, provide the column names and data types of the data to be read from IBM Cloud Object Storage, as follows:

    COS5-1

    7. In the DB2 Connector stage, provide the table name and the DB2 connection details.

    8. Compile and run the job. The data from the files on IBM Cloud Object Storage is written to the DB2 table.

      COS6-1
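To preview which file names a wildcard would select, and to check that the selected CSV files share one header, a rough local sketch such as the following can help. It uses Python's standard `fnmatch` and `csv` modules as a stand-in, so the connector's own matching rules may differ (for example, `fnmatch` also treats `[...]` as a character set). The file names, file contents, and helper functions are hypothetical.

```python
import csv
import io
from fnmatch import fnmatchcase


def select_keys(keys, pattern):
    """Return the names that match a pattern built from * and ? wildcards."""
    return [key for key in keys if fnmatchcase(key, pattern)]


def csv_header(text, delimiter=","):
    """Return the first row of a CSV document as a list of column names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


def headers_agree(documents):
    """Return True if every CSV document in the mapping has the same header row."""
    headers = {tuple(csv_header(text)) for text in documents.values()}
    return len(headers) <= 1


# Hypothetical bucket listing and file contents, held in memory for illustration.
files = {
    "sales_2019.csv": "id,amount\n1,10.5\n",
    "sales_2020.csv": "id,amount\n2,20.0\n",
    "sales_20.csv": "id,amount,region\n3,5.0,EU\n",
    "returns_2020.csv": "id,reason\n4,damaged\n",
}

four_digit_years = select_keys(files, "sales_????.csv")  # ? matches exactly one character
all_sales = select_keys(files, "sales_*.csv")            # * matches any run of characters

print(four_digit_years)
print(headers_agree({name: files[name] for name in four_digit_years}))
print(all_sales)
print(headers_agree({name: files[name] for name in all_sales}))
```

In this made-up listing, the narrower `sales_????.csv` pattern selects only files with a matching header, while `sales_*.csv` also picks up a file with an extra column, which is the situation the same-schema rule above is meant to avoid.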

  4. References

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.cloudobject.usage.doc/topics/connect_to_cld.html

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.cloudobject.usage.doc/topics/specifying_read_mode_cld.html

     
