Overview

Skill Level: Any Skill Level

This recipe helps you create, configure, compile, and run a DataStage job that reads data from Amazon S3 and writes it to IBM Cloud Object Storage.

Ingredients

1. IBM InfoSphere Information Server DataStage 11.7 Fix Pack 2 or later

2. Amazon S3 account

3. IBM Cloud Object Storage account

Step-by-step

  1. Description

    IBM Cloud Object Storage is an IBM-managed, highly scalable cloud storage service designed for high durability, resiliency, and security.

    Information Server provides a native Cloud Object Storage Connector that reads data from and writes data to files on IBM Cloud Object Storage, so that the data can be integrated into an ETL job design.

    This recipe demonstrates a sample use case that performs a write operation on IBM Cloud Object Storage using the Cloud Object Storage Connector. The DataStage job uses an Amazon S3 Connector as the source stage and a Cloud Object Storage Connector as the target stage, so data read from Amazon S3 is written to a file on IBM Cloud Object Storage, moving data between cloud platforms.


    The following steps show how to configure the connector properties to write data from Amazon S3 to IBM Cloud Object Storage.

  2. Configure Amazon S3 connector as source

    1. Provide the Access Key and Secret Key of the Amazon S3 account in the connection properties of the Amazon S3 Connector.

    2. Select “Read Multiple Files” to read data from all files whose names start with the specified file name prefix.

    3. Provide the Bucket name and the File name that identify where the data to be read is located.

    4. Choose the file format, which is Delimited in this example.

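    The prefix behavior of “Read Multiple Files” can be pictured with a small sketch. This is only an illustration, assuming the connector applies a plain starts-with match on object names; the function `select_keys_by_prefix` and the sample object names are hypothetical and are not part of DataStage or Amazon S3.

```python
def select_keys_by_prefix(keys, prefix):
    """Return the object names that start with the given prefix, sorted."""
    return sorted(key for key in keys if key.startswith(prefix))


# Hypothetical object names in a bucket (illustrative only).
bucket_keys = [
    "sales_2019.csv",
    "sales_2020.csv",
    "inventory.csv",
    "archive/sales_2018.csv",
]

# Only names that begin with the prefix are selected for reading.
selected = select_keys_by_prefix(bucket_keys, "sales_")
print(selected)
```

    In this sketch, `archive/sales_2018.csv` is not selected because the prefix is compared against the start of the full object name.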

     

  3. Configure Cloud Object Storage Connector Properties to write to IBM Cloud Object Storage

    1. Provide the Login URL, Access Key, and Secret Key from the IBM Cloud Object Storage account in the Connection Properties.

    2. Alternatively, set Use Resource Instance ID to Yes and provide the Resource Instance ID, API Key, Region, and IAM URL to connect to IBM Cloud Object Storage.

    3. Set the Write Mode to “Write” and provide the name of the Bucket to which the file is to be written. If the bucket does not already exist in IBM Cloud Object Storage, it can be created during the job run by setting the Create Bucket option to “Yes”.

    4. In the File Name property, provide the name of the file to which the data from Amazon S3 is to be written.

    5. Choose Delimited as the File format. Six file formats are currently supported: Delimited, CSV, Parquet, Avro, JSON, and Excel. Select whichever format suits your requirement.

    6. After the file format is selected, optional formatting properties, such as the delimiters and the quotation mark, can be set as required.


    7. On the Input tab, provide the column names and data types of the data to be written to IBM Cloud Object Storage.

    8. Compile and run the job. The data from the Amazon S3 files is written to the file on IBM Cloud Object Storage.

    When DataStage is configured to run on multiple nodes, multiple files are created, with the node number appended to the file name, for example <filename>.0 and <filename>.1.
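    The per-node naming can be sketched as follows. This is a minimal illustration, assuming node numbers start at 0 and are contiguous, as the <filename>.0 and <filename>.1 example suggests; the function `node_file_names` and the file name `output.csv` are hypothetical.

```python
def node_file_names(file_name, node_count):
    """List the file names expected when each node writes its own file."""
    return [f"{file_name}.{node}" for node in range(node_count)]


# Hypothetical target file name and a two-node configuration.
print(node_file_names("output.csv", 2))
```

    A downstream consumer that needs the complete data set would have to read every one of these files, not only the first.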

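    The effect of the delimiter and quotation mark properties on a Delimited file can be illustrated with Python's standard `csv` module. This is only a sketch of the general idea, not the connector's implementation; the function `write_delimited`, the `|` delimiter, and the sample rows are assumptions chosen for illustration.

```python
import csv
import io


def write_delimited(rows, field_delimiter="|", quote_char='"'):
    """Serialize rows as delimited text, quoting only fields that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=field_delimiter,
        quotechar=quote_char,
        quoting=csv.QUOTE_MINIMAL,
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows; the second row contains the delimiter inside a value.
rows = [["id", "name"], [1, "Smith|Jones"], [2, "Lee"]]
print(write_delimited(rows))
```

    The value that contains the field delimiter is wrapped in the quotation mark, which is why the two properties are usually chosen together.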

     

  4. References

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.5.0/com.ibm.swg.im.iis.conn.s3.usage.doc/topics/t_configuring_s3_read.html

    https://www.ibm.com/cloud/object-storage

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.cloudobject.usage.doc/topics/specifying_write_mode_cld.html

     

     

     
