Overview

Skill Level: Any Skill Level

This recipe helps you create, configure, compile, and execute DataStage Azure Storage Connector jobs that read data from files on Azure File Storage and Azure Blob Storage.

Ingredients

1. IBM InfoSphere Information Server DataStage 11.7 FP1 or later

2. Azure Storage Account

Step-by-step

  1. Description

    Azure Storage is a Microsoft-managed cloud service that provides storage that is highly available, secure, durable, and scalable. Azure Storage consists of three data services: Blob storage, File storage, and Queue storage.

    Information Server provides a native Azure Storage Connector to read data from files on Azure File Storage and Azure Blob Storage and integrate it into the ETL job design.

    We demonstrate a sample use case that reads data from Azure Blob Storage using the Azure Storage Connector stage. The DataStage job includes an Azure Storage Connector stage as the source and a Sequential File stage as the target, so that the data from the file on Blob Storage is written to the sequential file. Additionally, a reject link from the Azure Storage Connector to another Sequential File stage collects the rejected rows.

    Azure_read_1

     

    Below, we show how to configure the Azure Storage Connector properties for Read and Reject.

  2. Configure Azure Storage Connector Connection Properties

    1. Get the default endpoint protocol, storage account name, and access key for your Azure Storage account.
    2. Provide the HTTP proxy server and port details in case the connection goes through a proxy.

    azure_conn1-1

    3. Alternatively, you can provide the path to a credentials file located on the Engine tier that contains the Azure Storage connection string (the standard format is shown in the sketch below).

    azure_conn2-1
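
    As an optional check outside DataStage, the same connection details can be verified with a short Python script that uses Microsoft's azure-storage-blob package. This is only an illustrative sketch and not part of the recipe; the account name, access key, and container name are placeholders, and the connection string shown follows the standard Azure Storage format that the credentials file would contain.

      from azure.storage.blob import BlobServiceClient

      # Standard Azure Storage connection string format; replace the
      # placeholders with your own storage account name and access key.
      conn_str = (
          "DefaultEndpointsProtocol=https;"
          "AccountName=<storage-account-name>;"
          "AccountKey=<access-key>;"
          "EndpointSuffix=core.windows.net"
      )

      service = BlobServiceClient.from_connection_string(conn_str)

      # List the blobs in the container that the connector will read from,
      # confirming that the credentials and endpoint are correct.
      container = service.get_container_client("<container-name>")
      for blob in container.list_blobs():
          print(blob.name)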

  3. Configure Azure Storage Connector to read data from Azure Blob Storage and capture the rejected rows

    1. Set the Storage Type to “Blob” to read a file from Azure Blob Storage. To read from Azure File Storage, set the Storage Type to “File” instead.

    2. Set the Read Mode to “Read Single File” and provide the name of the container from which the file is to be read.

    3. Provide the File name property. If the file needs to be read from a folder, the file name can be specified as <Directory>\<filename>.

    4. Set the Reject Mode to “Reject”. This must be set to Reject in order to capture rejected rows.

    5. Choose the File format as Delimited. Two file formats are currently supported: Delimited and CSV.

    6. Once the file format is selected, optional formatting properties such as delimiters and quotation marks can be provided as required.

    Azure_Read_2

     

    7. Under the Output -> Columns tab of the link where the file data is processed, provide the column names and data types of the data to be read from Azure Blob Storage, as follows:

    Azure_Read_3

     

    8. Provide the file name details in the Data Sequential File stage.

    9. Under the Output -> Columns tab of the link where the rejected data is processed, provide a mandatory column of any binary data type to hold the rejected rows, and an optional column of any string type to capture the reject message, as follows:

    Azure_read_5

    10. Under the Output -> Properties tab of the reject link, set “Is reject link” to Yes.

      Azure_Read_4

    11. Provide the file name details in the Reject Sequential File stage.

    12. Compile and run the job. The file on Azure Blob Storage contains 100 rows of data, of which 97 rows are written to the Data Sequential file and 3 rows are captured in the Reject Sequential file.

    Azure_read_6

     

    13. The Reject Sequential file data can be seen as follows. The rows with id values 90, 36, and 43 were rejected because the name column is null for these rows, while the name column is defined as non-nullable in our job. These null-value rows are therefore routed through the reject link and captured in the Sequential file.

    Azure_read_7

     

    14. In this way, correctly formatted data is read into the target file. The rejected data can also be corrected based on the error message and re-inserted into Azure Blob Storage to avoid invalid entries, as sketched below.
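
    For example, once the null name values have been fixed, the corrected rows could be uploaded back to the same container with a short Python script using the azure-storage-blob package. This is a minimal sketch and not part of the recipe; the connection string, container name, and the corrected_rows.csv file name are placeholders, and it assumes the corrected rows have been saved locally as a delimited file.

      from azure.storage.blob import BlobServiceClient

      conn_str = "<your Azure Storage connection string>"
      service = BlobServiceClient.from_connection_string(conn_str)
      container = service.get_container_client("<container-name>")

      # Upload the corrected rows so the DataStage job can re-read them
      # from Azure Blob Storage on its next run.
      with open("corrected_rows.csv", "rb") as data:
          container.upload_blob(name="corrected_rows.csv", data=data, overwrite=True)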

  4. Additional Resources

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.azure.usage.doc/topics/reading_data_parent_azure.html

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.azure.usage.doc/topics/azure_rej_rec.html

     
