Overview

Skill Level: Any Skill Level

This recipe shows how to create, configure, compile, and run DataStage jobs that use the Azure Storage Connector to write data as files to Azure File Storage and Azure Blob Storage.

Ingredients

1. IBM InfoSphere Information Server DataStage 11.7fp1 or later

2. Azure Storage Account

Step-by-step

  1. Description

    Azure Storage is a Microsoft-managed cloud service that provides highly available, secure, durable, and scalable storage. Azure Storage includes several data services, among them Blob storage, File storage, and Queue storage.

    Information Server provides a native Azure Storage Connector that writes data to files on Azure File Storage and Azure Blob Storage from an ETL job design.

    This recipe demonstrates a sample use case that performs a write operation on Azure Blob Storage using the Azure Storage Connector stage. The DataStage job uses a DB2 Connector as the source stage and an Azure Storage Connector stage as the target, so that the data from a DB2 table is written as a file stored on Azure Blob Storage.


    A write can be performed in two ways: Normal Write and Parallel Write.

    This recipe shows how to configure the Azure Storage Connector properties for both Normal Write and Parallel Write, and how to run a simple job.

  2. Configure Azure Storage Connector Connection Properties

    1. Get the default endpoint protocol, the storage account name, and the access key for your Azure Storage account.
    2. If you connect through a proxy, provide the HTTP proxy server and port details.
    3. Alternatively, you can provide the path to a credentials file, located on the Engine tier, that contains the Azure Storage connection string.
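
    The three values from step 1 are the same ones that make up an Azure Storage connection string, which is what the credentials file in step 3 contains. As a rough illustration, the Python sketch below assembles such a string and parses it back. The helper names `build_connection_string` and `parse_connection_string`, the account name, and the key are placeholders invented for this example; check the connector documentation for the exact layout it expects in the credentials file.

```python
def build_connection_string(protocol, account_name, account_key,
                            endpoint_suffix="core.windows.net"):
    """Assemble a connection string in the usual Azure Storage key=value;... form."""
    parts = {
        "DefaultEndpointsProtocol": protocol,
        "AccountName": account_name,
        "AccountKey": account_key,
        "EndpointSuffix": endpoint_suffix,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


def parse_connection_string(conn_str):
    """Split a connection string into a dict of its settings."""
    settings = {}
    for segment in conn_str.split(";"):
        if not segment:
            continue
        # Split on the first '=' only: access keys are base64 and may end in '='.
        key, _, value = segment.partition("=")
        settings[key] = value
    return settings


# Placeholder values, not real credentials.
conn = build_connection_string("https", "mystorageaccount", "bXlrZXk=")
print(conn)
print(parse_connection_string(conn)["AccountName"])
```

    Parsing a connection string this way can be a quick sanity check that a credentials file carries the account name and key you expect before running the job.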


  3. Configure Azure Storage Connector to write to Azure Blob Storage

    1. Set Storage Type to “Blob” to write to Azure Blob Storage. To write to Azure File Storage instead, set Storage Type to “File”.

    2. Set Write Mode to “Write” and provide the name of the Container to which the file is to be written. If the container does not already exist in Azure Blob Storage, it can be created during the job run by setting Create Container to “Yes”.

    3. Provide the File name property. If the file needs to be written to a folder, the file name can be specified as <Directory>\<filename>.

    4. Set File format to Delimited. Two file formats are currently supported: Delimited and CSV.

    5. After selecting the file format, you can set optional formatting properties, such as the delimiter and quotation mark, as required.


    6. Under the Input tab, define the name and type of each column of data to be written to Azure Blob Storage.

    7. In the DB2 Connector stage, provide the table name and the DB2 connection details.

    8. Compile and run the job. The data from the DB2 table is written to the file on Azure Blob Storage.
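
    To make the effect of the formatting properties in step 5 more concrete, here is a small Python sketch, using only the standard `csv` module, of how rows turn into delimited text for a given field delimiter and quotation mark. It is not the connector's code: the function name `to_delimited` and the sample rows are made up for illustration, and the connector's exact quoting rules may differ.

```python
import csv
import io


def to_delimited(rows, delimiter="|", quotechar='"'):
    """Render rows as delimited text, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter=delimiter,
        quotechar=quotechar,
        quoting=csv.QUOTE_MINIMAL,  # quote a field only if it contains a special character
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data standing in for rows read from the DB2 table.
rows = [(1, "Alice", "Pune"), (2, "Bob|Jr", "Austin")]
text = to_delimited(rows)
print(text)
```

    In this sketch the second row's name contains the delimiter, so it is wrapped in the quotation mark; this is the kind of case the quotation mark property exists for.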


  4. Additional Configuration for Parallel Write

    The Azure Storage Connector stage can be configured to run on multiple nodes and write data to Azure Blob Storage in parallel. Parallel Write creates a few temporary files during processing, which are deleted at the end of the job. Note that the order of the data is not guaranteed with Parallel Write.

    1. In addition to all the steps above, set Enable Parallel Write to “Yes”.

    2. Optionally, provide a separate container for the temporary files required during processing by entering its name in Container for temporary files. To have this container created by the job, set Create temporary container to “Yes”.

    3. If no temporary container is provided, the container specified in the Container property is used by default.

    4. Once these properties are configured along with those in the previous section, compile and run the job in the same way as shown above.
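
    The behavior described at the start of this section (temporary files per node, cleanup at the end of the job, no ordering guarantee) can be pictured with a toy simulation. The Python sketch below uses local files and threads in place of Azure containers and DataStage nodes. Every name in it (`parallel_write`, the `part_N.tmp` files, the round-robin partitioning) is hypothetical, and it does not reflect how the connector is actually implemented.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import List, Optional


def parallel_write(rows: List[str], target: Path, nodes: int = 2,
                   temp_dir: Optional[Path] = None) -> None:
    """Simulate a parallel write: one temporary part file per node, then combine."""
    # With no separate temporary location, fall back to the target's own
    # directory, loosely mirroring the default to the Container property.
    temp_dir = temp_dir or target.parent
    temp_dir.mkdir(parents=True, exist_ok=True)

    # Round-robin partitioning of rows across nodes (an assumption for this sketch).
    partitions = [rows[i::nodes] for i in range(nodes)]

    def write_part(index: int) -> Path:
        part = temp_dir / f"part_{index}.tmp"
        part.write_text("".join(line + "\n" for line in partitions[index]))
        return part

    with ThreadPoolExecutor(max_workers=nodes) as pool:
        parts = list(pool.map(write_part, range(nodes)))

    # Combine the parts into the final file, then delete the temporary files.
    with target.open("w") as out:
        for part in parts:
            out.write(part.read_text())
            part.unlink()


source_rows = ["r1", "r2", "r3", "r4", "r5"]
with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "out" / "employees.txt"
    target.parent.mkdir()
    parallel_write(source_rows, target, nodes=2)
    result = target.read_text().splitlines()
    leftovers = list(target.parent.glob("*.tmp"))

print(result)
```

    In this toy run every row reaches the target file and no temporary files remain, but the rows come out grouped by partition rather than in their input order, which is why a job that depends on row order should not rely on Parallel Write.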

  5. Additional Resources

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.azure.usage.doc/topics/connect_to_azuree.html

    https://www.ibm.com/support/knowledgecenter/SSZJPZ_11.7.0/com.ibm.swg.im.iis.conn.azure.usage.doc/topics/writing_data_parent_azure.html
