Overview

Skill Level: Beginner

It is required to know how to create Bluemix web applications, connecting web services to them, and basic database administration.

A step-by-step tutorial to synchronize data between an on-premise and an on-demand database using Talend Data Integration and an IBM Bluemix application.

This tutorial is brought to you by

-Nora Castillo
-Jerson Rivas
-David Heredia

Ingredients

- An IBM Bluemix application

- A cloud database service

- A Database Management System (and a database) stored on your computer

Step-by-step

  1. Create a Bluemix application

    First, create a PHP application using IBM Bluemix.

    A database service such as ClearDB must be connected to the application.

    bm1

    In order execute the web application properly, do not forget to create a file called “options.json”, and include it in a folder named “.bp-config”.

    The options.json file must include the following code:

    {
    "PHP_EXTENSIONS" : ["mysqli"]}
  2. Create a Talend Data Integration project

    Create a new Talend Data Integration Project:

    Click on “create new project”, and then, “Create”

    introtalend1

    Then, Tick “Select an existing project” to select the previously created project and after that, click “Finish”.

    introtalend2

     

  3. Create a connection for each database

    After the project is loaded, go to the Repository Panel and expand the Metadata menu. Then, right click on “DB Connections” and select “Create Connection”.

     metadata

    It is important to create an on-premise database to store the data that would be uploaded to the on-demand database when performing an update.

    For this example, the on-premise database was called “Provider”, and the on-demand database was named “Datawarehouse”.

    Write the on-premise database name and then click “Next”:

     name

    Then, insert the database Type and the database credentials

    For this example, the database management system that was used to create the “Provider” database was MySQL. So, that will be the DB Type.

    Then, click “Check” to confirm if the connection was successful.

    Imagen1

     

    If so, click OK, and then, Finish.

    After creating a connection for the created for the on-premise database called “Provider”, We should proceed to create another for the on-demand database (Datawarehouse) in a very similar way to the first connection:

    name2

    Then, write the database credentials and click “Finish”.

    successful2

  4. Extract Schemas from the two databases

    First, right click the “Datawarehouse” database connection and select “Extract Schema”.

    schema1

    A window will appear.

    schema2

    Click Next.

    Then, select the database and the tables you wish to transfer. In this case, all tables will be selected.

    schema3

    After that, click “Finish”.

    schema4

    Finally, extract the schema from the “Provider” database (right click on the database name displayed in the repository panel on the left, select “Extract Schema”, select the tables you wish to work with, click Next, and finally, click Finish)

  5. Synchronize!

    Go to the Repository Panel on the left, and select “Job Design”. Right click on it, then, select “Create Standard Job”.

    job1

    The following window will show up:

    job2

    Write the job name. In this case, it was called “etlFromProvider”. Then, click Finish.

    Go to Provider (Database) in the Repository Panel. Expand “Table Schemas” and select the table you are going to work with.

    Drag the table to the main area, and the Components window will pop up. Select tMySqlInput from the Components Window, and click OK.

    job3

    It will look this way:

    job4

    After that, expand “Datawarehouse” (name of the on-demand database) from the repository menu, and find “Table Schemas”. Expand it and drag a table to the main area.

    job5

    Select tMysqlOutput, and click OK.

    The diagram will look like the following one:

    job6

    Go to the palette, expand it, and select tMap.

    job7

    Drag tMap to the main area.

    Right click the Providers table icon in the main area. Select Row, and then, Main. Drag the arrow to the tMap component.

    job8

    Right click tMap, select Row, and then, select New Output. Drag the arrow to the Datawarehouse table.

    job9

    job10

    Give any name to the output.

    This would be the result:

    job11

    Double click the tMap component

    Then, click on Automap, on the upper left corner.

    automap

    All table rows should be related. Click on APPLY and then, OK.

    Finally, execute the process by clicking RUN:

    run-1

    If the process ran successfully, the following message would appear:

    job14

    And the synchronized tables will be updated

Join The Discussion