Transform and load big data CSV files into a database

Summary

This code pattern shows you how to generate a set of CSV files, transform them using a tool called SQLite, and load them into a Db2 for z/OS database using a JDBC function called zload.

Description

This work was done as part of the Example Health set of code patterns, which demonstrate how cloud technology can access data stored on z/OS systems. We needed a way to generate a large amount of patient health care data to populate the Db2 for z/OS database. We found an open source tool called Synthea that generates the kind of synthetic data we wanted.

The Synthea CSV files needed to be transformed to match the table schemas used in the Example Health application. We found a public domain tool called SQLite which made these transformations easy.
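To illustrate the kind of transformation SQLite makes easy, here is a minimal sketch that uses Python's built-in sqlite3 module rather than the SQLite command-line program. The column names (Id, BIRTHDATE, FIRST, LAST) and the target column order are assumptions chosen for illustration; they are not taken from the Example Health schema.

```python
import csv
import io
import sqlite3


def transform_patients(csv_text: str) -> str:
    """Load CSV text into an in-memory SQLite table, reshape it with SQL,
    and return the result as CSV text without a header row."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    conn = sqlite3.connect(":memory:")
    # Staging table that mirrors the (hypothetical) source columns.
    conn.execute(
        "CREATE TABLE src (Id TEXT, BIRTHDATE TEXT, FIRST TEXT, LAST TEXT)"
    )
    conn.executemany(
        "INSERT INTO src VALUES (:Id, :BIRTHDATE, :FIRST, :LAST)", rows
    )

    # Reorder columns and upper-case the names to match the
    # (hypothetical) target table layout.
    cursor = conn.execute(
        "SELECT Id, upper(LAST), upper(FIRST), BIRTHDATE FROM src ORDER BY Id"
    )

    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cursor)
    conn.close()
    return out.getvalue()


sample = (
    "Id,BIRTHDATE,FIRST,LAST\n"
    "b2,1980-05-01,Ana,Lopez\n"
    "a1,1975-11-23,Raj,Patel\n"
)
result = transform_patients(sample)
print(result)
```

Because the reshaping lives in a single SELECT statement, adapting it to a different target table is mostly a matter of editing that query.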

Finally, the transformed CSV files needed to be loaded from a distributed workstation into the Db2 for z/OS database. We used a JDBC function called zload to accomplish this. zload requires Db2 for z/OS version 12.

Flow

A shell script (run.sh or run.bat) drives the processing. There are four main steps as shown below.

(Figure: flow diagram of the four processing steps)

  1. The Synthea tool is called to generate a set of CSV files containing synthesized patient health care data.
  2. A JDBC program is called to determine the current maximum patient number in the Db2 for z/OS database.
  3. The SQLite program is called to transform the CSV files produced by Synthea to match the schema of the Db2 for z/OS database.
  4. A JDBC program is called to load the transformed CSV files into the Db2 for z/OS database tables.
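One plausible reason for step 2 is to give newly generated patients numbers that do not collide with rows already in the database. The sketch below shows that idea in Python. It is an assumption about how the maximum is used, not a description of the project's actual code, and the function name and sample identifiers are hypothetical.

```python
def assign_patient_numbers(source_ids, current_max):
    """Map each distinct source ID to the next unused patient number.

    current_max is the highest patient number already in the database;
    numbering continues from current_max + 1 in first-seen order.
    """
    mapping = {}
    next_number = current_max + 1
    for source_id in source_ids:
        if source_id not in mapping:  # repeated IDs keep their first number
            mapping[source_id] = next_number
            next_number += 1
    return mapping


numbers = assign_patient_numbers(["uuid-a", "uuid-b", "uuid-a"], current_max=1000)
print(numbers)
```

With an empty database, passing current_max=0 starts the numbering at 1.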

Instructions

Find the detailed steps for this pattern in the README file. The steps show you how to:

  1. Install the required tools.
  2. Clone and build the project.
  3. Clone and build the Synthea project.
  4. Change the properties in synthea/src/main/resources/synthea.properties as necessary.
  5. Create the Db2 for z/OS database.
  6. Set up environment variables that the script requires to connect to your Db2 for z/OS database.
  7. Run the script from this project, with the current directory set to the Synthea project.
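As a rough illustration of step 6, the following Python sketch checks that a set of connection variables is present and assembles a JDBC URL of the form jdbc:db2://host:port/location. The variable names (DB_HOST, DB_PORT, DB_LOCATION, DB_USER, DB_PASSWORD) are hypothetical; the README lists the names the script actually expects.

```python
import os

# Hypothetical variable names, used here only for illustration.
REQUIRED_VARS = ("DB_HOST", "DB_PORT", "DB_LOCATION", "DB_USER", "DB_PASSWORD")


def read_connection_settings(env=os.environ):
    """Return connection settings, failing early if any variable is unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    url = f"jdbc:db2://{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_LOCATION']}"
    return {"url": url, "user": env["DB_USER"], "password": env["DB_PASSWORD"]}


example_env = {
    "DB_HOST": "zos.example.com",
    "DB_PORT": "5040",
    "DB_LOCATION": "DALLASB",
    "DB_USER": "user1",
    "DB_PASSWORD": "secret",
}
settings = read_connection_settings(example_env)
print(settings["url"])
```

Checking for missing values before connecting turns a confusing connection failure into a clear message that names the variables still to be set.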