Machine learning using synthesized patient health records


This code pattern shows you how to train a machine learning model to predict type 2 diabetes using synthesized patient health records. Using synthesized data allows you to learn about building a model without having to worry about privacy issues associated with the use of real patient health records.


This project is part of a series of code patterns that focus on a fictional health care company that stores electronic health records in a database on a z/OS server. Before you run the notebook, the synthesized health records must be created and loaded into this database. The code pattern "Transform and load big data CSV files into a database" provides the steps for doing this: the records are created with the Synthea tool, transformed, and then loaded into the database.

In this code pattern, you will use a Jupyter Notebook on IBM Watson Studio to build a predictive model that demonstrates a potential health care use case. Jupyter Notebook is a tool that many data scientists use to clean, transform, and visualize data, and to build and test machine learning models. You’ll see how to use Watson Machine Learning on a data set composed of synthesized health care metrics to create a predictive model for the risk of diabetes. After the model is created, you can enter inputs and score them to form a prediction for an individual case. (Note that this application is used for demonstrative and illustrative purposes only and does not constitute an offering that has gone through regulatory review.)

When you have completed this code pattern, you will understand how to:

  • Prepare data using Apache Spark
  • Visualize data relationships using PixieDust
  • Train a machine learning model and publish it in the Watson Machine Learning repository
  • Deploy the model as a web service and use it to make predictions
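
To make the last item more concrete, here is a minimal sketch of how inputs for one patient might be packaged before they are sent to a deployed model for scoring. It assumes the web service accepts a JSON body with a list of field names and a list of value rows; that shape, the helper name build_scoring_payload, and the feature names are illustrative assumptions, not taken from the notebook. The scoring URL and authentication are left out on purpose.

```python
import json


def build_scoring_payload(fields, rows):
    """Build a fields/values request body for a model deployed as a web service.

    Assumption: the service expects {"fields": [...], "values": [[...], ...]}.
    Check the deployment's own documentation for the exact shape it requires.
    """
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row must supply exactly one value per field")
    return {"fields": list(fields), "values": [list(row) for row in rows]}


# Hypothetical feature names and made-up values, for illustration only.
FIELDS = ["age", "bmi", "systolic_bp", "hdl_cholesterol"]
payload = build_scoring_payload(FIELDS, [[54, 31.2, 138, 42]])

# Serialize to the JSON text that would form the body of the scoring request.
body = json.dumps(payload)
print(body)
```

Validating the row length before sending catches the most common mistake, a value list that no longer lines up with the columns the model was trained on.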



  1. Log in to IBM Watson Studio.
  2. Load the provided notebook into Watson Studio.
  3. Load the data into the notebook.
  4. Transform the data with Apache Spark.
  5. Create charts with PixieDust.
  6. Publish and deploy the model with Watson Machine Learning.
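
The transform step (step 4) typically reshapes long-format observation records, one row per measurement, into one feature row per patient. The notebook does this with Apache Spark; the sketch below is a plain-Python stand-in for the same idea, not the notebook's code. The function name pivot_observations, the record keys, and the sample values are all hypothetical.

```python
from collections import defaultdict


def pivot_observations(observations, wanted):
    """Turn long-format observation rows into one feature dict per patient.

    Keeps the most recent value of each wanted measurement and drops
    patients that are missing any of them.
    """
    latest = defaultdict(dict)  # patient -> measurement name -> (date, value)
    for obs in observations:
        name = obs["description"]
        if name not in wanted:
            continue
        seen = latest[obs["patient"]].get(name)
        # ISO-8601 dates (YYYY-MM-DD) sort correctly as plain strings.
        if seen is None or obs["date"] > seen[0]:
            latest[obs["patient"]][name] = (obs["date"], float(obs["value"]))
    return {
        patient: {name: values[name][1] for name in wanted}
        for patient, values in latest.items()
        if all(name in values for name in wanted)
    }


# Made-up sample rows, for illustration only.
rows = [
    {"patient": "p1", "date": "2017-01-05", "description": "Body Mass Index", "value": "29.5"},
    {"patient": "p1", "date": "2018-02-11", "description": "Body Mass Index", "value": "31.0"},
    {"patient": "p1", "date": "2018-02-11", "description": "Systolic Blood Pressure", "value": "135"},
    {"patient": "p1", "date": "2018-02-11", "description": "Body Height", "value": "170"},
    {"patient": "p2", "date": "2018-03-01", "description": "Body Mass Index", "value": "24.1"},
]
WANTED = ["Body Mass Index", "Systolic Blood Pressure"]
features = pivot_observations(rows, WANTED)
print(features)
```

In Spark, the same reshaping is usually expressed as a group-by on the patient column followed by a pivot on the measurement name, which scales to data sets that do not fit in memory.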


Find the detailed instructions in the README. These steps show you how to:

  1. Sign up for IBM Watson Studio.
  2. Create a project.
  3. Create a Watson Machine Learning instance.
  4. Add the notebook to your project.
  5. Run the notebook.