
Build a predictive machine learning model quickly and easily with IBM SPSS Modeler

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path.

In this tutorial, we will use IBM Cloud Pak for Data to build a predictive machine learning model with IBM SPSS Modeler and decide whether a telco customer will churn or not. IBM Cloud Pak® for Data is an interactive, collaborative, cloud-based environment that allows developers and data scientists to work together, gain insight from data, and build machine learning models.

Learning objectives

After completing this tutorial, you will learn how to:

  • Upload data to IBM Cloud Pak for Data.
  • Create an SPSS® Modeler flow.
  • Use the SPSS tool to inspect data and glean insights.
  • Modify and prepare data for AI model creation using SPSS.
  • Train a machine learning model with SPSS and evaluate the results.

Prerequisites

Estimated time

Completing this tutorial should take about 30 minutes.

Steps

  1. Create a project and upload the data
  2. Create an SPSS Modeler Flow
  3. Import the data
  4. Inspect the data
  5. Data preparation
  6. Train the ML model
  7. Evaluate the results

Step 1. Create a project and upload the data

Create an IBM Cloud Pak for Data project

  1. Using a browser, log in to your Cloud Pak for Data instance, open the hamburger (☰) menu in the upper-left corner, and click Projects. From the Projects page, click New Project +.

  2. Select Analytics project and click Next.

  3. Select Create an empty project.

  4. Give your project a name and an optional description, then click Create.

The project's Assets page opens; this is where your project assets are stored and organized.

Upload the data

  1. Download the Telco-Customer-Churn.csv dataset.

  2. From the Assets tab of your project, click on the 01/00 icon. You can either drag and drop the file or Browse to choose and upload the Telco-Customer-Churn.csv file.

Step 2. Create an SPSS Modeler flow

  1. From the Project home page, click Add to Project + and choose Modeler flow.

  2. Give the flow a meaningful name, such as Telco Customer Churn Flow, then click Create.

Step 3. Import the data

  1. In the left-hand pane, expand Import, then drag and drop a Data Asset node on the canvas. Double-click on the node that was dropped on the canvas and click Change data asset.

  2. On the Assets page, open the Data Assets tab, choose the Telco-Customer-Churn.csv file you previously uploaded and click OK.

  3. Click Save.
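Outside of SPSS Modeler, the same import step could be sketched in Python with pandas. The snippet below uses a few synthetic rows in the same shape as Telco-Customer-Churn.csv as a stand-in, since the real file lives in your project:

```python
import io
import pandas as pd

# A few synthetic rows shaped like Telco-Customer-Churn.csv (stand-in data).
csv_data = io.StringIO(
    "customerID,tenure,MonthlyCharges,Contract,Churn\n"
    "0001,1,29.85,Month-to-month,Yes\n"
    "0002,34,56.95,One year,No\n"
    "0003,2,53.85,Month-to-month,Yes\n"
)
# With the downloaded file: df = pd.read_csv("Telco-Customer-Churn.csv")
df = pd.read_csv(csv_data)
print(df.shape)  # (rows, columns) of the loaded asset
```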

Step 4. Inspect the data

  1. To gain insight into your data, open the Outputs tab and drag and drop the Data Audit node onto the canvas. Hover over the Data Asset node that was dragged and dropped on the canvas earlier, and it should show a blue circular icon on the side. Click on the icon and drag over to the Data Audit node. This connects the two nodes. The Data Audit node will automatically be renamed 21 Fields.

  2. Hover over the Data Audit node and click on the three vertical dots to open the menu for the node, then click Run. Alternatively, right-click on the Data Audit node and click Run. Once it is ready, the output can be viewed by opening the Outputs menu on the right. Double-click on the output (Data Audit of [21 fields]) to view statistics about the data.

  3. Click Return to flow to go back.
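The Data Audit node's report (per-field types, summary statistics, missing values) is roughly what you would compute by hand in pandas. A minimal sketch, again using synthetic stand-in rows:

```python
import io
import pandas as pd

# Synthetic stand-in rows; the third row has a missing MonthlyCharges value.
df = pd.read_csv(io.StringIO(
    "tenure,MonthlyCharges,Contract,Churn\n"
    "1,29.85,Month-to-month,Yes\n"
    "34,56.95,One year,No\n"
    "2,,Month-to-month,Yes\n"
))

# Roughly what the Data Audit node reports for each field:
audit = pd.DataFrame({
    "dtype": df.dtypes.astype(str),   # inferred measurement type
    "missing": df.isna().sum(),       # missing-value count
    "unique": df.nunique(),           # distinct values per field
})
print(audit)
print(df.describe())  # min/max/mean/std for the numeric fields
```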

Step 5. Data preparation

  1. Expand the Field Operations tab and drag and drop the Type node onto the canvas. Connect the Data Asset node with the Type node, then double-click on the Type node to make the necessary configurations.

  2. Click on Read Values. Once the read operation completes, check that the measure and role for each field are correct. Change the role of Churn from Input to Target, then click Save to close the tab.
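The Type node's role assignment has a direct analogue when preparing data by hand: Churn becomes the target, every other field stays an input, and nominal (categorical) fields are encoded for modeling. A sketch in pandas with synthetic stand-in rows:

```python
import io
import pandas as pd

# Synthetic stand-in rows shaped like the telco dataset.
df = pd.read_csv(io.StringIO(
    "tenure,MonthlyCharges,Contract,Churn\n"
    "1,29.85,Month-to-month,Yes\n"
    "34,56.95,One year,No\n"
    "2,53.85,Month-to-month,Yes\n"
))

# Role = Target: the field the model will predict.
y = df["Churn"].map({"Yes": 1, "No": 0})

# Role = Input: everything else; nominal measures are one-hot encoded.
X = pd.get_dummies(df.drop(columns="Churn"))
print(list(X.columns))
```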

Step 6. Train the ML model

  1. Expand the Modeling tab, then drag and drop the Random Forest node onto the canvas. Connect the Type node to the Random Forest node. The Random Forest node will automatically be renamed Churn.

  2. Right-click on the Random Forest node and click Run. When the execution is done, you will see a new golden nugget-like Churn node added to the canvas.

  3. Right-click on the new Churn golden nugget node and choose Preview to inspect the output results.
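Conceptually, the Random Forest node trains an ensemble of decision trees on the inputs to predict the target. A scikit-learn sketch of the same idea, on synthetic stand-in data (SPSS Modeler's own implementation and defaults differ):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded inputs and churn labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Train a random forest: many decision trees, each fit on a bootstrap
# sample of the rows and a random subset of the fields.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```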

Step 7. Evaluate the results

  1. Expand the Outputs tab, then drag and drop an Analysis node onto the canvas. Connect the Churn golden nugget node to the Analysis node. Right-click on the Analysis node and click Run.

  2. From the Outputs tab on the right, double-click on the analysis output (analysis of [Churn]) to gain insight into the accuracy of the results.

  3. Click on Return to flow to go back.

  4. Expand the Graphs tab, then drag and drop the Evaluation node onto the canvas. Connect the Churn golden nugget node with the Evaluation node. The Evaluation node will automatically be renamed $R-Churn. Right-click on the node and click Run.

  5. Double-click on the $R-Churn output (evaluation of [$R-Churn]: Gains) to visualize the graph. Click Return to flow to go back.
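The numbers behind these two outputs can be sketched by hand: the Analysis node reports how often predictions match the actual target, and the Gains chart ranks records by predicted churn probability and shows what share of all churners falls into each top slice of the ranking. A scikit-learn sketch on synthetic stand-in data (SPSS computes all of this for you):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in inputs and churn labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Analysis node: percentage of correct predictions on held-out records.
acc = accuracy_score(y_te, model.predict(X_te))

# Gains chart: rank by predicted churn probability, then accumulate the
# fraction of all actual churners captured as you move down the ranking.
proba = model.predict_proba(X_te)[:, 1]
order = np.argsort(-proba)
gains = np.cumsum(y_te[order]) / y_te.sum()
print(f"accuracy={acc:.2f}, churners in top 20%={gains[len(gains)//5 - 1]:.2f}")
```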

Summary

This tutorial demonstrated a small example of creating a predictive machine learning model with IBM SPSS Modeler on IBM Cloud Pak for Data. It covered importing data into a project and a Modeler flow, inspecting and preparing the data for modeling, choosing an appropriate algorithm and training a prediction model, and finally visualizing and evaluating the results of the trained model.

This tutorial is part of the Getting started with IBM Cloud Pak for Data learning path. To continue the series and learn more about IBM Cloud Pak for Data, take a look at the next tutorial, Monitoring the model with Watson OpenScale.