IBM SPSS Modeler provides predictive analytics to help you uncover data patterns, gain predictive accuracy and improve decision making. This tutorial demonstrates an end-to-end flow of how to use SPSS Modeler on Watson Studio, ingest the data in a Db2 Warehouse database, perform analytics, and store back the results as a new table in the same database.

Learning objectives

This tutorial will show you how to:

  1. Add the DB2 Warehouse Connection in Watson Studio.
  2. Create a new SPSS Modeler stream or add an existing SPSS Modeler.
  3. Run the SPSS Modeler and store output on DB2 Warehouse.

Prerequisites

Estimated time

Completing this tutorial should take about 30 minutes.

Steps

Step 1: Load sample data in Db2 Warehouse

  • Open your IBM Cloud Dashboard and under Cloud Foundry Services open your created DB2 Warehouse instance.
  • Click on Open Console in the page opened, as shown below.

Open Db2

  • Once the service opens up, from the side menu, click on the load option.

Load table

  • Select the train.csv file from the unzipped data folder from Kaggle.

File selection

  • Once you load select your appropriate Schema, usually specified as DASH followed by a series of numbers. Click on the New Table option as shown.

New table

  • Name it TITANIC_DATA or any other name of your choice. And click on Begin Load.

Step 2: Add the DB2 Warehouse connection in Watson Studio

  • Open your Watson Studio from IBM Cloud Dashboard and navigate to the created project or create a new Modeler Project and make sure you link your Cloud Object Storage instance to the Project.

Modeler project

  • Click on the Add to Project button and select Connection.

Add to project connection

  • Select your DB2 Warehouse instance created on IBM Cloud.

Db2 Warehouse connection

  • The details would already be filled and click the Create button.

Create Db2 Warehouse connection

Step 3: Create the SPSS Modeler and insert data from Db2 Warehouse

  • Click on the Add to Project button and select Modeler.

Add to project modeler

  • Enter the a name for the modeler and make sure the below options are selected, and click on Create.

Create SPSS

  • On the Right side menu under the Import tab drag and drop the Data Assets node.

Data asset 1

  • Click on the Change Data Asset > Connections > DB2Warehouse and select the your schema or default schema (starting with DASH) and the uploaded table. Next, click on Save.

Data asset 2

Step 4: Visualize, analyze and perform feature selection with SPSS Modeler

  • Given a dataset, SPSS Modeler provides many visualisation tools to understand the data. In this section, the tutorial will show how to create these Visualisations and get insights from the data.

Visualize

  • In the options of the Data Asset node, click on the Preview option and go to Visualizations tab.

Visualization

  • Histograms are used to show the distribution of the data, in this example, plot a graph by taking Age in the x-axis and split-by sex.

Histogram

From this graph we can see that the ages of the passengers, follows a normal distribution, i.e, most people have an age range of 20-55, and there are fewer people who are less than 20 and greater than 50. We can also see that, for some age groups, there are more men than women.

  • A bar diagram makes it easy to compare sets of data between different groups at a glance. Here we see the comparison between the number of male survivors and the number of female survivors. Taking sex as the category, Summary value as count and the value option as Survived.

Bar image

  • A pie chart is best used when trying to work out the composition of something. In this case we give the category as Cabin.

Pie chart

From this pie chart, we can see that 80% of the Cabin column has NaNs that is missing values. So, we can make a conclusion to drop this column since we know it cannot affect the target, in our case the Survived column.

Feature selection

  • Before using Feature Selection, we need to prepare the dataset. First we use Filler to fill in all missing values with null and NaN values. Next, we filter out the Cabin column from the Visualization stage using Filter node. And, finally we assign the correct Type.

Data preparation

  • In the Type node adjust the Survived column’s Role property as Target.

Set target

  • Use the Feature Selection under the Modeling tab and run the modeler.

Feature selection

  • A model node will be generated, while connecting to a Table node in the Output tab, you can see it filters out the unimportant columns.

Note: Nodes such as Derive and Merge can be used to create new columns from existing columns and merge two dataframes.

Step 5: Save data back to Db2 Warehouse

  • Add the Data Asset Export node from Export tab.
  • Follow the steps from 1. Load Sample Data in Db2 warehouse to connect your Db2 Warehouse instance and select any table.
  • Before saving the changes for the node. Enter a name for the output table, make sure the name is unique to the table names within the schema.

Save data

  • Run the modeler flow and the desired output will be saved in your Db2 Warehouse.

Summary

In this tutorial, you learned how to:

  • Load data from DB2 Warehouse onto SPSS Modeler.
  • Work with SPSS nodes to perform the initial Visualizations, pre-processing and Feature Engineering of a given dataset.
  • Store back data directly to Db2 Warehouse.