Fraud detection with SPSS Modeler

This tutorial explains how to graphically build and deploy a machine learning model to predict fraud by using the IBM® SPSS® Modeler flow feature in IBM Watson™ Studio. SPSS Modeler flows in Watson Studio provide an interactive environment for quickly building machine learning pipelines that flow data from ingestion to transformation to model building and evaluation, without needing any code.

Before we jump into SPSS Modeler, let’s try to understand the problem at hand. In this case, it’s fraud detection: predicting whether there is a risk that a customer will commit fraud. Now, let’s look at the data set and learn how the different attributes correlate to the risk factor.

Understanding the data

The data set that we are using is German credit risk data. Its attributes include credit history, credit amount, purpose, age, and number of dependents. The last column, Result, contains one of two values: 1 (which shows risk) or 2 (which shows no risk).

Image of credit risk data

Segmentation of customers on risk attribute

The following figure shows that there are 700 customers with risk and 300 customers with no risk.

Data showing customer with risk and no risk
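The class split quoted above is worth turning into a baseline before any model is built. The following minimal sketch uses only the two counts stated in the text (the label strings are placeholders chosen for readability) and computes the accuracy that a trivial model would reach by always predicting the most common class. Any model from the flow should beat this number to be useful.

```python
# Class counts quoted in the text: 700 customers with risk, 300 without.
# The label strings are placeholders chosen for this sketch.
class_counts = {"risk": 700, "no risk": 300}

total = sum(class_counts.values())
shares = {label: count / total for label, count in class_counts.items()}

# Accuracy of a trivial model that always predicts the most common class.
majority_label = max(class_counts, key=class_counts.get)
baseline_accuracy = class_counts[majority_label] / total

print(shares)
print(majority_label, baseline_accuracy)
```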

Segmentation of customers on number of dependents

You can also see that most of the customers have one dependent.

Data showing number of dependents

Segmentation of customers on credit history

In the following image, you see that most of the customers have a credit history of A32.

Credit history data

From all of these images, you can see the quality of the data set being analyzed and the different characteristics of the customers.

Exploratory analysis

Now, let’s try to determine how different attributes correlate with the risk factor by doing exploratory analysis.

Age versus Credit Amount

The following figure shows Age and Credit Amount plotted against the Risk Factor. You see that customers, whether young or old, are more likely to be a risk if their credit amount is low. You also see that customers become less likely to be a risk as their age increases.

Age versus credit amount plotted
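The same kind of comparison can be approximated outside the charting tools. The sketch below groups rows into age bands and reports the share of rows labelled as risk in each band. It assumes a column named Age (the actual column name in the CSV file might differ) and the Result column described earlier, with values read as strings. The toy rows are invented only to show the call and are not taken from the data set.

```python
from collections import defaultdict

def risk_share_by_age_band(rows, band_width=10, risk_label="1"):
    """Return {band_start: share of rows labelled as risk} per age band."""
    totals = defaultdict(int)
    risky = defaultdict(int)
    for row in rows:
        # Map an age such as 27 to the start of its band (20 for width 10).
        band = (int(row["Age"]) // band_width) * band_width
        totals[band] += 1
        if row["Result"] == risk_label:
            risky[band] += 1
    return {band: risky[band] / totals[band] for band in sorted(totals)}

# Toy rows, invented purely to show the call; not taken from the data set.
# "Age" is an assumed column name.
toy_rows = [
    {"Age": "23", "Result": "1"},
    {"Age": "27", "Result": "2"},
    {"Age": "45", "Result": "2"},
]
print(risk_share_by_age_band(toy_rows))
```

With the real file, the rows could come from csv.DictReader, which yields dictionaries of strings in the shape this function expects.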

Now, let’s see how you can use SPSS Modeler to predict fraudulent customers.

Learning objectives

After completing this tutorial, you will understand how to:

  • Get started with Watson Studio
  • Use SPSS Modeler flow in Watson Studio to build and deploy machine learning models without writing any code

Prerequisites

To use this tutorial, you need:

  • An IBM Cloud account. The tutorial can be completed using an IBM Cloud Lite account.
  • A basic knowledge of machine learning algorithms.

Estimated time to complete

It takes approximately 60 minutes to complete the tutorial. Depending upon your internet connection and the size of your data set, the time might increase.

Steps

The SPSS Modeler flow feature works for Watson Studio on IBM Cloud and Watson Studio Desktop. The steps to create an SPSS Modeler flow mentioned below for Watson Studio on IBM Cloud also apply to Watson Studio Desktop. However, for Watson Studio Desktop, you must:

  1. Download and install Watson Studio Desktop. You get a free 30-day trial, which also includes a trial for SPSS Modeler.

  2. Log in using your IBM Cloud credentials. If you don’t have an IBM Cloud account, you can sign up for one.

  3. Start Watson Studio, then continue with Step 2.

To use Watson Studio on IBM Cloud, you must sign in and perform the following steps:

Step 1: Create required services instances

  • Object Storage: To store the data, you need a storage service to be linked with your project. Create an IBM Cloud Object Storage service by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.

IBM Cloud Object Storage page

  • Watson Studio: Create a Watson Studio instance by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.

Watson Studio cloud page

  • Watson Machine Learning: Create a Watson Machine Learning instance by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.

Watson Machine Learning page

After you’ve created the services, start IBM Watson Studio by selecting it from the resource list or by clicking Watson Studio.

Step 2: Create Watson Studio project

Use the following steps to create a Watson Studio project. To begin, download the data set file.

  1. Click Create a project for Watson Studio on IBM Cloud.

    Welcome screen

    Click New project for Watson Studio Desktop.

    Studio Desktop new project selection

  2. Click Create an empty project.

    Create a new project screen

  3. Name your project, and click Create.

    Naming the project screen

  4. Click Assets. Browse to the data set file, which is german_credit_data.csv, and upload it. After the file is uploaded successfully, it appears in your Data assets.

    Selecting the Modeler Flow asset type

Step 3: Create SPSS Modeler flow

  1. Click Add to project, and select Modeler Flow.

    Selecting assets

  2. Name your modeler flow, and click Create to create and start it.

    Naming and starting the modeler flow

You see the main SPSS Modeler flow screen. From here, you can drag different nodes like Data assets, Data Refinery tools, Machine Learning models, and Output tools like Graphs and Matrices.

SPSS Modeler flow screen

Step 4: Build and run the flow

Use the following steps to build the flow.

  1. Click Import, and drag Data Asset into the main screen.

  2. Click Field Operations, and drag Type into the main screen. If the two nodes are not joined automatically, join them by dragging the right circle of the first node to the left circle of the second node.

  3. Click Modeling, and drag Auto Classifier into the main screen if you do not know which classifier to use for the prediction. Join Type with Auto Classifier as well.

    Main screen showing nodes

  4. To run the flow, you must first connect the flow with the appropriate set of test data available in your project and select the appropriate target in the Type node.

    1. Click the three dots that you see when you hover over the data asset node.
    2. Select Open from the menu. This shows the attributes of the node in the right part of the page.

      Selecting open

    3. Click Change data asset to change the input file.

    4. Select the german_credit_data.csv file, and click OK to save it.

      Selecting data assets

    5. Select the three dots of the Type node that you see when you hover over it.

    6. Select Open from the menu. This shows the attributes of the node in the right part of the page. Check whether the attribute Result is categorical and target. If it’s not, then change it.
    7. Click OK to save.

      Type window

    8. Click the Run icon, and the SPSS Modeler flow starts running.

      Running Modeler flow
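Before setting Result as a categorical target in the Type node, it can help to confirm that the column really holds a small set of distinct values. The following sketch is a stand-alone check with the standard csv module. It is not how SPSS Modeler determines measurement levels. The toy CSV text is invented for illustration, and the Age column name is an assumption; with the real file, you would pass an open file handle for german_credit_data.csv instead.

```python
import csv
import io

def distinct_values(csv_file, column):
    """Return the sorted distinct values found in one column of a CSV file."""
    reader = csv.DictReader(csv_file)
    return sorted({row[column] for row in reader})

# Toy CSV text, invented to show the call; "Age" is an assumed column name.
toy_csv = io.StringIO("Age,Result\n23,1\n45,2\n31,1\n")

values = distinct_values(toy_csv, "Result")
is_two_class = len(values) == 2  # a two-class target suits a classifier

print(values, is_two_class)
```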

Step 5: Evaluate the model

Running the flow creates a Result node. Click the three dots of the Result node to view the model.

Result node

This shows the top performing machine learning models. Because it’s the top model, select Random Trees to see how it performed.

Selecting Random Trees model

This opens an overview section where you can find details about the result such as the accuracy of the model, the confusion matrix, and important features.

Confusion matrix

The Confusion matrix in the previous figure shows how the model performed while predicting the fraudulent records against the actual fraudulent records. If you would like to get the confusion matrix for the complete data set, you can add a Matrix Output node to the canvas by using the following steps.

  1. Navigate back to the flow.
  2. Add a Matrix node from the Outputs menu.

    Adding a Matrix node

  3. Attach the Matrix node to the model output node.

    Attaching the Matrix node

  4. Open the Matrix node, and set the target attribute Result in the Rows and the predicted result, that is, $XF-Result in the Columns section. Click Save.

    Setting Result

  5. Run the flow again to see the output of the matrix. Running the flow gives you output in the right column. Open Result x $XF-Result.

    Opening Result

The following image shows the output of the confusion matrix.

Result node

This confusion matrix shows the total number of correct and incorrect predictions that the model made. For example, the first row shows that the model correctly predicted 667 risks as risks and incorrectly predicted 33 risks as non-risks.
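To make the reading of the matrix concrete, here is a minimal sketch of how such a table is counted and how the first row translates into a recall figure for the risk class. The function names are made up for this example, and the short label lists are toy values. Only the 667 and 33 counts come from the text. Because the second row is not quoted, overall accuracy is not computed here.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))

def recall(true_positives, false_negatives):
    """Share of actual positives that the model predicted as positive."""
    return true_positives / (true_positives + false_negatives)

# Toy labels, invented to show the counting step (1 = risk, 2 = no risk).
toy_counts = confusion_counts([1, 1, 2, 2], [1, 2, 2, 2])
print(toy_counts[(1, 1)], toy_counts[(1, 2)], toy_counts[(2, 2)])

# First row quoted in the text: 667 risks predicted as risks, 33 as non-risks.
risk_recall = recall(667, 33)
print(round(risk_recall, 4))  # 0.9529
```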

Step 6: Save and deploy the model

The model deployment feature is not part of Watson Studio Desktop. However, you can download the SPSS Modeler flow stream from Watson Studio Desktop and import it into Watson Studio on IBM Cloud. There, you can run the flow again and create a model that you can deploy by using the following steps.

After you create, train, and evaluate a model, you can save and deploy it. To save the SPSS model:

  1. Navigate to the flow editor for the model flow.
  2. Select the Matrix node, and open its menu by clicking the three dots in the upper-right corner.
  3. Select Save branch as a model from the menu. You can achieve the same outcome by using a Table node as your output node. In this case, we use the Matrix node.

    Save branch as model image

  4. Provide a name for the model, and click Save.

    Save model window

Now, you can see your model in the current project under the Assets section.

FraudDetection model

To deploy the SPSS model:

  1. Click the saved model in the project Models list.
  2. Select the Deployments tab.
  3. Click Add Deployment to create a new web service deployment, and provide a name.
  4. Set the deployment type to Web Service.
  5. Click Save.

    Fraud detector model

  6. Wait until the status shows DEPLOY_SUCCESS.

Step 7: Test the model

Now, the model is deployed and can be used for prediction. However, before using it in a production environment, it is worthwhile to test it with real data. You can do this interactively or programmatically by using the API for the Watson Machine Learning service. For now, we test it interactively.

The UI provides two options for testing the prediction: by entering the values one by one in distinct fields (one for each feature) or by specifying the feature values using a JSON object.

To test the model at run time:

  1. Select the deployment that you just created by clicking the deployment name (for example, fraudpredictordeployment). This opens a new page that shows an overview of the properties of the deployment (for example, name, creation date, or status).
  2. Select the Test tab.
  3. Enter the input values into the form, and click Predict.

    First prediction screen

The prediction result is given in terms of the probability that the customer will be a risk (True) or not (False).
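For the JSON option on the Test tab, the feature values must be arranged as a single object. The sketch below builds one with the standard json module. The "fields" and "values" layout is an assumption about the expected shape, so compare it with what the Test tab or the service documentation shows for your deployment. The field names and the sample row are hypothetical, apart from A32, which is the credit history code mentioned earlier.

```python
import json

def build_scoring_payload(fields, rows):
    """Arrange feature names and rows of values into one JSON-ready dict."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("each row needs exactly one value per field")
    # Assumed layout: a list of field names plus a list of value rows.
    return {"fields": list(fields), "values": [list(row) for row in rows]}

# Hypothetical field names and values; use the column names from your CSV file.
fields = ["CreditHistory", "CreditAmount", "Purpose", "Age", "NumDependents"]
rows = [["A32", 1500, "A43", 35, 1]]

payload_json = json.dumps(build_scoring_payload(fields, rows), indent=2)
print(payload_json)
```

Keeping the field list separate from the value rows makes it easy to score several customers in one request by appending more rows.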

Conclusion

This tutorial covered the basics of using the SPSS Modeler flow feature in Watson Studio to predict fraud. The feature provides a non-programming approach to building, evaluating, and deploying a model that predicts fraud risk.