Fraud detection with SPSS Modeler
Graphically build and deploy machine learning models to predict fraud by using the SPSS Modeler flow feature in IBM Watson Studio
This tutorial explains how to graphically build and deploy a machine learning model to predict fraud by using the IBM® SPSS® Modeler flow feature in IBM Watson™ Studio. SPSS Modeler flows in Watson Studio provide an interactive environment for quickly building machine learning pipelines that flow data from ingestion to transformation to model building and evaluation, without needing any code.
Before we jump into SPSS Modeler, let’s try to understand the problem at hand. In this case, it’s fraud detection, which means predicting whether there is a risk that a customer will commit fraud. Now, let’s look at the data set and learn how the different attributes correlate with the risk factor.
Understanding the data
The data set that we are using is German credit risk data. Its attributes include credit history, credit amount, purpose, age, and number of dependents. The last column, Result, contains one of two values: 1 (which shows risk) or 2 (which shows no risk).
Segmentation of customers on risk attribute
The following figure shows that there are 700 customers with risk and 300 customers with no risk.
Segmentation of customers on number of dependents
You can also see that most of the customers have one dependent.
Segmentation of customers on credit history
In the following image, you see that most of the customers have a credit history of A32.
From all of these images, you can see the quality of the data set being analyzed and the different characteristics of the customers.
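The segmentations above are simple value counts per attribute. As a rough sketch of the same idea outside SPSS Modeler, the following Python snippet counts rows per distinct value. The sample rows and the column names CreditHistory and NumDependents are assumptions made for illustration; only the Result column name and the A32 code come from this tutorial, and the real data lives in german_credit_data.csv.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows for illustration; the real file is german_credit_data.csv.
# Column names other than Result are assumptions, not the file's actual headers.
SAMPLE_CSV = """CreditHistory,NumDependents,Result
A32,1,1
A34,1,1
A32,2,2
A32,1,1
A30,1,2
"""

def segment_counts(rows, column):
    """Count how many rows fall under each distinct value of a column."""
    return Counter(row[column] for row in rows)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for column in ("Result", "NumDependents", "CreditHistory"):
    print(column, dict(segment_counts(rows, column)))
```

To try this on the full data set, replace the inline sample with the opened CSV file and adjust the column names to match its header row.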
Now, let’s try to determine how different attributes correlate with the risk factor by doing exploratory analysis.
Age versus Credit Amount
The following figure shows Age and Credit Amount plotted against the Risk Factor. You see that customers, whether young or old, are more likely to be a risk if their credit amount is low. Similarly, you see that as the age of the customer increases, they become less likely to be a risk.
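One way to check a trend like this numerically is to group customers into age bands and compute the share of risk records in each band. The snippet below is a minimal sketch with made-up records, so its numbers say nothing about the real data set; the function name and band width are also assumptions. Credit amount could be banded in the same way.

```python
from collections import defaultdict

# Hypothetical (age, credit_amount, result) records for illustration only.
# Result coding follows this tutorial: 1 = risk, 2 = no risk.
records = [
    (23, 1200, 1), (27, 5400, 2), (35, 900, 1),
    (41, 7300, 2), (52, 1500, 1), (60, 8800, 2),
]

def risk_share_by_band(records, band_width=20):
    """Return {band_start: share of records with Result == 1} per age band."""
    totals = defaultdict(int)
    risks = defaultdict(int)
    for age, _amount, result in records:
        band = (age // band_width) * band_width  # e.g. age 35 -> band 20
        totals[band] += 1
        if result == 1:
            risks[band] += 1
    return {band: risks[band] / totals[band] for band in sorted(totals)}

shares = risk_share_by_band(records)
for band, share in shares.items():
    print(f"ages {band}-{band + 19}: {share:.2f} share of risk records")
```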
Now, let’s see how you can use SPSS Modeler to predict fraudulent customers.
After completing this tutorial, you will understand how to:
- Get started with Watson Studio
- Use SPSS Modeler flow in Watson Studio to build and deploy machine learning models without writing any code
To use this tutorial, you need:
- An IBM Cloud account. The tutorial can be completed using an IBM Cloud Lite account.
- A basic knowledge of machine learning algorithms.
Estimated time to complete
It takes approximately 60 minutes to complete the tutorial. Depending upon your internet connection and the size of your data set, the time might increase.
The SPSS Modeler flow feature works for Watson Studio on IBM Cloud and Watson Studio Desktop. The steps to create an SPSS Modeler flow mentioned below for Watson Studio on IBM Cloud also apply to Watson Studio Desktop. However, for Watson Studio Desktop, you must:
Download and install Watson Studio Desktop. You get a free 30-day trial, which also includes a trial for SPSS Modeler.
Log in using your IBM Cloud credentials. If you don’t have an IBM Cloud account, you can sign up for one.
Start Watson Studio, then continue with Step 2.
To use Watson Studio on IBM Cloud, you must sign in and perform the following steps:
Step 1: Create required services instances
- Object Storage: To store the data, you need a storage service to be linked with your project. Create an IBM Cloud Object Storage service by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.
- Watson Studio: Create a Watson Studio instance by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.
- Watson Machine Learning: Create a Watson Machine Learning instance by searching for and selecting it from the IBM Cloud Catalog. Select the lite plan, and click Create.
After you’ve created the services, start IBM Watson Studio by selecting it from the resource list or by clicking Watson Studio.
Step 2: Create Watson Studio project
Use the following steps to create a Watson Studio project. To begin, download the data set file.
Click Create a project for Watson Studio on IBM Cloud.
Click New project for Watson Studio Desktop.
Click Create an empty project.
Name your project, and click Create.
Click Assets. Browse to the data set file, which is german_credit_data.csv, and upload it. After the file is uploaded successfully, it appears in your Data assets.
Step 3: Create SPSS Modeler flow
Click Add to project, and select Modeler Flow.
Name your modeler flow, and click Create to create and start it.
You see the main SPSS Modeler flow screen. From here, you can drag different nodes like Data assets, Data Refinery tools, Machine Learning models, and Output tools like Graphs and Matrices.
Step 4: Build and run the flow
Use the following steps to build the flow.
Click Import, and drag Data Asset into the main screen.
Click Field Operations, and drag Type into the main screen. If the two nodes are not joined automatically, join them by dragging the right circle of the first node to the left circle of the second node.
Click Modeling, and drag Auto Classifier into the main screen if you do not know which classifier to use for the prediction. Join Type with Auto Classifier as well.
To run the flow, you must first connect the flow with the appropriate set of test data available in your project and select the appropriate target in the Type node.
- Click the three dots that you see when you hover over the data asset node.
- Select Open from the menu. This shows the attributes of the node in the right part of the page.
- Click Change data asset to change the input file.
- Select the german_credit_data.csv file, and click OK to save it.
- Select the three dots of the Type node that you see when you hover over it.
- Select Open from the menu. This shows the attributes of the node in the right part of the page. Check whether the attribute Result is categorical and target. If it’s not, then change it.
- Click OK to save.
- Click the Run icon, and the SPSS Modeler flow starts running.
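Conceptually, the Auto Classifier node in this flow tries several candidate models and ranks them by how well they perform. The following Python sketch illustrates only that ranking idea. The two toy models, the sample rows, and all names are assumptions made for illustration; they are not the algorithms that SPSS Modeler actually runs.

```python
from collections import Counter

# Hypothetical (credit_amount, result) rows; Result 1 = risk, 2 = no risk.
train = [(900, 1), (1200, 1), (1500, 1), (5400, 2), (7300, 2), (1100, 1)]
holdout = [(1000, 1), (8800, 2), (1300, 1), (6000, 2)]

def majority_class(train_rows):
    """Baseline: always predict the most common label in the training rows."""
    label = Counter(result for _amount, result in train_rows).most_common(1)[0][0]
    return lambda amount: label

def amount_threshold(train_rows):
    """Toy rule: predict risk (1) below the mean credit amount, otherwise 2."""
    mean = sum(amount for amount, _result in train_rows) / len(train_rows)
    return lambda amount: 1 if amount < mean else 2

def accuracy(model, rows):
    """Share of rows where the model's prediction matches the actual result."""
    return sum(model(amount) == result for amount, result in rows) / len(rows)

# Fit every candidate on the training rows, score it on the holdout rows,
# and sort so that the best-performing candidate comes first.
candidates = {"majority_class": majority_class, "amount_threshold": amount_threshold}
ranking = sorted(
    ((name, accuracy(fit(train), holdout)) for name, fit in candidates.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: accuracy {score:.2f}")
```

The design point is that the ranking step does not care what each candidate is, which is why the node is useful when you do not know in advance which classifier to use.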
Step 5: Evaluate the model
Running the flow creates a Result node. Click the three dots of the Result node to view the model.
This shows the top performing machine learning models. Because it’s the top model, select Random Trees to see how it performed.
This opens an overview section where you can find details about the result such as the accuracy of the model, the confusion matrix, and important features.
The confusion matrix in the previous figure compares the records that the model predicted as fraudulent with the records that are actually fraudulent. If you would like to get the confusion matrix for the complete data set, you can add a Matrix Output node to the canvas by using the following steps.
- Navigate back to the flow.
- Add a Matrix node from the Outputs menu.
- Attach the Matrix node to the model output node.
- Open the Matrix node, and set the target attribute Result in the Rows and the predicted result, that is, $XF-Result in the Columns section. Click Save.
- Run the flow again to see the output of the matrix. Running the flow gives you output in the right column. Open Result x $XF-Result.
The following image shows the output of the confusion matrix.
This confusion matrix shows the total number of correct and incorrect predictions that the model made. For example, as shown by the first row, the model correctly predicted 667 risks as risks and incorrectly predicted 33 risks as non-risks.
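To make the first row concrete, the following Python sketch builds the same kind of actual-versus-predicted table and derives the share of risk records that were caught (often called recall). It uses only the 667 and 33 figures quoted above; the second row is not quoted in the text, so it is left out, and the function names are illustrative.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs, like the Result x $XF-Result table."""
    return Counter(zip(actual, predicted))

def recall(matrix, label):
    """Share of records with the given actual label that were predicted correctly."""
    row_total = sum(n for (a, _p), n in matrix.items() if a == label)
    return matrix[(label, label)] / row_total

# First row from the tutorial: 667 risks predicted as risk (1), 33 as no risk (2).
actual = [1] * 700
predicted = [1] * 667 + [2] * 33
matrix = confusion_matrix(actual, predicted)
risk_recall = recall(matrix, 1)
print(f"Recall for Result = 1: {risk_recall:.3f}")
```

With these figures, 667 of 700 risk records are predicted correctly, which is about 95.3 percent.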
Step 6: Save and deploy the model
The model deployment feature is not part of Watson Studio Desktop. However, you can download the SPSS Modeler flow stream from Watson Studio Desktop, then import it into Watson Studio on IBM Cloud. There, you can run it again and create a model that you can deploy using the following steps.
After you create, train, and evaluate a model, you can save and deploy it. To save the SPSS model:
- Navigate to the flow editor for the model flow.
- Select the Matrix node and open its menu by selecting the three dots in the upper-right corner.
- Select Save branch as a model from the menu.
The same outcome can be achieved using the Table node as your output node. In this case, we use the Matrix node.
Provide a name for the model, and click Save.
Now, you can see your model in the current project under the Assets section.
To deploy the SPSS model:
- Click the saved model in the project Models list.
- Select the Deployments tab.
- Click Add Deployment to create a new web service deployment, and provide a name.
- Set the deployment type to Web Service.
- Wait until the status shows DEPLOY_SUCCESS.
Step 7: Test the model
Now, the model is deployed and can be used for prediction. However, before using it in a production environment, it might be worthwhile to test it using real data. You can do this interactively or programmatically using the API for the Watson Machine Learning service. For now, we test it interactively.
The UI provides two options for testing the prediction: by entering the values one by one in distinct fields (one for each feature) or by specifying the feature values using a JSON object.
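For the JSON option, the feature values are grouped into a single object. The Python sketch below shows one way to build such an object from a record. The fields/values layout is an assumption about the expected input, and it can differ between Watson Machine Learning versions, so copy the exact structure that the Test tab shows for your deployment. The feature names and values are also hypothetical.

```python
import json

def build_scoring_payload(record):
    """Turn one record (feature name -> value) into a fields/values JSON string."""
    fields = list(record)
    values = [[record[name] for name in fields]]  # one inner list per record
    return json.dumps({"fields": fields, "values": values})

# Hypothetical feature names and values, for illustration only.
example = {"CreditHistory": "A32", "CreditAmount": 1200, "Age": 35}
payload = build_scoring_payload(example)
print(payload)
```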
To test the model at run time:
- Select the deployment that you just created by clicking the deployment name (for example, fraudpredictordeployment). This opens a new page that shows an overview of the properties of the deployment (for example, name, creation date, or status).
- Select the Test tab.
- Enter the input values into the form, and click Predict.
The prediction result is given in terms of the probability that the customer will be a risk (True) or not (False).
This tutorial covered the basics of using the SPSS Modeler flow feature in Watson Studio to predict fraud. The SPSS Modeler flow feature of Watson Studio provides a non-programming approach to creating a model to predict fraud.