Predict loan eligibility using IBM Watson Studio

Loans are the core business of lending companies, and most of their profit comes directly from loan interest. These companies grant a loan only after an intensive process of verification and validation. Even so, they have no assurance that the applicant will be able to repay the loan without difficulty.

In this tutorial, you build a predictive model that predicts whether an applicant will be able to repay the lending company. You prepare the data and build the model using IBM SPSS Modeler in Watson Studio.

Learning objectives

After completing this tutorial, you will understand how to:

  • Add and prepare your data
  • Build a machine learning model
  • Save the model

Prerequisites

To complete this tutorial, you need the following:

Estimated time

Reading and completing this tutorial takes approximately one hour.

Steps

Data set

The data set is from Analytics Vidhya.

The format of the data:

  • Loan_ID: Unique loan ID
  • Gender: Male/Female
  • Married: Applicant married (Y/N)
  • Dependents: Number of dependents
  • Education: Applicant education (Graduate/Undergraduate)
  • Self_Employed: Self-employed (Y/N)
  • ApplicantIncome: Applicant income
  • CoapplicantIncome: Co-applicant income
  • LoanAmount: Loan amount in thousands
  • Loan_Amount_Term: Term of loan in months
  • Credit_History: Credit history meets guidelines
  • Property_Area: Urban/Semiurban/Rural
  • Loan_Status: Loan approved (Y/N)
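If you want to sanity-check the file before uploading it, a short script can confirm that every column described above is present. The following is a minimal sketch that uses only the Python standard library. The `read_loans` helper and the two sample rows are made up for illustration and are not taken from the real data set.

```python
import csv
import io

# Columns listed in the data set description above.
EXPECTED_COLUMNS = [
    "Loan_ID", "Gender", "Married", "Dependents", "Education",
    "Self_Employed", "ApplicantIncome", "CoapplicantIncome", "LoanAmount",
    "Loan_Amount_Term", "Credit_History", "Property_Area", "Loan_Status",
]

# Two made-up rows for illustration only (hypothetical IDs and values).
SAMPLE_CSV = (
    ",".join(EXPECTED_COLUMNS) + "\n"
    "X001,Male,Y,0,Graduate,N,5000,0,120,360,1,Urban,Y\n"
    "X002,Female,N,1,Graduate,,3000,1500,,360,1,Rural,N\n"
)


def read_loans(text_stream):
    """Parse the CSV and fail early if a documented column is missing."""
    reader = csv.DictReader(text_stream)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = read_loans(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows read; first Loan_Status:", rows[0]["Loan_Status"])
```

To check a real file instead of the sample text, you could pass an open file object, for example `open("train.csv", newline="")`. Empty cells come back as empty strings, which is how the missing values show up in this sketch.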

Step 1. Create a project in Watson Studio

From the IBM Watson® Studio main page, click New project. Choose Complete to get the full functions. After you enter your project name, click Create.

Create project in Watson Studio

Step 2. Upload the data set to Watson Studio

Open Find and add data from the right pane and drag the data set (.csv file) from your computer to that area.

Upload data set to Watson Studio

Step 3. Create the SPSS modeler flow

  1. On the same Assets page, scroll down to Modeler flows.
  2. Click the (+) New flow icon.
  3. Under the New tab, name your modeler flow ‘Loan Eligibility Predictive model’.
  4. Click Create.

Step 4. Add and prepare data

  1. Add data to the canvas using the Data Asset node.
  2. Double-click the node, and click Change Data Asset to open the Asset Browser. Select train.csv, then click OK and Save.

    Add and prepare data

    Let’s look into the summary statistics of our data using the Data Audit node.

  3. Drag the Data Audit node, and connect it with the Data Asset node. After running the node, you can see your audit report in the pane on the right.

    View the audit report

    You see that some columns have missing values. Let’s remove the rows that have null values using the Select node.

  4. Drag the Select node and connect it with the Data Asset node. Right-click on the node to open it.

  5. Select discard mode, and provide the following condition to remove rows with null values.

     (@NULL(Gender) or @NULL(Married) or @NULL(Dependents) or @NULL(Self_Employed) or @NULL(LoanAmount) or @NULL(Loan_Amount_Term) or @NULL(Credit_History))
    

    View clean data

Now the data is clean, and you can proceed with building the model.
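To see what the Select node condition does outside of SPSS Modeler, here is a rough Python equivalent. It is a sketch under two assumptions: each row is a dictionary, and a null is either `None` or a blank string. The helper names (`is_null`, `count_nulls`, `discard_incomplete`) and the sample rows are hypothetical.

```python
# Fields tested by the Select node condition in this step.
REQUIRED_FIELDS = [
    "Gender", "Married", "Dependents", "Self_Employed",
    "LoanAmount", "Loan_Amount_Term", "Credit_History",
]


def is_null(value):
    """Treat None and blank strings as null (an assumption for CSV input)."""
    return value is None or str(value).strip() == ""


def count_nulls(rows, fields=REQUIRED_FIELDS):
    """Count null values per field, similar in spirit to an audit report."""
    return {f: sum(is_null(row.get(f)) for row in rows) for f in fields}


def discard_incomplete(rows, fields=REQUIRED_FIELDS):
    """Keep only the rows in which none of the given fields is null."""
    return [
        row for row in rows
        if not any(is_null(row.get(f)) for f in fields)
    ]


# Made-up rows: one complete, one with a blank Gender, one with no LoanAmount.
complete = {
    "Gender": "Male", "Married": "Y", "Dependents": "0",
    "Self_Employed": "N", "LoanAmount": "120",
    "Loan_Amount_Term": "360", "Credit_History": "1",
}
rows = [complete, {**complete, "Gender": ""}, {**complete, "LoanAmount": None}]

clean = discard_incomplete(rows)
print("nulls per field:", count_nulls(rows))
print("rows kept:", len(clean), "of", len(rows))
```

Like the discard mode of the Select node, this drops any row in which at least one of the listed fields is null and keeps the rest unchanged.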

Step 5. Configure variable types

  1. Drag the Type node to the canvas to configure the variable types.
  2. Double-click the node or right-click to open it.
  3. Choose Configure Types to read the metadata.
  4. For [Loan_Status], change the Role in the drop-down menu from Input to Target.
  5. For [Loan_ID], change the Role in the drop-down menu from None to Record ID.
  6. Click Save.

Configure variable type
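The effect of assigning roles can be pictured as splitting each record into three parts: an identifier, the inputs, and the value to predict. The sketch below illustrates that idea in Python. The `ROLES` mapping and the `split_by_role` function are hypothetical names, and this is not how SPSS Modeler stores roles internally.

```python
# Role assignments from this step; every field not listed is an input.
ROLES = {"Loan_ID": "record_id", "Loan_Status": "target"}


def split_by_role(row, roles=ROLES):
    """Return (record_id, inputs, target) for one record."""
    by_role = {role: field for field, role in roles.items()}
    record_id = row[by_role["record_id"]]
    target = row[by_role["target"]]
    inputs = {name: value for name, value in row.items() if name not in roles}
    return record_id, inputs, target


# A made-up record with only a few of the fields, for illustration.
record = {
    "Loan_ID": "X001", "Gender": "Male",
    "Credit_History": "1", "Loan_Status": "Y",
}
record_id, inputs, target = split_by_role(record)
print(record_id, inputs, target)
```

Keeping the record ID out of the inputs matters because an identifier that is unique per row carries no information that generalizes to new applicants.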

Step 6. Build a machine learning model

The model predicts loan eligibility as one of two classes (Y for yes or N for no). This tutorial uses a Bayesian network because this type of model is known to give good results on classification problems.

  1. Split the data into training and testing sets using the Partition node from the Field Operations palette.

    Double-click the Partition node to set the partition sizes to an 80:20 split. Change the Training Partition to 80 and the Testing Partition to 20.

    Build a machine learning model

  2. Drag the Bayes Net node from the Modeling Palette.

  3. Double-click the node to change the settings. Check Use custom field roles to assign Loan_Status as the target, and all the remaining attributes as input except Partition and Loan_ID. When you finish, click Save.

    Select fields for loan status

  4. Run the Bayes Net node. The trained model then appears as an orange-colored node.

    Run Bayesian network node
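For intuition about what the partition and modeling steps do, the following sketch performs an 80:20 split and trains a naive Bayes classifier in plain Python. Naive Bayes is the simplest Bayesian network structure, in which every input depends only on the target. It is swapped in here for illustration and is not claimed to be the structure that the Bayes Net node learns. The sketch assumes that every input is categorical, so numeric fields such as ApplicantIncome would need binning first. The `partition` function, the `NaiveBayes` class, and the toy rows are all hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def partition(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, inputs, target):
        self.inputs = list(inputs)
        self.class_counts = Counter(row[target] for row in rows)
        # value_counts[(field, label)][value] -> number of training rows
        self.value_counts = defaultdict(Counter)
        self.values = defaultdict(set)
        for row in rows:
            for field in self.inputs:
                self.value_counts[(field, row[target])][row[field]] += 1
                self.values[field].add(row[field])
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Work in log space to avoid multiplying many small numbers.
            score = math.log(count / total)
            for field in self.inputs:
                seen = self.value_counts[(field, label)][row[field]]
                # The extra +1 reserves mass for values unseen in training.
                distinct = len(self.values[field]) + 1
                score += math.log((seen + 1) / (count + distinct))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up rows in which a good credit history always leads to approval.
approved = {"Credit_History": "1", "Property_Area": "Urban", "Loan_Status": "Y"}
rejected = {"Credit_History": "0", "Property_Area": "Rural", "Loan_Status": "N"}
rows = [approved] * 6 + [rejected] * 4

train, test = partition(rows)
model = NaiveBayes().fit(train, ["Credit_History", "Property_Area"], "Loan_Status")
print(len(train), "training rows,", len(test), "testing rows")
print(model.predict({"Credit_History": "1", "Property_Area": "Urban"}))
```

Fixing the seed makes the split repeatable, which helps when you compare several models on the same training and testing sets.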

Step 7. View the model

  1. Right-click the orange-colored node, then click View.
  2. Now you can see the Network Graph and other model information.

    View the model

Step 8. Evaluate the performance of the model

  1. Drag the Analysis node from the Output section, and connect it with the model.
  2. After running the node, you can see your analysis report in the pane on the right.

    Evaluate model performance

The analysis report shows that this model reaches 82.3% accuracy on the test data set. You can build more models on the same canvas until you get the result you want.
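Accuracy here means the share of test records whose predicted class matches the actual class. The sketch below computes it together with a simple confusion count. The `evaluate` function and the five label pairs are made up for illustration and do not reproduce the figures in the analysis report.

```python
from collections import Counter


def evaluate(actual, predicted):
    """Return (accuracy, confusion) for two equal-length lists of labels.

    confusion[(a, p)] is the number of records with actual label a
    that were predicted as label p.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty lists of equal length")
    confusion = Counter(zip(actual, predicted))
    correct = sum(n for (a, p), n in confusion.items() if a == p)
    return correct / len(actual), confusion


# Made-up labels: four of the five predictions match the actual class.
actual = ["Y", "Y", "N", "Y", "N"]
predicted = ["Y", "Y", "Y", "Y", "N"]

accuracy, confusion = evaluate(actual, predicted)
print(f"accuracy: {accuracy:.1%}")
print("actual N predicted Y:", confusion[("N", "Y")])
```

The confusion count is worth a look alongside accuracy because approving a loan that should be rejected and rejecting a loan that should be approved usually carry different costs for a lender.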

Step 9. Save the model

  1. Right-click the Bayes Net node, and select Save branch as a model.
  2. Enter a name for the model. A machine learning service should be added automatically if you already created one.
  3. Click Save.

    Save the model

On the Assets page, under Watson Machine Learning models, you can access your saved model and deploy it later.

Access saved model

Summary

In this tutorial, you learned how to create a complete predictive model, from importing and preparing the data to training and saving the model. You also learned how to use SPSS Modeler and how to export the model to Watson Machine Learning models.

If you’d like to prepare the data using a Jupyter Notebook and use various models to predict the target variable, take a look at this loan eligibility tutorial and repository.