Infuse a loan department platform with AI

Building the Modernizing your bank loan department application involves several steps. This tutorial explains one of them: infusing AI into the platform. After you’ve processed the credit risk data through IBM Watson® Knowledge Catalog (for example, by creating data classes and business terms and by masking data), the data is handed over to the data scientist for further work.

As shown in the following image, you’ll use two services offered within IBM Cloud Pak® for Data to build and deploy a machine learning model in this phase. The data scientist creates a model pipeline that includes exploring data, visualizing data, building the machine learning model, running predictions, and evaluating the model using Watson Studio. This model is then deployed within IBM Cloud Pak for Data to be accessed by an external application using Watson Machine Learning services.

Data scientist trains model

Machine learning in Jupyter Notebook

This section walks through exploring the data set and building a predictive model that you can use to determine the likelihood of a credit loan having risk or no risk. For this use case, the machine learning model you’re building is a classification model that returns a prediction of risk (the applicant’s inputs on the loan application predict that there is a good chance of default on the loan) or no risk (the applicant’s inputs predict that the loan will be paid off). You build the model in Python in a Jupyter Notebook, using some fairly popular libraries and frameworks. After you’ve built the model, you make it available for deployment so it can be used by others.

Step 1. Create a project and deployment space

You’ll begin by creating a project and deployment space.

Create a new project

Launch a browser and navigate to your IBM Cloud Pak for Data deployment.

IBM Cloud Pak for Data login

In IBM Cloud Pak for Data, you use a project to collect and organize the resources needed to achieve a particular goal (for example, the resources to build a solution to a problem). Your project resources can include data, collaborators, and analytic assets like notebooks and models.

  1. Go to the hamburger (☰) menu and click Projects.

  2. Click New project +.

  3. Select Analytics project for the project type and click Next.

  4. Select Create an empty project.

  5. Name the project, and click Create.

Create a deployment space

IBM Cloud Pak for Data uses the concept of deployment spaces to configure and manage the deployment of a set of related deployable assets. These assets can be data files, machine learning models, etc.

  1. Go to the hamburger (☰) menu and click Analyze > Analytics deployments.

  2. Click + New deployment space.

  3. Select Create an empty space.

  4. Give your deployment space a unique name and optional description, then click Create. You’ll use this space later when you deploy a machine learning model.

Step 2. Load and run the Jupyter Notebook

Load the data set

For the data set, you’ll use the German credit risk data set.

  1. From the project overview page, click Add to project + to launch the Choose asset type window.

  2. Select Data from the options, and upload the .csv file.

Note: If you are continuing this tutorial from the previous one and have created a catalog in Watson Knowledge Catalog, switch to the Catalog tab, select the catalog from the ADD FROM CATALOG drop-down menu, and select the CSV file.

Import catalog

Load the Jupyter Notebook

For the notebook, you’ll use the machinelearning-creditrisk-sparkmlmodel.ipynb. There’s also a copy of the notebook with results saved after running all of the cells within it.

  1. From the project overview page, click Add to project + to launch the Choose asset type window.

  2. Select Notebook from the options and switch to the From file tab.

  3. Click Drag and drop files here or upload, select the machinelearning-creditrisk-sparkmlmodel notebook, and click Create notebook to load the Jupyter Notebook.

Run the notebook

A notebook is composed of text (markdown or heading) cells and code cells. The markdown cells provide comments on what the code is designed to do. You run the cells individually by highlighting each cell and then either clicking Run at the top of the notebook or using the keyboard shortcut (Shift + Enter, although this can vary by platform). While the cell is running, an asterisk ([*]) shows up to the left of the cell. When the cell has finished running, a sequential number shows up (for example, [17]).

Note: Some comments in the notebook are directions for you to modify specific sections of the code. Make any changes as indicated before running the cell.

Load and prepare data set

When the Jupyter Notebook is loaded and the kernel is ready, you can start running it. Click the pencil icon at the upper-right corner to run or edit the notebook.

Notebook loaded

Section 1.0 Install required packages installs some of the libraries that you are going to use in the notebook (many libraries come pre-installed on IBM Cloud Pak for Data). Note that this section also upgrades the installed version of the Watson Machine Learning Python client. Look at the output of the first code cell to ensure that the Python packages were successfully installed.
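
If the install output is long, one way to double-check the result is to ask Python which versions it can actually see. The following is a minimal sketch using only the standard library. The package names in the loop are assumptions for illustration, not taken from the notebook; replace them with the packages that the first code cell installs.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust them to match what the first code cell installs.
for name in ("pandas", "pyspark", "ibm-watson-machine-learning"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

A package reported as not installed here usually means that the install cell failed or that the kernel needs to be restarted before the new package is visible.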

Section 2.0 Load and Clean data loads the data set you’ll use to build the machine learning model. To import the data into the notebook, you use the code-generation capability of Watson Studio.

  1. Highlight the code cell by clicking it. Ensure that you place the cursor below the # Place cursor below and insert the Pandas Dataframe for the Credit Risk Data line.
  2. Click the 10/01 Find data icon in the upper right of the notebook to find the data asset you must import.
  3. Use the german_credit_data.csv file version of the data set you previously imported to the project.
  4. For your data set, click Insert to code and choose Insert Pandas DataFrame. The code to bring the data into the notebook environment and create a Pandas DataFrame is added to the cell below.
  5. Run the cell to see the first five rows of the data set.

Note: Because you’re using generated code to import the data, you must update the next cell to assign the df variable. Copy the name of the variable generated in the previous cell (it looks like df_data_1 or df_data_2) and assign it to the df variable (for example, df=df_data_1).
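
The reassignment described in the note can be sketched as follows. This is an illustration only: the in-memory CSV and its column names are hypothetical stand-ins for the code that Watson Studio generates, which reads the real project asset instead.

```python
import io

import pandas as pd

# Stand-in for the generated cell. The real generated code reads the project
# asset; this tiny in-memory CSV with hypothetical columns replaces it here.
csv_text = """CheckingStatus,LoanAmount,Risk
no_checking,250,No Risk
less_0,4500,Risk
0_to_200,1200,No Risk
"""
df_data_1 = pd.read_csv(io.StringIO(csv_text))

# The step the note describes: point df at the generated variable so that the
# rest of the notebook can keep using the name df.
df = df_data_1

print(df.head())
```

Because df and df_data_1 refer to the same DataFrame object, no data is copied by the assignment.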

Continue to run the remaining cells in section 2 to explore and clean the data.

Build machine learning model

In section 3.0 Create a model, the cells run through the steps to build a model pipeline. You split the data into training and test data, encode the categorical string values, create a model using the random forest classifier algorithm, and evaluate the model against the test set. Run all of the cells in section 3 to build the model.
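
To make the steps above more concrete, here is a small plain-Python sketch of the split, string-encoding, and evaluation steps. It is not the notebook’s code: the rows are made-up toy data, the helper names are hypothetical, and a majority-class baseline stands in for the random forest classifier so that the example stays self-contained.

```python
import random
from collections import Counter

# Hypothetical toy rows: (checking status, loan amount, label).
ROWS = [
    ("no_checking", 250, "No Risk"),
    ("less_0", 4500, "Risk"),
    ("0_to_200", 1200, "No Risk"),
    ("no_checking", 800, "No Risk"),
    ("less_0", 6100, "Risk"),
    ("0_to_200", 3000, "No Risk"),
    ("no_checking", 1500, "No Risk"),
    ("less_0", 5200, "Risk"),
    ("0_to_200", 900, "No Risk"),
    ("no_checking", 2000, "No Risk"),
]


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_string_index(values):
    """Map each distinct string to an integer, most frequent value first."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda value: (-counts[value], value))
    return {value: index for index, value in enumerate(ordered)}


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


train, test = split_rows(ROWS)

# Encode the categorical column using only the values seen in the training rows.
status_index = fit_string_index(row[0] for row in train)
encoded_train = [(status_index[status], amount) for status, amount, _ in train]

# Baseline stand-in for the classifier: always predict the majority training label.
majority_label = Counter(label for _, _, label in train).most_common(1)[0][0]
predictions = [majority_label for _ in test]

print("string index:", status_index)
print("baseline accuracy:", accuracy(predictions, [label for _, _, label in test]))
```

A baseline like this is also a useful yardstick: a trained classifier is only adding value if it scores clearly better on the test set than always predicting the most common label.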

Building the pipeline and model

Save the model

Section 4.0 Save the model saves the model to your project.

  1. Save and deploy the model to the Watson Machine Learning service within the IBM Cloud Pak for Data platform. In the next code cell, be sure to update the wml_credentials variable.

The URL should be the full host name of the IBM Cloud Pak for Data instance, which you can copy from your browser’s address bar (for example, https://zen.clustername.us-east.containers.appdomain.cloud). The username and password should be the same credentials that you used to log in to IBM Cloud Pak for Data.

  2. Update the MODEL_NAME and DEPLOYMENT_SPACE_NAME variables. Use a unique and easily identifiable model name, and use the name of the deployment space that you created in Step 1:

     MODEL_NAME = "MY_NAME RISK MODEL"
     DEPLOYMENT_SPACE_NAME = "MY_NAME RISK MODEL DEPLOYMENT"

  3. Continue to run the cells in the section to save the model to IBM Cloud Pak for Data.
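
When you fill in wml_credentials, note that the address bar usually holds more than the host name. The sketch below shows one way to trim a copied URL down to the part the tutorial asks for. The host_url helper, the path after the host name, and the dictionary key names are assumptions for illustration; keep whatever keys the notebook’s cell already contains and change only the values.

```python
from urllib.parse import urlsplit


def host_url(address_bar_url):
    """Reduce a URL copied from the browser to scheme plus host name."""
    parts = urlsplit(address_bar_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {address_bar_url!r}")
    return f"{parts.scheme}://{parts.netloc}"


# The path after the host name is a made-up example of what the address bar may contain.
copied = "https://zen.clustername.us-east.containers.appdomain.cloud/zen/#/projects"

# Only the three values the tutorial names are shown, with assumed key names;
# the notebook's cell may expect additional keys.
wml_credentials = {
    "url": host_url(copied),
    "username": "YOUR_USERNAME",  # placeholder
    "password": "YOUR_PASSWORD",  # placeholder
}

print(wml_credentials["url"])
```

Avoid leaving real passwords in a notebook that you share or check into version control.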

You’ve successfully built and saved a machine learning model programmatically.

Note: Make sure that you stop the kernel of your notebooks when you are done to conserve resources. You can do this by going to the Asset page of the project, selecting the three dots under the Action column for the notebook you have been running, and selecting Stop Kernel from the Actions menu. If you see a lock icon on the notebook, click it to unlock the notebook so you can stop the kernel.

Stop kernel

Conclusion

In this tutorial, you learned how a data scientist can build a classification model to predict the likelihood of a credit loan having Risk or No Risk by using the services available within IBM Cloud Pak for Data. You learned how to build a model using a Jupyter Notebook from Watson Studio, how to save models using the Watson Machine Learning SDK, and more.

This tutorial is part of the Modernizing your bank loan department series. Next, you’ll learn how to use this deployed model through a Flask application and deploy it on an OpenShift® cluster, and how a loan agent and a customer are able to use this application to predict risk.