Within a bank’s consumer lending department, a customer’s loan application undergoes considerable scrutiny before it is approved or rejected. A loan agent or customer representative must manually assess the information provided by the applicant, such as credit history, savings, loan duration, loan amount, and housing. This information is then compared with historical data about similar applicants to determine whether approving or rejecting their loans involved risk. This evaluation can take a long time, and the delay can cause the bank to lose a potential customer to another bank.
To shorten decision time and improve decision accuracy, a growing number of banks have begun to use machine learning-based solutions. This modernized approach lets a customer representative get a prediction for a loan application with the click of a button.
In this case study, we show you how to predict the risk of a loan application using the following products:
- IBM Cloud Pak for Data
- IBM Watson Knowledge Catalog
- IBM Watson Studio
- IBM Watson Machine Learning
- IBM Watson OpenScale
- Red Hat OpenShift
The solution accesses data across multiple data sources, such as IBM Netezza Performance Server and IBM Db2 Warehouse. This case study uses IBM Cloud as the cloud platform, but because IBM Cloud Pak for Data runs on many cloud platforms, you can use a different one, such as Amazon Web Services (AWS) or Microsoft Azure.
Analyzing credit risk with IBM Cloud Pak for Data on Red Hat OpenShift
Building this application involves the following steps:
- Set up IBM Cloud Pak for Data on Red Hat OpenShift
- Manage and secure clients’ data
- Develop and deploy a credit risk model
- Deploy the loan application on Red Hat OpenShift
- Monitor the machine learning model
- ModelOps cycle
Set up IBM Cloud Pak for Data on Red Hat OpenShift
We use various services available within IBM Cloud Pak for Data to analyze data and to build, deploy, and monitor the machine learning model. IBM Cloud Pak for Data is a data and AI platform that runs on Red Hat OpenShift Container Platform, a Kubernetes-based container platform.
You can install IBM Cloud Pak for Data through a tile in the IBM Cloud catalog.
Manage and secure clients’ data
Collecting and organizing data is a foundational step in building a machine learning pipeline. To collect data from multiple sources, make it securely accessible, and visualize it, IBM Cloud Pak for Data offers services such as Data Virtualization and Data Refinery.
The data collected for analyzing loan risk contains sensitive personal information, such as Social Security numbers, so compliance and security standards must be maintained. To handle these policy, security, and compliance requirements and to govern the data, IBM Cloud Pak for Data offers a service called IBM Watson Knowledge Catalog.
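To illustrate the idea behind virtualization (one logical view over tables that live in different databases), here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in. It is an analogy only, not the Data Virtualization service: the two in-memory databases play the role of separate sources, and the table names, columns, and values are hypothetical.

```python
import sqlite3

# Two in-memory databases stand in for two separate data sources (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS warehouse")

# Source 1: application details.
conn.execute(
    "CREATE TABLE main.applicants "
    "(customer_id INTEGER PRIMARY KEY, loan_amount INTEGER, loan_duration INTEGER)"
)
conn.executemany(
    "INSERT INTO main.applicants VALUES (?, ?, ?)", [(1, 5000, 24), (2, 1200, 12)]
)

# Source 2: credit history kept in a different database.
conn.execute(
    "CREATE TABLE warehouse.credit_history "
    "(customer_id INTEGER PRIMARY KEY, credit_history TEXT)"
)
conn.executemany(
    "INSERT INTO warehouse.credit_history VALUES (?, ?)",
    [(1, "prior_payments_delayed"), (2, "all_credits_paid_back")],
)

# A temporary view joins both sources so consumers query a single "virtual" table.
conn.execute(
    """
    CREATE TEMP VIEW applicant_view AS
    SELECT a.customer_id, a.loan_amount, a.loan_duration, c.credit_history
    FROM main.applicants AS a
    JOIN warehouse.credit_history AS c USING (customer_id)
    """
)

rows = conn.execute("SELECT * FROM applicant_view ORDER BY customer_id").fetchall()
conn.close()
print(rows)
```

The consumer of `applicant_view` never needs to know which database each column came from, which is the property a virtualized data layer provides at much larger scale.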
A data steward or an administrator typically works in Watson Knowledge Catalog to mask sensitive information, define business terms and rules that apply to the banking domain, and ensure data security. The Implement data governance to manage and secure clients’ data tutorial walks through how to work with Watson Knowledge Catalog on the credit risk data set.
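In Watson Knowledge Catalog, masking is configured in the product itself (for example, through data protection rules), not in code. The following minimal sketch only shows what a masked Social Security number might look like, assuming a hypothetical record with an `ssn` field in `NNN-NN-NNNN` format; the function names are illustrative and are not part of any IBM API.

```python
import re

# Assumed SSN layout: three digits, two digits, four digits.
SSN_PATTERN = re.compile(r"^(\d{3})-(\d{2})-(\d{4})$")


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits; redact everything else (illustrative)."""
    match = SSN_PATTERN.match(ssn)
    if match is None:
        # Unrecognized format: redact the whole value rather than risk a leak.
        return "X" * len(ssn)
    return f"XXX-XX-{match.group(3)}"


def mask_record(record: dict) -> dict:
    """Return a copy of a hypothetical applicant record with the SSN masked."""
    masked = dict(record)
    if masked.get("ssn") is not None:
        masked["ssn"] = mask_ssn(str(masked["ssn"]))
    return masked


applicant = {"name": "A. Applicant", "ssn": "123-45-6789", "loan_amount": 5000}
print(mask_record(applicant))
```

Returning a copy leaves the original record untouched, which mirrors the idea that governed users see a masked view while the source data is unchanged.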
Develop and deploy a credit risk model
The next step in modernizing the bank loan department is to build a binary classification model that predicts whether there is a risk involved with a particular application. To build this model, the curated data received from the previous step is used as training data. To enable a data scientist to build this model pipeline, IBM Cloud Pak for Data offers the Watson Studio service.
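As a rough illustration of what a binary classifier for this problem does, the following sketch trains a logistic regression with batch gradient descent in plain Python. It is not the code from the tutorials. It assumes two hypothetical features already scaled to the range 0 to 1 (loan amount and loan duration) and encodes the label as 1 for RISK and 0 for NO RISK; the tiny training set is made up for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss.

    Features are mean-centered; the means are returned so that new
    applicants can be centered the same way at prediction time.
    """
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(Xc, y):
            # Gradient of the log-loss for one example: (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, means


def predict(model, x, threshold=0.5):
    """Return 'RISK' or 'NO RISK' for one applicant's scaled features."""
    w, b, means = model
    z = sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, means)) + b
    return "RISK" if sigmoid(z) >= threshold else "NO RISK"


# Hypothetical training data: [scaled loan amount, scaled loan duration].
X_train = [[0.1, 0.1], [0.2, 0.3], [0.3, 0.2], [0.8, 0.7], [0.9, 0.9], [0.7, 0.8]]
y_train = [0, 0, 0, 1, 1, 1]

model = train_logistic_regression(X_train, y_train)
print(predict(model, [0.15, 0.15]), predict(model, [0.85, 0.85]))
```

In practice, a data scientist would use a library implementation and far more features, but the inputs and outputs have the same shape: applicant attributes in, a RISK or NO RISK label out.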
Within Watson Studio, you can create this model pipeline in two ways:
- Write Python code in a Jupyter Notebook. The Infuse a loan department platform with AI tutorial explains this approach to building the credit risk model.
- Run Watson AutoAI to generate multiple pipelines and choose the best one. The Generate machine learning model pipelines to choose the best model for your problem tutorial discusses the Watson AutoAI approach in detail.
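AutoAI also automates data preparation, feature engineering, and hyperparameter tuning, which this sketch does not attempt. It shows only the final step of ranking candidate pipelines on held-out data and picking the top one. The candidate "pipelines" are hypothetical threshold rules, and accuracy is used as the ranking metric for simplicity; AutoAI lets you choose the metric to optimize.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "pipeline" here is any function mapping scaled features to 0 (NO RISK) or 1 (RISK).
Pipeline = Callable[[Sequence[float]], int]


def accuracy(pipeline: Pipeline, X, y) -> float:
    """Fraction of held-out examples the pipeline labels correctly."""
    correct = sum(1 for xi, yi in zip(X, y) if pipeline(xi) == yi)
    return correct / len(y)


def rank_pipelines(candidates: Dict[str, Pipeline], X, y) -> List[Tuple[str, float]]:
    """Return (name, score) pairs, best first, like a leaderboard."""
    scores = [(name, accuracy(p, X, y)) for name, p in candidates.items()]
    # sorted() is stable, so ties keep their original candidate order.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical candidates over [scaled loan amount, scaled loan duration].
candidates: Dict[str, Pipeline] = {
    "amount_only": lambda x: int(x[0] > 0.5),
    "duration_only": lambda x: int(x[1] > 0.5),
    "combined": lambda x: int(x[0] + x[1] > 1.0),
}

# Hypothetical held-out data (1 = RISK).
X_holdout = [[0.2, 0.7], [0.6, 0.1], [0.9, 0.6], [0.3, 0.2], [0.7, 0.7], [0.4, 0.4]]
y_holdout = [0, 0, 1, 0, 1, 0]

leaderboard = rank_pipelines(candidates, X_holdout, y_holdout)
best_name, best_score = leaderboard[0]
print(best_name, best_score)
```

Scoring on data that the pipelines were not trained on is what makes the comparison fair; ranking on training data would favor whichever pipeline overfits most.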
After the model is built, you use the Watson Machine Learning service in IBM Cloud Pak for Data to deploy it so that it can be used from outside the environment. The tutorials for both approaches explain how to do this deployment, after which the model is available outside IBM Cloud Pak for Data as a RESTful service.
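To show roughly what calling such a RESTful service involves, here is a sketch that builds a scoring request with Python's standard `urllib.request`, without sending it. It assumes the Watson Machine Learning v4 online-scoring style of endpoint (`/ml/v4/deployments/<id>/predictions?version=<date>`) and its `input_data` payload of `fields` and `values`; verify the exact path, version date, and authentication against the documentation for your release. The host, deployment ID, token, field names, and sample response are all hypothetical.

```python
import json
import urllib.request


def build_scoring_request(base_url, deployment_id, token, fields, rows,
                          version="2020-09-01"):
    """Build (but do not send) a POST request to an assumed WML v4 scoring endpoint."""
    url = f"{base_url}/ml/v4/deployments/{deployment_id}/predictions?version={version}"
    payload = {"input_data": [{"fields": list(fields),
                               "values": [list(r) for r in rows]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {token}"},
        method="POST",
    )


def parse_predictions(response_body: str):
    """Extract the predicted labels from an assumed v4-style response body."""
    result = json.loads(response_body)["predictions"][0]
    idx = result["fields"].index("prediction")
    return [row[idx] for row in result["values"]]


req = build_scoring_request(
    "https://cpd.example.com", "my-deployment-id", "TOKEN",
    fields=["LoanAmount", "LoanDuration"], rows=[[5000, 24]],
)

# Hypothetical response; a real one would come from urllib.request.urlopen(req).
sample = ('{"predictions": [{"fields": ["prediction", "probability"], '
          '"values": [["No Risk", [0.8, 0.2]]]}]}')
print(parse_predictions(sample))
```

Keeping request construction separate from sending makes the payload easy to inspect and test before you point it at a live deployment.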
Deploy the loan application
After the data scientist builds the model and makes it available, an application developer creates a web application. The Create a web-based intelligent bank loan application for a loan agent code pattern shows how to deploy a sample Flask application that invokes the deployed credit risk model, either in an OpenShift cluster, to Cloud Foundry on IBM Cloud, or locally.
The customer representative can then use this web application to submit an applicant’s details and get a result. The result is either NO RISK, which means that the loan can be approved, or RISK, which means that approving the loan involves risk. With a RISK result, the customer representative can adjust parameters such as the loan amount and check whether the loan can then be approved.
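The "adjust and resubmit" step can be automated as a simple search. The sketch below lowers the loan amount in fixed steps until the model returns NO RISK or a minimum amount is reached. The `stand_in_scorer` rule, the field names, and the step sizes are hypothetical; in the real application, the scorer would call the deployed model instead.

```python
from typing import Callable, Dict, Optional

# A scorer takes applicant details and returns "RISK" or "NO RISK".
Scorer = Callable[[Dict[str, int]], str]


def find_approvable_amount(applicant: Dict[str, int], score: Scorer,
                           step: int = 500, min_amount: int = 500) -> Optional[int]:
    """Return the largest amount (in `step` decrements) scored NO RISK, or None."""
    candidate = dict(applicant)  # Never modify the original application.
    amount = applicant["LoanAmount"]
    while amount >= min_amount:
        candidate["LoanAmount"] = amount
        if score(candidate) == "NO RISK":
            return amount
        amount -= step
    return None


def stand_in_scorer(app: Dict[str, int]) -> str:
    """Hypothetical rule standing in for the deployed model."""
    return "RISK" if app["LoanAmount"] > 4 * app["Savings"] else "NO RISK"


print(find_approvable_amount({"LoanAmount": 6000, "Savings": 1200}, stand_in_scorer))
```

Because each step is a separate scoring call, a coarse step keeps the number of requests small; the representative can always refine the amount by hand afterward.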
Monitor the machine learning model
Suppose that a customer has applied for a loan at a bank that uses this modernized application. If the application predicts that the loan cannot be approved, the customer has the right to know why it was rejected.
The customer can ask the customer representative why the loan was rejected. The representative can look at the application and guess at reasons, such as a requested loan amount that is too high or a poor credit score. But these are only guesses, and the machine learning model's prediction might itself be inaccurate.
The customer representative can then take the case back to the data scientist who built the model. The data scientist needs to be able to explain why the model generates the results that it does.
If the data scientist cannot explain the outcome and show that the model produces fair results, they need to find where the model is weak and what to improve.
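For the simplest kind of model, explaining a single decision is straightforward. In a linear (logistic) model, each feature's contribution to the log-odds of RISK is its weight times the applicant's deviation from the average applicant. The sketch below ranks those contributions. It is exact only for linear models; Watson OpenScale produces explanations with its own methods, which this sketch does not reproduce. The weights, means, and feature names are hypothetical.

```python
from typing import List, Sequence, Tuple


def explain_linear(weights: Sequence[float], means: Sequence[float],
                   x: Sequence[float],
                   feature_names: Sequence[str]) -> List[Tuple[str, float]]:
    """Per-feature contributions to the log-odds, largest magnitude first.

    A positive value pushes the prediction toward RISK; a negative value
    pushes it toward NO RISK, relative to an average applicant.
    """
    contributions = [
        (name, w * (xi - m))
        for name, w, xi, m in zip(feature_names, weights, x, means)
    ]
    return sorted(contributions, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical model and applicant (features scaled to 0..1).
names = ["LoanAmount", "LoanDuration", "Savings"]
weights = [2.0, 0.5, -1.5]   # Higher savings lowers the risk score.
means = [0.5, 0.5, 0.5]
applicant = [0.9, 0.6, 0.2]

for name, contribution in explain_linear(weights, means, applicant, names):
    print(f"{name}: {contribution:+.2f}")
```

Here the large loan amount and the low savings both push toward RISK, which gives the representative a concrete answer instead of a guess.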
To counter the perception that AI models are a “black box,” IBM Watson OpenScale helps explain AI outcomes, such as the decisions made on loan applications, and detect bias in them.
Watson OpenScale tracks and measures outcomes from your AI models and helps ensure that they remain fair, explainable, and compliant, wherever the models were built or are running. Watson OpenScale is designed as an open platform that works with a range of model development environments, serving engines, and frameworks, including TensorFlow, Keras, SparkML, Seldon, Amazon SageMaker, and Azure Machine Learning.
Watson OpenScale provides a set of monitoring and management tools that help you build trust and implement control and governance structures around your AI investments:
- It provides production monitoring for compliance and safeguards (such as auditing model decisions and detecting biases).
- It helps ensure that models remain resilient to changing situations (drift).
- It aligns model performance with business outcomes (performance, accuracy).
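One common way to quantify bias in loan decisions is disparate impact: the rate of favorable outcomes (here, NO RISK) for a monitored group divided by the rate for a reference group. The sketch below computes it for hypothetical groups "A" and "B" with made-up outcomes. The 0.8 threshold is an assumption borrowed from the widely cited four-fifths rule; in Watson OpenScale you configure the monitored groups and thresholds yourself.

```python
from typing import Sequence


def favorable_rate(outcomes: Sequence[str], groups: Sequence[str], group: str,
                   favorable: str = "NO RISK") -> float:
    """Share of records in `group` that received the favorable outcome."""
    selected = [o for o, g in zip(outcomes, groups) if g == group]
    if not selected:
        raise ValueError(f"no records for group {group!r}")
    return sum(1 for o in selected if o == favorable) / len(selected)


def disparate_impact(outcomes: Sequence[str], groups: Sequence[str],
                     monitored: str, reference: str,
                     favorable: str = "NO RISK") -> float:
    """Favorable rate of the monitored group divided by that of the reference group."""
    ref_rate = favorable_rate(outcomes, groups, reference, favorable)
    if ref_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(outcomes, groups, monitored, favorable) / ref_rate


# Hypothetical model outcomes for eight applicants in two groups.
outcomes = ["NO RISK", "RISK", "NO RISK", "RISK",
            "NO RISK", "NO RISK", "NO RISK", "RISK"]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

di = disparate_impact(outcomes, groups, monitored="A", reference="B")
THRESHOLD = 0.8  # Assumed threshold (four-fifths rule).
print(f"disparate impact = {di:.2f}, possible bias: {di < THRESHOLD}")
```

Group A receives a favorable outcome half the time versus three quarters of the time for group B, so the ratio falls below the assumed threshold and would be flagged for review.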
The Monitor model drift with Watson OpenScale tutorial explains how drift in data or drift in model accuracy can be monitored for this use case.
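A basic form of accuracy drift can be checked by comparing the model's accuracy on recent, labeled feedback data with the accuracy measured at deployment time. The sketch below does exactly that. It is a deliberately simple check, not how Watson OpenScale estimates drift internally, and the baseline accuracy, feedback records, and 0.05 threshold are hypothetical.

```python
from typing import Sequence, Tuple


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the actual outcomes."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and the same length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def accuracy_drift(baseline_accuracy: float, predicted: Sequence[str],
                   actual: Sequence[str], threshold: float = 0.05) -> Tuple[float, bool]:
    """Return (accuracy drop, whether the drop exceeds the threshold)."""
    drop = baseline_accuracy - accuracy(predicted, actual)
    return drop, drop > threshold


# Hypothetical feedback: the model said NO RISK for all ten applicants,
# but four of them later turned out to be risky.
predicted = ["NO RISK"] * 10
actual = ["NO RISK"] * 6 + ["RISK"] * 4

drop, drifted = accuracy_drift(0.80, predicted, actual)
print(f"accuracy drop = {drop:.2f}, drift detected: {drifted}")
```

A flagged drop is the trigger for the data scientist to investigate and, if needed, retrain the model with the newer data.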
So far, we have discussed how individual services in IBM Cloud Pak for Data can be used to develop different parts of the application. The following diagram shows how the different users, technologies, and tools come together to operationalize end-to-end development and maintenance of this smart application.
Putting it all together, the data steward (not shown in the diagram) works with Watson Knowledge Catalog to prepare data that is then retrieved through Watson Studio. The data scientist builds the machine learning model by using the prepared data for training and deploys it with Watson Machine Learning. The application developer builds an app that invokes this model to get results and makes it available to the customer representative. Watson OpenScale monitors the model for explainability, bias, and drift. Because the feedback collected from the deployed model is used to evaluate and retrain it, the process within IBM Cloud Pak for Data is cyclic.