Data analysis, model building, and deploying with Watson Machine Learning with notebook

This pattern is part of the Getting started with IBM Cloud Pak for Data learning path.


Summary

In this Code Pattern, we’ll use IBM Cloud Pak for Data to go through the whole data science pipeline to solve a business problem and predict customer churn using a Telco customer churn dataset. IBM Cloud Pak for Data is an interactive, collaborative, cloud-based environment. It helps data scientists, developers, and others interested in data science use tools to collaborate, share, and gather insight from their data, as well as build and deploy machine learning and deep learning models.

Description

Customer churn (when a customer ends their relationship with a business) is one of the most basic factors in determining the revenue of a business. You need to know which of your customers are loyal and which are at risk of churning, and you need to know the factors that affect these decisions from a customer perspective. This code pattern explains how to build a machine learning model and use it to predict whether a customer is at risk of churning. This is a full data science project, and you can use your model findings for prescriptive analysis later or for targeted marketing.
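As a first look at which factors relate to churn, the churn rate can be compared across customer segments before any model is built. The following is a minimal sketch, not the code pattern's notebook code: the rows are made-up illustrative data, and the column names "Contract" and "Churn" (with "Yes"/"No" values) are assumptions about how the Telco dataset is laid out.

```python
from collections import defaultdict


def churn_rate_by(rows, feature, label="Churn"):
    """Return the fraction of churned customers for each value of `feature`."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        key = row[feature]
        totals[key] += 1
        if row[label] == "Yes":  # assumed encoding of the churn label
            churned[key] += 1
    return {key: churned[key] / totals[key] for key in totals}


# Hypothetical sample rows, for illustration only.
rows = [
    {"Contract": "Month-to-month", "Churn": "Yes"},
    {"Contract": "Month-to-month", "Churn": "No"},
    {"Contract": "Two year", "Churn": "No"},
    {"Contract": "Month-to-month", "Churn": "Yes"},
]

rates = churn_rate_by(rows, "Contract")
for contract, rate in sorted(rates.items()):
    print(f"{contract}: {rate:.0%} churned")
```

A large gap between segments suggests that the feature is worth keeping as a model input, although a per-segment rate alone does not show causation.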

After you’ve completed this Code Pattern, you’ll understand how to:

  • Use Jupyter Notebooks to load, visualize, and analyze data
  • Run Notebooks in IBM Cloud Pak for Data
  • Build, test, and deploy a machine learning model using Spark MLlib on IBM Cloud Pak for Data
  • Deploy a selected machine learning model to production using IBM Cloud Pak for Data
  • Create a front-end application to interface with the client and start consuming your deployed model

Flow

  1. User loads the Jupyter notebook into the IBM Cloud Pak for Data platform.
  2. The Telco customer churn data set is loaded into the Jupyter Notebook, either directly from the GitHub repo or as Virtualized Data after following the Data Virtualization Tutorial from the Getting started with Cloud Pak for Data learning path.
  3. Preprocess the data, build machine learning models, and save them to Watson Machine Learning on IBM Cloud Pak for Data.
  4. Deploy a selected machine learning model into production on the IBM Cloud Pak for Data platform and obtain a scoring endpoint.
  5. Use the model for customer churn prediction using a front-end application.
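Steps 4 and 5 come down to sending customer records to the scoring endpoint as JSON. The sketch below shows how a front-end application might prepare such a request. It rests on several assumptions: the payload follows the "input_data" / "fields" / "values" layout commonly used for Watson Machine Learning online scoring, and the URL, token, and customer field names are hypothetical placeholders rather than values from this code pattern. The request is only built, not sent.

```python
import json
import urllib.request


def build_scoring_payload(customers):
    """Convert a list of customer dicts into a fields/values scoring payload."""
    fields = list(customers[0].keys())
    values = [[customer[name] for name in fields] for customer in customers]
    # Assumed payload layout; check the deployment's API reference.
    return {"input_data": [{"fields": fields, "values": values}]}


def build_scoring_request(url, token, payload):
    """Prepare (but do not send) an authenticated POST request."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


# Hypothetical customer record, endpoint URL, and token, for illustration only.
customer = {
    "gender": "Female",
    "tenure": 2,
    "Contract": "Month-to-month",
    "MonthlyCharges": 70.7,
}
payload = build_scoring_payload([customer])
request = build_scoring_request(
    "https://cpd.example.com/scoring-endpoint", "HYPOTHETICAL_TOKEN", payload
)
print(json.dumps(payload, indent=2))
# To send it, a caller would use: urllib.request.urlopen(request)
```

The field order in the payload must match the columns the model was trained on, so in practice the field list should come from the training schema rather than from whichever keys a front-end form happens to supply.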

Instructions

Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.

Conclusion

This code pattern showed how to use IBM Cloud Pak for Data to go through the whole data science pipeline, solving a business problem by predicting customer churn with a Telco customer churn dataset. The code pattern is part of the Getting started with IBM Cloud Pak for Data learning path. To continue the series and learn more about IBM Cloud Pak for Data, take a look at the next code pattern, Monitoring the model with Watson OpenScale.

Scott D’Angelo
Steve Martinelli