Data science process pipeline to solve employee attrition

Summary

This code pattern is a high-level overview of what to expect in a data science pipeline and the tools that can be used along the way. It covers every stage, from framing the business question to building and deploying a data model. The pipeline is demonstrated through the employee attrition problem.

Description

Employees are the backbone of any organization, and an organization's performance depends heavily on the quality of its employees and on retaining them. With employee attrition, organizations face a number of challenges:

  1. The cost, in both money and time, of training new employees
  2. Loss of experienced employees
  3. Impact on productivity
  4. Impact on profit

The following solution is designed to help address the employee attrition problem. After completing this code pattern, you’ll understand:

  • The process involved in solving a data science problem.
  • How to create and use a Watson Studio instance.
  • How to mitigate bias by transforming the original dataset with the AI Fairness 360 (AIF360) toolkit.
  • How to build and deploy the model in Watson Studio using various tools.
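
To make the bias-mitigation point more concrete, here is a minimal sketch of reweighing, one pre-processing technique of the kind AIF360 provides. It is a hand-written, standard-library illustration of the idea, not AIF360's API, and it is only an assumption that the notebook uses this particular technique. The function names, the group labels "A" and "B", and the toy data are all hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Return one weight per row so that, in the weighted data,
    group membership and label are statistically independent.

    Weight for a row with group g and label y:
        P(g) * P(y) / P(g, y)
    """
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        group_counts[g] * label_counts[y] / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group, positive=1):
    """Weighted share of rows in `group` whose label equals `positive`."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    hits = sum(
        w
        for g, y, w in zip(groups, labels, weights)
        if g == group and y == positive
    )
    return hits / total


# Hypothetical toy data: a protected attribute with two groups and an
# attrition label (1 = employee left, 0 = employee stayed).
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "A")
rate_b = weighted_positive_rate(groups, labels, weights, "B")
```

In the unweighted toy data, group A leaves three times as often as group B. After reweighing, both groups have the same weighted attrition rate, so a model trained with these sample weights sees no association between group and label.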

The dataset used in the code pattern is supplied by Kaggle and contains HR analytics data on employees who stay and employees who leave. It includes metrics such as education level, job satisfaction, and commute distance.

The data is made available under the following license agreements:

Dataset license details

Flow

  1. Sign up for and log in to IBM Watson Studio.
  2. Upload the Jupyter notebook and start running it.
  3. The notebook downloads the dataset and imports the AIF360 fairness toolkit and the Pygal data visualization library.
  4. Pandas is used for reading the data and performing initial data exploration.
  5. Matplotlib, Seaborn, Plotly, Bokeh, and Pygal (from step 3) are used for visualizing the data.
  6. Scikit-Learn and AIF360 (from step 3) are used for model development.
  7. IBM Watson Machine Learning is used to deploy the model and access it to generate employee attrition classifications.
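
Steps 4 and 6 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, the column names (education_level, job_satisfaction, commute_distance, attrition) are hypothetical stand-ins for the dataset's real columns, and logistic regression is chosen only as a simple example classifier. The visualization, bias mitigation, and deployment steps are omitted.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the HR dataset; names and values are invented.
rng = np.random.default_rng(0)
n_rows = 200
df = pd.DataFrame(
    {
        "education_level": rng.integers(1, 6, size=n_rows),
        "job_satisfaction": rng.integers(1, 5, size=n_rows),
        "commute_distance": rng.integers(1, 30, size=n_rows),
    }
)
# Synthetic label: longer commutes and lower satisfaction make leaving
# more likely (1 = employee left, 0 = employee stayed).
leave_score = 0.1 * df["commute_distance"] - 0.8 * df["job_satisfaction"]
df["attrition"] = (leave_score + rng.normal(0, 1, size=n_rows) > 0).astype(int)

# Step 4: initial exploration with pandas.
summary = df.describe()
attrition_by_satisfaction = df.groupby("job_satisfaction")["attrition"].mean()

# Step 6: train a simple scikit-learn classifier on a held-out split.
features = df[["education_level", "job_satisfaction", "commute_distance"]]
X_train, X_test, y_train, y_test = train_test_split(
    features,
    df["attrition"],
    test_size=0.25,
    random_state=0,
    stratify=df["attrition"],
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
```

With the real dataset, the synthetic DataFrame would be replaced by a call such as pd.read_csv on the downloaded file, and the feature list would be adjusted to the actual column names.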

Instructions

Get the detailed instructions in the README file. These steps will show you how to:

  1. Create a Watson Machine Learning service instance.
  2. Sign up for Watson Studio.
  3. Create a new Watson Studio project.
  4. Create the notebook.
  5. Run the notebook.
  6. Save and share.