Data science process pipeline to solve employee attrition
Build a learning model to analyze a shrinking workforce
This code pattern is a high-level overview of what to expect in a data science pipeline and the tools that can be used along the way. It covers each stage, from framing the business question to building and deploying a data model, using the employee attrition problem as the running example.
Employees are the backbone of any organization, and an organization's performance depends heavily on the quality of its employees and on its ability to retain them. Employee attrition creates a number of challenges for organizations:
- Cost of training new employees, in both money and time
- Loss of experienced employees
- Impact on productivity
- Impact on profit
The following solution is designed to help address the employee attrition problem. After completing this code pattern, you’ll understand:
- The process involved in solving a data science problem.
- How to create and use a Watson Studio instance.
- How to mitigate bias by transforming the original dataset with the AI Fairness 360 (AIF360) toolkit.
- How to build and deploy the model in Watson Studio using various tools.
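AIF360 provides several pre-processing algorithms that transform a dataset before training, and the description above does not say which one the notebook applies. As an illustration, assuming a reweighing approach (the idea behind AIF360's `Reweighing` class), each (group, label) combination gets the weight P(group) · P(label) / P(group, label), so that the protected attribute and the outcome become statistically independent in the weighted data. This is a minimal standard-library sketch, not the toolkit's implementation; the records and the field names `group` and `label` are hypothetical.

```python
from collections import Counter


def reweighing_weights(records, group_key="group", label_key="label"):
    """Weight each record by P(group) * P(label) / P(group, label)."""
    n = len(records)
    group_counts = Counter(r[group_key] for r in records)
    label_counts = Counter(r[label_key] for r in records)
    joint_counts = Counter((r[group_key], r[label_key]) for r in records)
    weights = []
    for r in records:
        g, y = r[group_key], r[label_key]
        expected = (group_counts[g] / n) * (label_counts[y] / n)  # if independent
        observed = joint_counts[(g, y)] / n                       # as seen in the data
        weights.append(expected / observed)
    return weights


def favorable_rate(records, group, weights=None, group_key="group",
                   label_key="label", favorable=1):
    """(Weighted) share of records in `group` that have the favorable label."""
    if weights is None:
        weights = [1.0] * len(records)
    total = favorable_total = 0.0
    for r, w in zip(records, weights):
        if r[group_key] == group:
            total += w
            if r[label_key] == favorable:
                favorable_total += w
    return favorable_total / total


# Hypothetical toy data: label 1 = favorable outcome (assumed encoding).
records = [
    {"group": "A", "label": 1}, {"group": "A", "label": 1},
    {"group": "A", "label": 1}, {"group": "A", "label": 0},
    {"group": "B", "label": 1}, {"group": "B", "label": 0},
    {"group": "B", "label": 0}, {"group": "B", "label": 0},
]

weights = reweighing_weights(records)
print(favorable_rate(records, "A"), favorable_rate(records, "B"))  # 0.75 0.25
print(round(favorable_rate(records, "A", weights), 3),
      round(favorable_rate(records, "B", weights), 3))            # 0.5 0.5
```

The dataset itself is unchanged; only the weights differ, so a classifier that accepts per-sample weights sees both groups with the same favorable-outcome rate.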
The dataset used in this code pattern is supplied by Kaggle and contains HR analytics data on employees who stay and employees who leave. The data includes metrics such as education level, job satisfaction, and commute distance.
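Initial exploration of data like this usually starts with the overall attrition rate and how it varies across a feature. The notebook uses pandas for this; the sketch below does the same with Python's standard library so it runs without extra packages. The CSV rows are hypothetical toy values, not the Kaggle data, and the column names (`Attrition`, `JobSatisfaction`, `DistanceFromHome`, `Education`) are assumptions about the dataset's schema.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the shape of an HR attrition CSV; column names are assumptions.
SAMPLE_CSV = """Attrition,JobSatisfaction,DistanceFromHome,Education
Yes,1,20,2
No,4,2,3
No,3,5,4
Yes,1,15,3
No,2,8,1
No,4,1,2
"""


def load_rows(text):
    """Parse CSV text into dicts, converting the numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in ("JobSatisfaction", "DistanceFromHome", "Education"):
            row[col] = int(row[col])
        rows.append(row)
    return rows


def attrition_rate_by(rows, column):
    """Share of employees with Attrition == 'Yes' for each value of `column`."""
    left = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        key = row[column]
        total[key] += 1
        if row["Attrition"] == "Yes":
            left[key] += 1
    return {key: left[key] / total[key] for key in sorted(total)}


rows = load_rows(SAMPLE_CSV)
overall = sum(r["Attrition"] == "Yes" for r in rows) / len(rows)
print(f"overall attrition: {overall:.2f}")       # overall attrition: 0.33
print(attrition_rate_by(rows, "JobSatisfaction"))  # {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}
```

With pandas, the equivalent grouping would be something like `df.groupby("JobSatisfaction")["Attrition"].apply(lambda s: (s == "Yes").mean())`.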
The data is made available under the following license agreements:
Dataset license details
| Dataset | Scope | License | Source |
|---|---|---|---|
| Employee Attrition Data | Database | Open Database License (ODbL) | Kaggle |
| Employee Attrition Data | Content | Database Contents License (DbCL) | Kaggle |
1. Create an IBM Watson Studio account and log in.
2. Upload the Jupyter notebook and start running it.
3. The notebook downloads the dataset and imports the fairness toolkit (AIF360) and the Pygal data visualization library.
4. Pandas is used to read the data and perform initial data exploration.
5. Matplotlib, Seaborn, Plotly, Bokeh, and Pygal (from step 3) are used to visualize the data.
6. Scikit-learn and AIF360 (from step 3) are used for model development.
7. The IBM Watson Machine Learning service is used to deploy the model and access it to generate employee attrition classifications.
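The description above does not say which estimator the notebook trains during model development; scikit-learn offers many, such as `LogisticRegression`. As a dependency-free sketch of what such a classifier does internally, the code below fits an L2-regularized logistic regression by batch gradient descent. The features (`JobSatisfaction`, `DistanceFromHome`), the toy rows, the label encoding (1 = left), and the learning rate, epoch count, and penalty are all assumptions for illustration, not the notebook's settings.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def standardize(X):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]
    return scaled, means, stds


def fit_logistic(X, y, lr=0.5, epochs=2000, l2=0.01):
    """Minimize mean log-loss plus an L2 penalty by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: sigmoid(w.x + b) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The penalty shrinks the weights (not the bias) toward zero.
        grad_w = [g + l2 * wj for g, wj in zip(grad_w, w)]
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(x, w, b, means, stds):
    """Predicted probability of attrition for one raw (unscaled) feature row."""
    z = sum(wj * (xj - m) / s for wj, xj, m, s in zip(w, x, means, stds)) + b
    return sigmoid(z)


# Hypothetical toy data: [JobSatisfaction, DistanceFromHome]; 1 = left, 0 = stayed.
X = [[1, 20], [1, 15], [2, 18], [4, 2], [3, 5], [4, 1], [2, 8], [3, 3]]
y = [1, 1, 1, 0, 0, 0, 0, 0]

X_scaled, means, stds = standardize(X)
w, b = fit_logistic(X_scaled, y)
print(predict_proba([1, 22], w, b, means, stds) > 0.5)  # True
print(predict_proba([4, 1], w, b, means, stds) > 0.5)   # False
```

`predict_proba` takes the stored means and standard deviations because new inputs must be scaled exactly like the training data. In practice the model would also be evaluated on held-out data before being deployed.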
Get the detailed instructions in the README file. These steps will show you how to:
- Create a Watson Machine Learning service instance.
- Sign up for Watson Studio.
- Create a new Watson Studio project.
- Create the notebook.
- Run the notebook.
- Save and share.