Data scientists have observed that machine learning models can pick up bias during training. Because the training process is largely a black box, these biases are difficult to root out. The AI Fairness 360 toolkit offers a way to identify and quantify these biases, as well as a path for remediation. This code pattern explains how the AI Fairness 360 toolkit can help you identify and quantify bias in machine learning model training.
A machine learning model predicts an outcome for a particular instance. For example, in a loan application use case, you’d want to predict whether the applicant will repay the loan. The model learns to make these predictions from a training data set that provides many other instances (other loan applications) together with their actual outcomes (whether the applicant repaid). The machine learning algorithm looks for patterns, or generalizations, in the training data set that it can apply when it needs to make a prediction for a new instance. (For example, one pattern it might discover is “if a person has salary > USD 40K and has outstanding debt < USD 5K, they will repay the loan.”) This technique, called supervised machine learning, has worked very well in many domains.
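To make the idea of learning a pattern from labeled examples concrete, here is a minimal sketch in plain Python; it is not part of the pattern’s notebook. It learns a single salary threshold (a so-called decision stump) from a handful of records. The records, the function names `learn_salary_threshold` and `predict`, and the restriction to one feature are all hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical labeled examples: (salary_in_usd, repaid_loan).
Record = Tuple[int, bool]

TRAINING_DATA: List[Record] = [
    (25_000, False),
    (32_000, False),
    (38_000, False),
    (45_000, True),
    (52_000, True),
    (61_000, True),
    (70_000, True),
    (29_000, False),
]


def learn_salary_threshold(data: List[Record]) -> int:
    """Find the threshold t for the rule 'predict repay if salary > t'
    that classifies the most training examples correctly."""
    best_threshold, best_correct = 0, -1
    # Try every salary seen in training as a candidate threshold.
    for candidate, _ in data:
        correct = sum((salary > candidate) == repaid for salary, repaid in data)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold


def predict(threshold: int, salary: int) -> bool:
    """Apply the learned rule to a new, unseen applicant."""
    return salary > threshold


if __name__ == "__main__":
    t = learn_salary_threshold(TRAINING_DATA)
    print(f"Learned rule: repay if salary > {t}")
    print(predict(t, 50_000))
```

With this toy data, `learn_salary_threshold(TRAINING_DATA)` returns 38000, so the learned rule is “repay if salary > USD 38K”. Real models weigh many features at once, which is how an attribute such as age can end up in the learned patterns.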
However, some of the patterns a model finds might be undesirable or even illegal. For example, a loan repayment model might determine that age plays a significant role in predicting repayment because, in the training data set, one age group happened to repay more often than another. This raises two problems. First, the training data set might not be representative of the true population across all age groups. Second, even if it is representative, in many jurisdictions it is illegal to base a lending decision on an applicant’s age, regardless of how well age predicts repayment in the historical data.
AI Fairness 360 is designed to help address this problem with fairness metrics and bias mitigators. Fairness metrics check for bias in machine learning workflows. Bias mitigators reduce that bias so the workflow produces fairer outcomes.
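One widely used fairness metric is statistical parity difference: the rate of favorable outcomes for the unprivileged group minus the rate for the privileged group. A value of 0 means both groups receive favorable outcomes at the same rate, and a negative value means the unprivileged group is disadvantaged. A related metric, disparate impact, is the ratio of the same two rates, with 1 meaning parity. In AI Fairness 360 these are available on the `BinaryLabelDatasetMetric` class (`mean_difference()` / `statistical_parity_difference()` and `disparate_impact()`). The plain-Python sketch below reimplements the two formulas so the arithmetic is visible; the records, the age cutoff of 25 for the privileged group, and the function names are hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical training records: (applicant_age, repaid_loan).
Record = Tuple[int, bool]

RECORDS: List[Record] = [
    (19, False), (21, False), (22, True), (23, False),  # younger applicants
    (30, True), (35, True), (41, False), (52, True),    # older applicants
]


def is_privileged(age: int) -> bool:
    """Assumed privileged group for this example: applicants aged 25 or older."""
    return age >= 25


def favorable_rate(records: List[Record], in_group: Callable[[int], bool]) -> float:
    """Fraction of a group's records with the favorable outcome (repaid)."""
    outcomes = [repaid for age, repaid in records if in_group(age)]
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(
    records: List[Record], privileged: Callable[[int], bool]
) -> float:
    """P(favorable | unprivileged) - P(favorable | privileged); 0 means parity."""
    unprivileged_rate = favorable_rate(records, lambda age: not privileged(age))
    return unprivileged_rate - favorable_rate(records, privileged)


def disparate_impact(records: List[Record], privileged: Callable[[int], bool]) -> float:
    """P(favorable | unprivileged) / P(favorable | privileged); 1 means parity."""
    unprivileged_rate = favorable_rate(records, lambda age: not privileged(age))
    return unprivileged_rate / favorable_rate(records, privileged)


if __name__ == "__main__":
    print(statistical_parity_difference(RECORDS, is_privileged))
    print(disparate_impact(RECORDS, is_privileged))
```

Here the younger group repays in 1 of 4 cases and the older group in 3 of 4, so the statistical parity difference is 0.25 - 0.75 = -0.5 and the disparate impact is about 0.33, both signalling bias against younger applicants.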
When you have completed this code pattern, you should understand how to:
- Compute a fairness metric on original data using AI Fairness 360
- Mitigate bias by transforming the original data set
- Compute fairness metrics on transformed training data sets
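The three steps above (measure, transform, measure again) can be sketched in plain Python. One pre-processing mitigator in AI Fairness 360 is `Reweighing` (in `aif360.algorithms.preprocessing`), which leaves features and labels untouched and instead assigns each training instance a weight so that group membership and outcome become statistically independent: each (group, label) combination gets the weight P(group) · P(label) / P(group, label). The sketch below reimplements that formula on hypothetical data and recomputes a weighted statistical parity difference on the transformed data set. It illustrates the technique; it is not the toolkit’s code, and the names and the age cutoff of 25 are assumptions.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical training records: (applicant_age, repaid_loan).
Record = Tuple[int, bool]

RECORDS: List[Record] = [
    (19, False), (21, False), (22, True), (23, False),
    (30, True), (35, True), (41, False), (52, True),
]


def is_privileged(age: int) -> bool:
    """Assumed privileged group: applicants aged 25 or older."""
    return age >= 25


def reweigh(records: List[Record], privileged: Callable[[int], bool]) -> List[float]:
    """Weight each record by P(group) * P(label) / P(group, label)."""
    n = len(records)
    group_counts = Counter(privileged(age) for age, _ in records)
    label_counts = Counter(label for _, label in records)
    joint_counts = Counter((privileged(age), label) for age, label in records)
    weights = []
    for age, label in records:
        group = privileged(age)
        # Count expected if group and label were independent, over the observed count.
        expected = group_counts[group] * label_counts[label] / n
        weights.append(expected / joint_counts[(group, label)])
    return weights


def weighted_parity_difference(
    records: List[Record], weights: List[float], privileged: Callable[[int], bool]
) -> float:
    """Weighted P(favorable | unprivileged) - weighted P(favorable | privileged)."""
    def rate(want_privileged: bool) -> float:
        rows = [(label, w) for (age, label), w in zip(records, weights)
                if privileged(age) == want_privileged]
        return sum(w for label, w in rows if label) / sum(w for _, w in rows)
    return rate(False) - rate(True)


if __name__ == "__main__":
    original = weighted_parity_difference(RECORDS, [1.0] * len(RECORDS), is_privileged)
    weights = reweigh(RECORDS, is_privileged)
    transformed = weighted_parity_difference(RECORDS, weights, is_privileged)
    print(f"before: {original:+.3f}  after: {transformed:+.3f}")
```

On this toy data the difference is -0.5 before reweighing. Afterward, the rare combinations (young applicants who repaid, older applicants who did not) get weight 2 and the common ones get weight 2/3, so both groups have a weighted favorable rate of 0.5 and the difference drops to 0, up to floating-point rounding. A classifier trained with these instance weights then sees a data set in which age no longer correlates with repayment.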
The pattern follows this flow:
- The user interacts with Watson Studio to create a Jupyter Notebook.
- The notebook imports the AI Fairness 360 toolkit.
- Data is loaded into the notebook.
- The user runs the notebook, which uses the AI Fairness 360 toolkit to assess the fairness of the machine learning model.
Find the detailed steps for this pattern in the README. Those steps show you how to run the notebook either locally:
- Clone the repo.
- Run the notebook in Jupyter.
or in Watson Studio:
- Sign up for Watson Studio.
- Create the notebook.
- Run the notebook.