This page documents the currently available version, 1.0.17.
By IBM Research
A suite of metrics, built by IBM Research AI, for assessing the quality of data used to build classification applications.
Data practitioners spend a considerable amount of time iteratively pre-processing data before it is of adequate quality for downstream machine learning tasks. Although time-consuming, pre-processing is an essential step, because the quality of training data directly affects both the complexity and the accuracy of AI models. Getting insight into data quality before the data enters a machine learning pipeline can significantly reduce model-building time, streamline data preparation, and improve the overall reliability of the AI pipeline.
Data Quality for AI is an integrated toolkit that provides data profiling and quality estimation metrics for assessing the quality of ingested data systematically and objectively. The metrics express data issues as scores between 0 and 1, where 1 indicates that no issues were detected.
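To make the 0-to-1 scoring convention concrete, here is a minimal sketch of one such score: the fraction of non-missing cells in a table. The function name `completeness_score` and its definition of "missing" (`None` or a blank string) are hypothetical illustrations of the convention, not the toolkit's own API or metric definitions.

```python
from typing import Optional, Sequence


def completeness_score(rows: Sequence[Sequence[Optional[str]]]) -> float:
    """Hypothetical metric: fraction of non-missing cells, in [0, 1].

    A cell counts as missing if it is None or an empty/whitespace-only string.
    Returns 1.0 (no issues detected) when nothing is missing, including for an
    empty table.
    """
    total = 0
    missing = 0
    for row in rows:
        for cell in row:
            total += 1
            if cell is None or str(cell).strip() == "":
                missing += 1
    if total == 0:
        return 1.0
    # 1 minus the share of missing cells: 1.0 means no missing values found
    return 1.0 - missing / total


table = [
    ["5.1", "3.5", "setosa"],
    ["4.9", "", "setosa"],
    ["6.2", "2.9", None],
    ["5.9", "3.0", "virginica"],
]
print(completeness_score(table))  # ≈ 0.833 (2 of 12 cells missing)
```

Scoring "absence of issues" rather than "amount of issues" keeps every metric on the same scale, so a score of 1 always reads as clean data regardless of which check produced it.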
Currently, the metrics support tabular datasets supplied as a comma-separated values (CSV) file. We are working on adding more metrics. Step-by-step developer guides are available on IBM Developer's Learning Path page. For questions, issues, and suggestions, please reach out to us in our Slack workspace, Data Quality for AI, where you can also ask questions and share ideas with others. Request an invitation to join our community.
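Because input arrives as a CSV file, a metric starts by parsing rows from that file. The sketch below, assuming a hypothetical label column named `species`, reads CSV text with Python's standard `csv` module and computes a hypothetical class-balance score (minority-class count divided by majority-class count), which is relevant to classification data. Neither the function `class_balance_score` nor this formula is taken from the toolkit; they only illustrate a CSV-in, score-out workflow.

```python
import csv
import io
from collections import Counter


def class_balance_score(csv_text: str, label_column: str) -> float:
    """Hypothetical metric: minority-class count / majority-class count.

    Returns a value in [0, 1]; 1.0 means every class has the same number of rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[label_column] for row in reader)
    if not counts:
        return 1.0  # no rows, so no imbalance detected
    return min(counts.values()) / max(counts.values())


data = """sepal_length,sepal_width,species
5.1,3.5,setosa
4.9,3.0,setosa
6.2,2.9,versicolor
5.9,3.0,virginica
"""
print(class_balance_score(data, "species"))  # 0.5
```

For a file on disk, pass the result of `open(path, newline="").read()` instead of the inline string.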