In this learning path, you got an overview of the Data Prep Kit. The learning path covered:
The fundamental concepts and features of Data Prep Kit (DPK) for building LLM applications
The practical aspects of data ingestion
How to extract data from various sources like PDFs, HTML, and code, and convert the data into tokens suitable for LLMs and vector databases
Ethical considerations for data preparation, and how transforms such as license filtering, HAP (hate, abuse, and profanity) detection, and PII redaction help users prepare data
How to build DPK transforms and integrate them into RAG and fine-tuning pipelines
Next steps
Explore the Data Prep Kit project in the data-prep-kit repo. If you find it empowers your work, join our growing community by giving us a star!