In this learning path, you learn how to use Data Prep Kit (DPK) to prepare data for large language model (LLM) applications.
Skill level
This learning path assumes basic Python skills as a prerequisite and uses Google Colab as the cloud-based Jupyter notebook environment.
Estimated time to complete
Approximately 2 hours.
Learning objectives
In this learning path, you learn:
The fundamental concepts and features of Data Prep Kit (DPK) for building LLM applications
The practical aspects of data ingestion
How to extract data from various sources like PDFs, HTML, and code, and convert the data into tokens suitable for LLMs and vector databases
Ethical considerations for data preparation, and how transforms like license filtering, hate, abuse, and profanity (HAP) detection, and personally identifiable information (PII) redaction help users prepare data
How to build DPK transforms and integrate them into retrieval-augmented generation (RAG) and fine-tuning pipelines
By completing this learning path, you'll be able to apply these skills to real-world data preparation for LLM applications such as RAG and fine-tuning.