Summary, next steps, and additional resources

By Aanchal Goyal and Shahrokh Daijavad

Summary

In this learning path, you got an overview of the Data Prep Kit. The learning path covered:

  • The fundamental concepts and features of Data Prep Kit (DPK) for building LLM applications
  • The practical aspects of data ingestion
  • How to extract data from various sources, such as PDFs, HTML, and code, and convert it into tokens suitable for LLMs and vector databases
  • Ethical considerations for data preparation, and how transforms like license filtering, hate, abuse, and profanity (HAP) detection, and PII redaction help users prepare data responsibly
  • How to build DPK transforms and integrate them into retrieval-augmented generation (RAG) and fine-tuning pipelines

Next steps

Explore the Data Prep Kit project in the data-prep-kit repo on GitHub. If it helps your work, join our growing community by giving the repo a star!