Data Prep Kit (DPK) accelerates unstructured data preparation for LLM application developers. With DPK, developers can prepare use-case-specific unstructured data to fine-tune or instruct-tune LLMs, or to build RAG applications.
The Data Prep Kit (DPK) offers a diverse set of pre-built transforms that streamline data preparation for AI applications. These transforms cover various data formats, including text, code, and structured data, and provide a strong foundation for common data processing tasks in AI and machine learning applications.
While Data Prep Kit includes several built-in transforms, you can also build custom data preparation transforms to meet your needs. In this tutorial, learn how to implement a Processing transform that calculates a signature value for a document and stores the signature as part of the metadata associated with the document.
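To make the idea of a signature transform concrete, here is a minimal sketch in plain Python. It does not use the Data Prep Kit API: the `add_signature` function, the dictionary layout (`contents`, `metadata`), and the choice of SHA-256 are all assumptions made for illustration, and the tutorial's actual transform may differ.

```python
import hashlib


def add_signature(document: dict) -> dict:
    """Return a copy of the document with a signature stored in its metadata.

    Hypothetical helper, not part of DPK. Assumes the document is a dict
    with a text field "contents" and an optional "metadata" dict.
    """
    contents = document["contents"]
    # Assumption: SHA-256 over the UTF-8 bytes of the text serves as the signature.
    signature = hashlib.sha256(contents.encode("utf-8")).hexdigest()
    # Build new dicts so the input document is left unmodified.
    metadata = {**document.get("metadata", {}), "signature": signature}
    return {**document, "metadata": metadata}


doc = {
    "contents": "This agreement is made between the two parties.",
    "metadata": {"source": "contract.pdf"},
}
signed = add_signature(doc)
print(signed["metadata"]["signature"])
```

Returning a new dictionary rather than mutating the input keeps the function free of side effects, which tends to make a per-document step like this easier to run in parallel.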
In this tutorial, you learn how to use the Data Prep Kit (DPK) to prepare data for fine-tuning, applied to the business use case of contract analysis.
Data Prep Kit (DPK) is a scalable, flexible, robust, and easy-to-use framework for data processing. Data Prep Kit is data agnostic, handling diverse data formats including text, code, and structured data. It uses distributed computing frameworks like Ray and Spark to efficiently process large data sets. Developers can create custom transforms (to supplement the built-in transforms) to readily address specific data processing needs.