What is InstructLab for watsonx.ai?

Shape generative AI by making contributions to LLMs in an open and accessible way

By Suhas Kashyap, Nick Gagan

InstructLab is an open source project for enhancing large language models (LLMs) that are used in generative AI applications.

What is InstructLab and what does it do?

Created by IBM Research and Red Hat, the InstructLab open source project provides a cost-effective solution for improving the alignment of AI models (language and code models) and opens the doors for those with minimal machine learning experience to contribute. IBM is seeking to democratize AI model development with InstructLab support in watsonx.ai.

LLMs can be proprietary (such as OpenAI’s GPT models) or offer varying degrees of openness around pretraining data and usage restrictions (such as Meta’s Llama models, Mistral AI’s Mistral models, and IBM’s Granite models). Often, AI practitioners need to adapt pretrained LLMs to suit a particular business need, but this process of fine-tuning an LLM can be time-consuming, resource-intensive, and expensive.

InstructLab follows an approach that addresses these limitations. It can enhance an LLM using far less human-generated information and far fewer computing resources than are typically used to retrain a model. And it makes it possible for upstream contributions to continuously make the model better.

InstructLab consists of three components:

  • Taxonomy-driven data curation, where humans curate a diverse set of training data and organize it in a taxonomy.
  • Large-scale synthetic data generation using the InstructLab CLI, where the LLM generates new examples based on the seed training data.
  • Iterative, large-scale alignment tuning, where the LLM is retrained based on the synthetic data. This step requires model training infrastructure, which IBM donates and maintains so that the InstructLab project’s enhanced LLMs can be retrained frequently.
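
To make the flow between these three components concrete, here is a minimal Python sketch. It is an illustration only, not the InstructLab implementation: the taxonomy paths, the `SeedExample` type, and the `stub_teacher` function are all hypothetical, and the stub simply relabels each seed where a real teacher LLM would write genuinely new examples. No model is trained; the last step only assembles the records that a tuning job might consume.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedExample:
    """One human-written question/answer pair (hypothetical structure)."""
    question: str
    answer: str


# Component 1: a tiny human-curated taxonomy.
# The paths and examples are invented for illustration.
taxonomy = {
    "skills/writing/rhyme": [
        SeedExample("Give a word that rhymes with 'cat'.", "hat"),
        SeedExample("Give a word that rhymes with 'light'.", "night"),
    ],
    "knowledge/astronomy/planets": [
        SeedExample("Which planet is closest to the Sun?", "Mercury"),
    ],
}


def stub_teacher(seed: SeedExample, variant: int) -> SeedExample:
    """Stand-in for a teacher LLM (assumption: a real one would paraphrase
    and extend the seed; this stub only tags the question)."""
    return SeedExample(f"{seed.question} (variation {variant})", seed.answer)


def generate_synthetic(tax, per_seed, teacher=stub_teacher):
    """Component 2: expand every seed into `per_seed` synthetic examples."""
    return {
        path: [teacher(seed, i) for seed in seeds for i in range(1, per_seed + 1)]
        for path, seeds in tax.items()
    }


def build_training_set(synthetic):
    """Component 3 (input only): flatten the synthetic data into
    prompt/completion records that a tuning run could be fed."""
    return [
        {"taxonomy_path": path, "prompt": ex.question, "completion": ex.answer}
        for path, examples in synthetic.items()
        for ex in examples
    ]


synthetic = generate_synthetic(taxonomy, per_seed=3)
training_set = build_training_set(synthetic)
print(f"{len(training_set)} synthetic records from "
      f"{sum(len(s) for s in taxonomy.values())} seed examples")
```

The point of the sketch is the ratio: a handful of human-written seeds fans out into a much larger synthetic set, which is why the approach needs comparatively little human-generated data.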

For a fuller description of InstructLab, see Red Hat’s “What is InstructLab?” article.

Discover watsonx.ai

With watsonx.ai, a suite of proprietary and open source LLMs enhanced with InstructLab is available for use.

Inferencing with these models can be done programmatically or in the watsonx.ai Prompt Lab, a tool for experimenting with LLMs and engineering generative AI prompts.

watsonx.ai regularly pulls in model updates from InstructLab, with the full training taxonomy visible to the user. The Prompt Lab offers a view of the training taxonomy with information on the knowledge and skills added to each model along with the respective seed samples used for synthetic data generation.

In the future, watsonx.ai looks to integrate more of the InstructLab workflow to provide further model enhancement capabilities directly in the platform and cater to enterprise use cases.

Summary and next steps

The InstructLab community intends to let anyone shape generative AI by facilitating contributions to LLMs in an open and accessible way.

Join the InstructLab community, or start your free trial of watsonx.ai today!