IMPORTANT: The steps in this tutorial can be run with InstructLab 0.17.1, the most recent stable release. Some commands may have changed since this tutorial was published. If you run into issues, see the InstructLab README file for help.
To contribute to a large language model (LLM) through InstructLab, the easiest approach resembles opening a pull request in software development: you craft a skill.
This process requires two essential components: a qna.yaml file and a separate text file that provides attribution for the content. The YAML file contains structured data, which helps organize the information for the model. Think of it as a straightforward text file with structured formatting.
InstructLab uses a selection of these skills to generate a more extensive set of synthetic data related to the provided examples.
However, there’s a limit to the amount of content that the model can process effectively. As a result, contributors should ensure that the question and answer pairs in the qna.yaml file don’t exceed approximately 2300 words. By adhering to this limit, contributors can help maintain the quality and efficiency of the model’s training process.
Let’s start with a simple example. We’ll teach the model to summarize meetings by training it with InstructLab. Using a qna.yaml file of meeting transcript summary data, we’ll generate synthetic data, then train, serve, and evaluate the chat model.
Prerequisites
For the initial setup process, please follow the step-by-step instructions outlined in the InstructLab README.
Steps
Contribute skills.
First, create a qna.yaml file containing the meeting transcript summary data, including the minutes of the meeting, attendees, agenda, discussions, and action items. You can view my example qna.yaml file in my GitHub repo.
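As a sketch of what such a file might contain, here is a minimal, hypothetical qna.yaml for a meeting-summarization skill. The field names follow the InstructLab taxonomy schema as of this release; the transcript and summary text are invented placeholders, so check the taxonomy repository's documentation for the authoritative schema.

```yaml
version: 2
task_description: Summarize a meeting transcript into minutes listing attendees, agenda, and action items.
created_by: your-github-username   # placeholder: your GitHub handle
seed_examples:
  - question: |
      Summarize the following meeting transcript:
      Alice: Let's review the Q3 roadmap today.
      Bob: I'll take the action item to draft the budget.
    answer: |
      Attendees: Alice, Bob
      Agenda: Q3 roadmap review
      Action items: Bob to draft the budget.
```

Keeping the question and answer pairs short helps you stay under the word limit discussed earlier.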
Optionally, you can provide a separate text file that details the attribution of the content, such as who created it and where it came from.
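The attribution file is plain text. The exact field labels InstructLab expects are described in its contribution guidelines; the labels and values below are an illustrative assumption, not a verified template.

```text
Title of work: Weekly team meeting transcript
Link to work: -
License of the work: CC-BY-4.0
Creator names: Your Name
```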
List and validate your data by running the ilab diff command, which lists your new data and confirms that it’s registered correctly in the taxonomy path.
ilab diff
Generate a synthetic dataset by running the ilab generate command, which expands the newly added skill in the taxonomy repository into a larger set of training examples.
ilab generate
This step may take from 15 minutes to 1+ hours to complete, depending on your computing resources.
Train the model locally (on Linux or a Mac with Apple silicon) by running the ilab train command.
ilab train
This step can take several hours to complete, depending on your computing resources. The trained model is written to the models directory.
Test the newly trained model by running the ilab test command to verify its performance.
ilab test
Serve the newly trained model. First, stop any existing server by pressing Ctrl+C. Then, convert the newly trained model with the ilab convert command so that it can be served, and serve it:
# Convert the trained model, then serve it locally
ilab convert
ilab serve --model-path <New model name>
Chat with the new fine-tuned model using the chat interface by running this command:
ilab chat -m <New model name>
Submit your contribution!
If you’ve improved the model, open a pull request in the taxonomy repository to include the files with your improved data.
Following these steps will allow you to contribute the meeting transcript summary data to the model and train a new model based on it, enhancing its capabilities in generating meeting minutes.
Summary and next steps
In this tutorial, you learned how to use InstructLab to enhance a model's capabilities in generating meeting summaries. Following meticulous data creation, the model underwent training, deployment, and evaluation stages using InstructLab, aiming to generate concise meeting summaries effectively.
These steps created a specialized skill for generating meeting summaries. By carefully crafting the skill, you help ensure that meeting discussions are distilled into clear, concise summaries for improved comprehension and productivity.
To get started, join the InstructLab community on GitHub. You can also explore IBM foundation models in IBM watsonx.ai studio that are designed to support knowledge and skills contributed by the open source community.
The following foundation models support community contributions from InstructLab: