Tutorial
Using Mistral AI LLMs in watsonx.ai flows engine
Learn about Mistral Large 2 and how it handles extensive tasks such as text completion and chat functions
In the swiftly evolving landscape of large language models (LLMs), choosing the best possible model for your use case and budget can be a complex task. You need to be able to experiment with different models before integrating AI into your application. This tutorial explores how you can use a new developer tool called IBM watsonx.ai flows engine to do exactly that.
This tutorial is the final part in a four-part series about building AI applications using different LLMs. With IBM watsonx flows engine, you can build AI applications on top of different LLMs by using a CLI and SDK (for Python and JavaScript). You can sign up for the free plan today, and get started building AI applications for free.
Prerequisites
To follow this tutorial, you must have watsonx flows engine set up, as described in the Using different LLMs in watsonx.ai flows engine tutorial.
Using Mistral AI LLMs in watsonx flows engine
In this tutorial, you learn about the Mistral AI LLMs, specifically Mistral Large 2, the latest model released by Mistral AI. With Mistral Large 2, you get access to enhanced capabilities such as a 128,000-token context window, native JSON output, and extensive multilingual support for languages like Chinese, Japanese, and Arabic. It also offers robust code generation capabilities across over 80 programming languages, including Python and JavaScript. These features make it highly effective for tasks requiring long text processing, structured data outputs, and multilingual application development, catering to a wide range of coding and software development needs.
The Mistral AI LLMs, including Mistral Large 2, are available in watsonx.ai, which you'll use as the LLM provider in this tutorial. By the end of this tutorial, you'll be able to use watsonx flows engine with Mistral Large 2 to build several flows, such as text completion and chat.
Text completion
The first use case to explore is basic text completion, where you pass a natural language question to the LLM and expect a natural language answer in return. For this, you must set up a `textCompletion` flow as explained in the first tutorial of this series. The prompt template for text completion differs per LLM; for the Mistral models, you can adapt the prompt template to match the format Mistral expects. You can find the following prompt template format in the Mistral documentation.
```
<s>[INST] Instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```
Using the `[INST]` and `[/INST]` tags, you indicate the beginning and the end of the instruction for the LLM. With `<s>`, you indicate the start of a conversation. This becomes more important when you look at multiturn conversations later on.
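To make the format concrete, here's a minimal sketch of building a prompt string in this format. The helper name `toMistralPrompt` is an illustration for this tutorial, not part of the wxflows SDK.

```javascript
// Hypothetical helper (not part of the wxflows SDK) that wraps a user
// question in Mistral's instruct format.
function toMistralPrompt(question, history = '') {
  // '<s>' marks the start of the conversation; each instruction is
  // wrapped in [INST] ... [/INST] tags.
  return `<s>${history}[INST] ${question} [/INST]`
}

console.log(toMistralPrompt('What is an LLM?'))
// Prints: <s>[INST] What is an LLM? [/INST]
```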
The previous format for the prompt template results in the following flow for text completion in watsonx flows engine.
```toml
[wxflows.deployment]
flows="""
textCompletion = templatedPrompt(promptTemplate: "<s>[INST]You're a helpful assistant, answer the following question: {question} [/INST]") | completion(model: textCompletion.model, parameters: textCompletion.parameters)
"""
```
After running the `wxflows deploy` command, you can use the SDK to call the new `textCompletion` flow.
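Assuming you run it from the project directory that contains your flows configuration, the deployment command is simply:

```bash
wxflows deploy
```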
```javascript
const flowName = 'textCompletion'
const question = `Translate the word computer in five different languages, use the following format: [language]: [translation]`

const result = await model.flow({
  schema,
  flowName,
  variables: {
    question,
    model: "mistralai/mistral-large",
    parameters: {
      stop_sequences: [],
    },
  },
})
```
This should return a list of five translations of the word computer directly in your terminal.
```
1. Spanish: ordenador
2. French: ordinateur
3. German: Computer
4. Italian: computer
5. Japanese: コンピューター (konpyūtā)
```
By asking Mistral Large to translate the word computer, you can see how it's able to work with different languages, including character-based languages such as Japanese.
You can also take the JavaScript SDK (or the version for Python) and integrate it into a new or existing application to render the answer in the browser, for example, to build a chat application using the flow you'll create in the next section.
Creating a chat flow
Another popular use case for generative AI is chat, for which you can create a `chat` flow. The steps in a flow are executed in sequence, and flows don't support recursion by design. For that reason, you use the SDK to add recursion: you keep the chat history in your application and execute a new flow run that includes the previous chat history. A `chat` flow looks like the following example.
```toml
[wxflows.deployment]
flows="""
chat = templatedPrompt(promptTemplate: chat.messages) | completion(model: "mistralai/mistral-large", parameters: chat.parameters)
"""
```
This time, the model is set directly in the flow definition, meaning that the user interacting with the watsonx flows engine endpoint won't be able to override this value. This is helpful when you want to have more control over the models that are being used.
The `chat` flow also takes a parameter called `messages` that is passed to the `promptTemplate` argument. This parameter contains the entire chat history, which is stored outside of the flow: the SDK passes the chat history to the `chat` flow, and you maintain the state that contains the chat history (that is, the messages) somewhere else, such as in your application.
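As a minimal sketch of what "somewhere else" could look like, you could keep the history in a plain string in your application. The `history`, `addTurn`, and `nextPrompt` names below are illustrative assumptions, not part of the SDK.

```javascript
// Illustrative helpers (not part of the wxflows SDK) for keeping the chat
// history in application state, in Mistral's conversation format.
let history = ''

// Record a completed turn: the user's question and the LLM's answer.
function addTurn(question, answer) {
  history += `[INST] ${question} [/INST] ${answer}</s>`
}

// Build the messages string for the next flow run: '<s>' starts the
// conversation, and earlier turns come before the new question.
function nextPrompt(question) {
  return `<s>${history}[INST] ${question} [/INST]`
}
```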
To call this flow using the JavaScript SDK, you can use the following code block.
```javascript
const flowName = 'chat'
const question = `<s>[INST] I want to learn more about LLMs, please explain it to me like I'm a five year old [/INST]`

const result = await model.flow({
  schema,
  flowName,
  variables: {
    // The chat flow expects the history in its 'messages' parameter; the
    // model can't be overridden here because it's set in the flow definition.
    messages: question,
    parameters: {
      stop_sequences: [],
    },
  },
})
```
Here, the `question` variable uses the conversation format in the prompt template. The question is wrapped in the `[INST]` and `[/INST]` tags to indicate what part of the string is the question. The SDK prints the response directly in your terminal, and you can use this response to ask a follow-up question; when you do, it's important to include the LLM's answer in the prompt.
To ask a follow-up question, for example, "Give me an example of such a game", wrap it in the same `[INST]` and `[/INST]` tags.
```
<s>[INST] I want to learn more about LLMs, please explain it to me like I'm a five year old [/INST] Sure! Imagine you're playing a game where you have to guess what someone is thinking. An LLM, which stands for Large Language Model, is like a really smart friend who has read lots and lots of books and stories. When you ask it a question, it uses all that knowledge to give you the best answer it can. It's like having a super smart friend who can help you with your homework or tell you interesting stories! </s> [INST] Give me an example of such a game [/INST]
```
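Using the illustrative helpers sketched earlier (again, `addTurn` and `nextPrompt` are assumptions for this tutorial, not SDK functions, and `firstAnswer` stands in for the answer you received from the previous run), sending that follow-up could look like this:

```javascript
// Record the first turn, then ask the follow-up question with the full
// history included in the messages string.
addTurn(
  "I want to learn more about LLMs, please explain it to me like I'm a five year old",
  firstAnswer, // the LLM's answer from the previous flow run
)

const followUp = await model.flow({
  schema,
  flowName: 'chat',
  variables: {
    messages: nextPrompt('Give me an example of such a game'),
    parameters: { stop_sequences: [] },
  },
})
```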
Again, the response of the LLM is printed in your terminal. You can repeat this process as many times as you want until you reach the token limit of the LLM's context window. In the case of Mistral Large, the context window (both input and output) is 128,000 tokens.
As a next step, you'd want the JavaScript (or Python) application from which you call the flows to keep the chat history in state. I'll cover how to build an end-to-end chat application using watsonx flows engine in an upcoming tutorial.
Wrapping up
In this concluding part of the series on using different large language models with IBM watsonx.ai flows engine, you explored the advanced features of Mistral Large 2 and learned about its versatility in handling tasks such as text completion and chat through practical examples. As you begin to integrate these models into your applications, remember the importance of adapting the prompt templates to leverage the capabilities of Mistral Large 2.
Want to learn more about watsonx flows engine? Join our Discord community, and let us know what other types of tutorials you'd like to see in the future.