
Learn a common methodology for training the machine learning models powering your chatbot solution.


With the majority of consumers spending significant time on messaging platforms, brands are turning to those same platforms to interact with consumers more directly. The resulting increase in private messaging between customers and brands is driving companies to turn to chatbots for improved social customer care.

The IBM Watson Assistant service offers a simple, scalable, and science-driven way for developers to build powerful chatbots that address the needs of various brands and companies. As developers leverage Watson Assistant to build cognitive solutions for various use cases, two questions come up again and again: “How much time should I plan to train my solution?” and “How do I know when my model is trained sufficiently well?”

While the answer depends greatly on the problem being solved and the data powering the solution, in this blog we offer a common methodology for training the machine learning (ML) models powering your chatbot solution.

How machine learning works

The Watson services rely on a variety of machine learning algorithms, most of which fall into the supervised machine learning category: they learn the specifics of a problem from labeled sample data and then make predictions on unlabeled data. Training a supervised machine learning system involves providing it with representative inputs and their corresponding outputs so that the system learns by example. These pairs of representative inputs and outputs constitute the “groundtruth” from which the system learns.

For example, the Watson Natural Language Classifier (NLC) service employs deep learning technologies to extract intent from a natural language utterance. Training NLC requires providing a groundtruth that includes representative utterances (input) and the corresponding intents (output). NLC then learns which utterances map to which intents. Note that it can extract intent not only from utterances it has seen, but also from any new utterance, based on that utterance’s similarity to what is available in the training data.

Similarly, the Watson Assistant service, which is very popular for developing virtual agents and chatbots, also includes an intent definition and training process. This process maps a short text (utterance) that the user provides (types, speaks, …) to the intent(s) the user means.
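
As a concrete (and hypothetical) illustration, a groundtruth is simply a collection of utterance/intent pairs. The snippet below sketches what such pairs might look like for an IT-support bot; the utterances and intent names are made up for illustration and are not taken from a real workspace.

```python
# Hypothetical groundtruth: each pair maps a representative end-user
# utterance (input) to the intent it expresses (output).
groundtruth = [
    ("I forgot my password and can't log in", "PasswordReset"),
    ("how do I change my password", "PasswordChange"),
    ("outlook won't start on my laptop", "AppStart"),
    ("is the nightly backup job still running", "CheckProcess"),
    ("what's the weather like today", "offtopic"),
]
```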

Ten steps for training your Watson chatbot

In what follows, we outline a recommended approach for training and evaluating the performance of your chatbot (or any cognitive solution), as illustrated in Figure 1:

  1. Define the intents (also known as classes or categories) you’d like your chatbot to extract from natural language utterances. While you can define a large number of intents for a variety of reasons, it is best to focus your intent definitions on the purpose of your chatbot. For example, in an IT support scenario, common intents would include PasswordReset, PasswordChange, AppStart, CheckProcess, and so on. However, if your bot takes the same action for PasswordReset and PasswordChange (for example, handing the conversation to a human agent), then you probably want to define a single PasswordIssues intent instead.

  2. Collect real end-user utterances that you’d want Watson Assistant to map to intents. It is important that the utterances come from end users. Guessing what end users would say may be acceptable for initial setup, but you should plan to collect and leverage real end-user utterances: the performance of the system is strongly a function of how accurately the training data captures what end users actually say. For example, when training the Watson Business Coach application, we interviewed sellers, partners, and clients to collect questions and utterances such as “show me a Watson demo in Healthcare,” “how can I use cognitive to improve customer service,” and “how is cognitive different from analytics.” You can use various techniques for collecting real end-user utterances, such as crowd-sourcing or leveraging historical chat logs.

  3. Assign the utterances collected in step 2 to the different intents defined in step 1. This step will most likely require subject matter experts (SMEs) to help with the mapping. For utterances that don’t clearly map to any of the defined intents, either leave them unlabeled (no intent) or map them to an “other” (or “offtopic”) intent. It is important to capture off-topic utterances so the application can handle them adequately.

  4. Randomly divide the utterances from step 3 into two sets, a training set and a test set. A 70% training / 30% test split is typical (see the split sketch after this list).

  5. Train your chatbot (Watson Assistant intents or an NLC classifier) using the training set from step 4 (or step 7). This training set constitutes the “groundtruth” for the system (see the intent-creation sketch after this list).

  6. After training is complete, run the test set against the trained classifier and collect performance metrics such as accuracy, precision, and recall (see the evaluation sketch after this list). For details and sample code, check out this blog.

  7. Perform Error Analysis: review the results in step 6 to understand why the classifier missed certain utterances. Update your training data accordingly and go back to step 5.

  8. After you’re satisfied with the results produced by the trained system, it is ready to be released (alpha/beta). Instrument your solution with a mechanism to collect end-user feedback, for example by prompting the user with a “thumbs-up”/“thumbs-down” choice or a star rating for the relevance of the returned results.

  9. While your chatbot is in use, continue to collect end-user utterances, the intents returned by the trained Watson Assistant service, and end-user feedback.

  10. Map results collected in step 9 to new training/test data. Go back to step 4 and iterate.
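
To make step 4 concrete, here is a minimal sketch of a random 70/30 split in Python. The variable names and placeholder utterances are illustrative only; in practice, the list would hold the labeled utterances you produced in step 3.

```python
import random

# Labeled pairs from step 3: (utterance, intent). Placeholder values for illustration.
labeled_utterances = [
    ("I forgot my password", "PasswordReset"),
    ("outlook won't start", "AppStart"),
    ("is the nightly backup job still running", "CheckProcess"),
    ("tell me a joke", "offtopic"),
]

def split_groundtruth(pairs, train_fraction=0.7, seed=42):
    """Randomly split (utterance, intent) pairs into training and test sets."""
    data = list(pairs)
    random.Random(seed).shuffle(data)      # reproducible shuffle
    cut = int(len(data) * train_fraction)  # 70% train / 30% test by default
    return data[:cut], data[cut:]

train_set, test_set = split_groundtruth(labeled_utterances)
```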
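
For step 5, you can create intents through the Watson Assistant tooling or programmatically. The sketch below assumes the ibm-watson Python SDK and its AssistantV1 client (the create_intent call); the API key, service URL, version date, and workspace ID are placeholders to replace with your own, and train_set stands in for the training split from step 4.

```python
from collections import defaultdict

from ibm_watson import AssistantV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials: substitute your own service instance details.
authenticator = IAMAuthenticator("YOUR_API_KEY")
assistant = AssistantV1(version="2021-06-14", authenticator=authenticator)
assistant.set_service_url("https://api.us-south.assistant.watson.cloud.ibm.com")
WORKSPACE_ID = "YOUR_WORKSPACE_ID"

# Training pairs from the split in step 4 (placeholder values for illustration).
train_set = [
    ("I forgot my password", "PasswordReset"),
    ("outlook won't start", "AppStart"),
]

# Group the training set by intent: {intent: [utterance, ...]}
examples_by_intent = defaultdict(list)
for utterance, intent in train_set:
    examples_by_intent[intent].append(utterance)

# Create one intent per class, with its utterances as training examples.
for intent, utterances in examples_by_intent.items():
    assistant.create_intent(
        workspace_id=WORKSPACE_ID,
        intent=intent,
        examples=[{"text": text} for text in utterances],
    )
```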
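
For step 6, one way to gather predictions is to send each test utterance to the trained workspace through the message API and compare the top returned intent against the expected one. This sketch reuses the assistant client, WORKSPACE_ID, and test_set from the previous sketches, and uses scikit-learn only as a convenient way to compute accuracy, precision, and recall; it is an illustration, not the only way to evaluate.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

expected, predicted = [], []
for utterance, intent in test_set:
    response = assistant.message(
        workspace_id=WORKSPACE_ID,
        input={"text": utterance},
        alternate_intents=True,
    ).get_result()
    intents = response.get("intents", [])
    # Treat "no intent detected" as off-topic for scoring purposes.
    top_intent = intents[0]["intent"] if intents else "offtopic"
    expected.append(intent)
    predicted.append(top_intent)

print("accuracy :", accuracy_score(expected, predicted))
print("precision:", precision_score(expected, predicted, average="macro", zero_division=0))
print("recall   :", recall_score(expected, predicted, average="macro", zero_division=0))
```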

Figure 1. Chatbot training

When training a cognitive system, it is important to remember that training and learning are iterative. The outlined process ensures that the newly trained system picks up nuances of utterances it may not have captured initially.

As chatbots proliferate across applications and platforms, their success depends heavily on the experience they offer to end users. There are multiple factors in a positive user experience, an important one being the ability to understand the user’s needs and respond accordingly. Following the steps outlined in this blog will improve your chatbot’s understanding of user input and guide it toward a better user experience. Join us soon in Part 2 of this chatbot series to learn how to evaluate the machine learning models used by your chatbot!