Program Your Chatbot to Handle “Long-Tail” Questions with Watson Conversation and Watson Discovery


Watson Conversation is now Watson Assistant. Although some illustrations in this tutorial may show the service as Watson Conversation, the steps and processes will still work.

Remember the last time you were talking to a friend about your favorite movies, going back and forth about your favorite scenes and characters? But then you couldn’t remember which actor it was that played the witty sidekick, so you grabbed your smartphone and did a quick web search to find the answer? That piece of filmography information was a bit beyond what you knew and you needed additional knowledge to get an answer. Virtual conversational interfaces face similar situations. They seek to assist and engage a user, but sometimes don’t have all the knowledge needed to answer a question. Dealing with user inputs that require additional knowledge is what we call the long-tail problem.

When you think of the types of questions or inputs for virtual conversational interfaces, you can categorize them broadly into two groups that we refer to as the short head and the long tail. Both terms describe the frequency distribution of the inputs a conversational system receives. At the head of the distribution are questions or utterances that appear frequently and can be mapped to a defined set of intents, like FAQs or common responses (e.g., what hours are you open?). The long tail contains questions that appear less frequently and may not have an easily defined response. Instead, these questions require looking through a large knowledge base of information for relevant content.

The Watson Conversation service is uniquely suited to deal with inputs from the head of the distribution. It is able to learn from examples and understand natural variations in the way the questions are phrased or presented, and map them to specific intents and entities that can be used to script appropriate responses. However, in some cases the user’s question cannot be confidently classified into an intent because it seeks information beyond what is available in an FAQ or a limited set of common responses (e.g., my exhaust is making a rattling sound, how do I troubleshoot this issue?). These questions are part of the long tail. They can be varied or unique, which makes them too difficult to build specific intents for.

The Watson Discovery service provides the capabilities needed for effectively retrieving long-tail answers from a corpus of knowledge. You can load a vast knowledge base of documents into the Discovery service, pass on long-tail queries, and return a list of relevant documents or passages to a user. The Discovery service uses an underlying enterprise search engine along with powerful natural language processing enrichments that extract keywords, concepts, entities, and so on to find relevant documents for a particular query. When a long-tail question comes in, the service looks for matching terms across all that data and scores documents based on where and how those matches occur. Discovery can also be trained to find signals from those matches that can lead to improved relevance.
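
The idea of matching query terms and scoring documents by where and how often the matches occur can be illustrated with a toy Python sketch. This is not how Discovery ranks results internally. The field names, the weights in `FIELD_WEIGHTS`, and the functions `tokenize`, `score`, and `rank` are all hypothetical and chosen only to make the idea concrete.

```python
import re
from collections import Counter
from typing import Dict, List

# Assumed weights: a match in the title counts more than a match in the body.
FIELD_WEIGHTS: Dict[str, float] = {"title": 3.0, "body": 1.0}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, doc: Dict[str, str]) -> float:
    """Sum weighted counts of query terms found in each field of the document."""
    terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(doc.get(field, "")))
        total += weight * sum(counts[term] for term in terms)
    return total


def rank(query: str, docs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return documents with at least one match, highest score first."""
    scored = [(score(query, doc), doc) for doc in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for value, doc in scored if value > 0]
```

A real search engine also accounts for term rarity, document length, and the enrichments described above, so treat this only as a picture of where and how matches can influence a score.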

The Conversation with Discovery sample app demonstrates how the Watson Conversation and Watson Discovery services can be used together to create a conversational interface that addresses the complete distribution of inputs from a user. The app mimics a car dashboard interface, allowing users to perform defined actions that a vehicle might provide, like “turn on lights” or “find the nearest gas station.” The sample app includes a Watson Conversation workspace that defines the intents, entities, and dialog for these interactions.

The Conversation service is also trained on long-tail questions that map to an out-of-scope intent. Within the application, when the out-of-scope intent is identified, the question is routed to the Watson Discovery service. The Watson Discovery service ingests and enriches a corpus of car manual documents, which can be returned as results in the application for these long-tail questions. The natural language enrichment performed by Discovery on documents helps improve the search by identifying additional information like concepts that may not appear explicitly in the manual but could be referred to by a user question.
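
The out-of-scope hand-off described above can be sketched in Python as follows. This is a minimal illustration, not the sample app's actual code. The intent name `out_of_scope`, the functions `top_intent` and `route`, and the `search_manual` callback are hypothetical, and the response is assumed to be a dictionary with an `intents` list ordered by descending confidence and an `output` entry holding a `text` list.

```python
from typing import Any, Callable, Dict, List, Optional

# Assumed name of the intent trained on long-tail example questions.
OUT_OF_SCOPE_INTENT = "out_of_scope"


def top_intent(response: Dict[str, Any]) -> Optional[str]:
    """Return the name of the first intent in the response, or None if there is none."""
    intents = response.get("intents") or []
    return intents[0]["intent"] if intents else None


def route(
    user_text: str,
    response: Dict[str, Any],
    search_manual: Callable[[str], List[Any]],
) -> Dict[str, Any]:
    """Send out-of-scope questions to a search callback; otherwise use the dialog text."""
    if top_intent(response) == OUT_OF_SCOPE_INTENT:
        # search_manual stands in for a query against the car manual collection.
        return {"source": "discovery", "results": search_manual(user_text)}
    dialog_text = response.get("output", {}).get("text", [])
    return {"source": "conversation", "text": dialog_text}
```

Passing the search step in as a callback keeps the routing decision separate from any particular client library, so the same function can be exercised with a stub during testing.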

For example, a question like, “how do I improve fuel efficiency” can bring back a relevant section of the manual without “efficiency” appearing anywhere in the document. Using this technique of training an out-of-scope intent is one way to support the hand-off for a long-tail question. Another approach is to use the confidence score returned by Conversation. If Conversation does not return an intent with confidence above a certain threshold that you define, the question can be passed on to Discovery. In this way, you can provide high confidence answers through Conversation but still provide a response in cases where an intent cannot be identified.
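
The confidence-threshold approach can be sketched in a few lines of Python. This is an illustration under stated assumptions: each intent is assumed to be a dictionary with a numeric `confidence` value, the names `best_confidence` and `should_ask_discovery` are hypothetical, and the value of `DEFAULT_THRESHOLD` is a placeholder rather than a recommendation.

```python
from typing import Any, Dict, List

# Placeholder value; choose a threshold by testing against your own questions.
DEFAULT_THRESHOLD = 0.5


def best_confidence(intents: List[Dict[str, Any]]) -> float:
    """Return the highest confidence among the intents, or 0.0 if the list is empty."""
    return max((float(intent["confidence"]) for intent in intents), default=0.0)


def should_ask_discovery(
    intents: List[Dict[str, Any]], threshold: float = DEFAULT_THRESHOLD
) -> bool:
    """Return True when no intent reaches the threshold, so the question goes to search."""
    return best_confidence(intents) < threshold
```

An empty intent list is treated as zero confidence, so a question with no recognized intent is also handed to search.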

This sample app shows how Watson Conversation and Watson Discovery can be used together to address a wide range of possible user questions. The pattern can be adapted to many applications that benefit from this combination of capabilities, including customer service, product support, and employee education.


Check out the IBM Watson Conversation and Discovery app today on GitHub

Starter Kit: Chatbot with Long Tail Search

Try the interactive sample app or fork it on GitHub to begin developing your first conversation discovery application.

2 comments on "Program your chatbot to handle “long-tail” questions with Watson Conversation and Watson Discovery"

  1. Zee Nastalski April 17, 2018

    Great article!
    How do I find a good value for the confidence threshold?

  2. Ala Garali July 18, 2018

    I’m creating my first Virtual Assistant application with Node.js using IBM Watson Conversation, TTS, STT and Discovery services.
    I would like to know how I can extract the exact, correct answer to a question from FAQ documents (html, docx, pdf, …) already uploaded to my collection in the Discovery service.

    Until now, I’ve been able to extract some right answers. But most of the questions asked get either a wrong answer or only part of the correct answer.

    The chatbot’s domain is insurance for people, cars / motorcycles, and homes.

    Best regards.
