Remember the last time you were talking with a friend about your favorite movies, going back and forth about your favorite scenes and characters? Then you couldn’t remember which actor played the witty sidekick, so you grabbed your smartphone and did a quick web search to find the answer. That piece of filmography was a bit beyond what you knew, and you needed additional knowledge to get an answer. Virtual conversational interfaces face similar situations. They seek to assist and engage a user, but sometimes don’t have all the knowledge needed to answer a question. Dealing with user inputs that require additional knowledge is what we call the long-tail problem.
When you think of the types of questions or inputs that virtual conversational interfaces receive, you can categorize them broadly into two groups that we refer to as the short head and the long tail. Both terms describe the frequency distribution of user inputs to a conversational system. At the head of the distribution are questions or utterances that appear frequently and can be mapped to a defined set of intents, like FAQs or common responses (e.g., what hours are you open?). The long tail contains questions that appear less frequently and may not have an easily defined response. Instead, answering these questions requires searching a large knowledge base for relevant content.
The Watson Conversation service is well suited to handling inputs from the head of the distribution. It learns from examples, understands natural variations in the way questions are phrased, and maps them to specific intents and entities that can be used to script appropriate responses. However, in some cases the user’s question cannot be confidently classified into an intent, and the user is seeking information beyond what is available in an FAQ or a limited set of common responses (e.g., my exhaust is making a rattling sound, how do I troubleshoot this issue?). These questions are part of the long tail. They can be varied or unique, and are therefore too difficult to build specific intents for.
The Watson Discovery service provides the capabilities needed to retrieve long-tail answers from a corpus of knowledge. You can load a large knowledge base of documents into the Discovery service, pass it long-tail queries, and return a list of relevant documents or passages to a user. The Discovery service uses an underlying enterprise search engine along with natural language processing enrichments that extract keywords, concepts, entities, and so on to find relevant documents for a particular query. When a long-tail question comes in, the service looks for matching terms across all that data and scores documents based on where and how those matches occur. Discovery can also be trained to find signals from those matches that can lead to improved relevance.
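To make the idea of scoring matches by where they occur more concrete, here is a toy sketch in plain Python. It is not how Discovery is implemented and it does not call the Discovery API. The field names (`text`, `concepts`), the weights, the stopword list, and the sample documents are all assumptions made up for illustration. It only shows how a match in an enriched field can surface a document whose raw text never uses the words in the question.

```python
import re

# Hypothetical weights: a match in an enriched "concepts" field counts more
# than a match in the raw text. The numbers are illustrative only.
FIELD_WEIGHTS = {"text": 1.0, "concepts": 2.0}

# Tiny illustrative stopword list, not a real linguistic resource.
STOPWORDS = {"how", "do", "i", "the", "a", "to", "my", "is", "and"}


def tokenize(value):
    """Lowercase a string and split it into non-stopword terms."""
    return [t for t in re.findall(r"[a-z0-9]+", value.lower()) if t not in STOPWORDS]


def score(query, doc):
    """Sum, over fields, the field weight times the number of shared terms."""
    query_terms = set(tokenize(query))
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        value = doc.get(field, "")
        if isinstance(value, list):
            value = " ".join(value)
        total += weight * len(query_terms & set(tokenize(value)))
    return total


def search(query, docs, limit=3):
    """Return the ids of matching documents, best score first."""
    scored = [(score(query, d), d["id"]) for d in docs]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [doc_id for _, doc_id in hits[:limit]]


# Made-up manual sections; "concepts" stands in for enrichment output.
DOCS = [
    {
        "id": "tires",
        "text": "Keep tires inflated to the recommended pressure and avoid hard acceleration.",
        "concepts": ["fuel efficiency", "tire pressure"],
    },
    {
        "id": "wipers",
        "text": "Replace wiper blades when they streak.",
        "concepts": ["windshield wiper"],
    },
]

print(search("how do I improve fuel efficiency", DOCS))
```

In this sketch the tire section is returned even though its text contains neither "fuel" nor "efficiency", because both terms match the concepts attached to it.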
The Conversation with Discovery sample app demonstrates how the Watson Conversation and Watson Discovery services can be used together to create a conversational interface that addresses the complete distribution of inputs from a user. The app mimics a car dashboard interface, allowing users to perform defined actions that a vehicle might provide, such as “turn on lights” or “find the nearest gas station.” The sample app gives you a Watson Conversation workspace that defines the intents, entities, and dialog for these interactions.
The Conversation service is also trained on long-tail questions that map to an out-of-scope intent. Within the application, when the out-of-scope intent is identified, the question is routed to the Watson Discovery service. The Watson Discovery service ingests and enriches a corpus of car manual documents, which can be returned as results in the application for these long-tail questions. The natural language enrichment performed by Discovery on documents helps improve the search by identifying additional information like concepts that may not appear explicitly in the manual but could be referred to by a user question.
For example, a question like “how do I improve fuel efficiency” can bring back a relevant section of the manual without “efficiency” appearing anywhere in the document. Training an out-of-scope intent is one way to support the hand-off for a long-tail question. Another approach is to use the confidence score returned by Conversation. If Conversation does not return an intent with confidence above a threshold that you define, the question can be passed on to Discovery. In this way, you can provide high-confidence answers through Conversation but still provide a response in cases where an intent cannot be identified.
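The two hand-off strategies can be sketched as a small routing function. This is a minimal illustration and not the sample app's code. It assumes the Conversation response carries an `intents` list of objects with `intent` and `confidence` fields. The intent name `out_of_scope`, the `0.5` threshold, and the `route` helper itself are hypothetical choices for this example. No Watson SDK calls are made, and the function only decides which service should answer.

```python
# Assumed name of the intent trained on long-tail questions.
OUT_OF_SCOPE_INTENT = "out_of_scope"

# Assumed cutoff; in practice you would tune this against your own data.
CONFIDENCE_THRESHOLD = 0.5


def top_intent(response):
    """Return (intent, confidence) for the highest-confidence intent, if any."""
    intents = response.get("intents", [])
    if not intents:
        return None, 0.0
    best = max(intents, key=lambda item: item["confidence"])
    return best["intent"], best["confidence"]


def route(response, threshold=CONFIDENCE_THRESHOLD):
    """Decide whether Conversation or Discovery should answer."""
    intent, confidence = top_intent(response)
    # Strategy 1: the classifier explicitly recognized a long-tail question.
    if intent is None or intent == OUT_OF_SCOPE_INTENT:
        return "discovery"
    # Strategy 2: no intent was identified with enough confidence.
    if confidence < threshold:
        return "discovery"
    return "conversation"


# Hand-written responses standing in for real Conversation output.
print(route({"intents": [{"intent": "turn_on", "confidence": 0.93}]}))
print(route({"intents": [{"intent": "out_of_scope", "confidence": 0.88}]}))
print(route({"intents": [{"intent": "turn_on", "confidence": 0.21}]}))
```

The two checks are independent, so an application could use either one alone. Combining them means a question reaches Discovery whether the classifier flags it as out of scope or simply fails to match any intent with enough confidence.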
This sample app shows how Watson Conversation and Watson Discovery together can be used to address a wide range of possible user questions. The pattern can be adapted for many applications, including customer service, product support, employee education, and other use cases that can benefit from this combination of capabilities.
Check out the IBM Watson Conversation and Discovery app today on GitHub.
Starter Kit: Chatbot with Long Tail Search
Try the interactive sample app or fork it on GitHub to begin developing your first conversation discovery application: https://www.ibm.com/watson/developercloud/starter-kits.html#chatbot-with-long-tail-search