Users love chatbots because they are simple and minimalist: a chatbot can be as simple as a threaded text conversation. Users also prefer to stay right inside their favorite messaging app. They want to get straight to the point without having to navigate apps, web URLs, menus, buttons, ads, chrome, and other UI elements. But this simplicity also presents a big design challenge. The chatbot must precisely understand what the user is saying and act appropriately. That is a very tall order even for today’s best natural language artificial intelligence (AI).

With the current state of AI, a threaded text conversation or conversational UI (CUI) is almost always inferior to a well-designed graphical UI (GUI). Compared with the GUI, the CUI is still in its infancy. As a community, we are still exploring design patterns and best practices of the CUI. In this tutorial, I’ll explain why chatbots can fail and what can be done to make them successful.

Note: Dr. Robert Kosara recently published a blog post titled “The Personified User Interface Trap.” He described the Shneiderman and Maes Debate of 1997 on “direct manipulation” compared to “interface agent” styles of UIs. He concluded that the lessons from 20 years ago still apply today: unless the “agent” can perfectly anticipate the user’s needs (perfect AI), presenting the user a GUI window is better, because the information is more discoverable and the interaction requires no “syntax.” Yet, the users’ strong preference for a minimalist UI inside their favorite messaging application cannot be ignored.

Reasons chatbots fail

After Facebook launched its bot platform in April 2016, many people tried the featured “launch partners” and found them very lacking. The chatbots could not even understand basic questions that were well within their application domains (for example, checking the weather or ordering flower delivery). It was especially painful to watch the chatbots implode when users tried to talk “naturally” and deviated from the robotic questions the chatbots expected.

Mikhail Larionnov, a Facebook Messenger product manager, reviewed many chatbots on the Facebook platform, and identified three reasons for some chatbots’ lack of traction:

  • Poor onboarding with very little explanation on what the chatbot does
  • Unclear value by trying to do too much within a single chatbot
  • Relying too heavily on natural language processing

Ways to make a successful chatbot

Fortunately, Larionnov also gave some concrete advice for fixing those problems.

First, a chatbot should have a very limited scope. It should provide value on a narrow subject and do it well. More importantly, it should be able to explain what it does in a sentence or two.

Second, you should use each messaging platform’s built-in capabilities to convey the chatbot’s features during the user onboarding process. For Facebook Messenger, that is a well-crafted greeting window and call to action. For Slack, it is the description in the bot store.

Third, you should use structured buttons as much as possible. If you do need to take free-form user input, you must handle the cases where the AI cannot understand it and provide help messages that explain the correct syntax. For a chatbot, the syntax is the set of commands and keywords that trigger actions.

I would add a fourth item: do not spam. Your chatbot must understand and respond to commands such as stop, unsubscribe, and cancel, and it should immediately stop sending messages when it receives one. If your bot sends excessive and unwanted messages, the user will have no choice but to block it. Facebook is known to take a chatbot offline if 4% of users block it.
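
To make that opt-out rule concrete, here is a minimal, framework-agnostic sketch in Python. The function and variable names are illustrative, not part of any specific bot platform; the point is simply to check for stop, unsubscribe, or cancel before doing anything else with a message.

import re

# Recognize opt-out requests before any other processing. The keywords come
# from the advice above; word boundaries avoid false matches such as "cancellation".
OPT_OUT_PATTERN = re.compile(r"\b(stop|unsubscribe|cancel)\b", re.IGNORECASE)

def handle_message(user_id, text, subscriptions):
    """Return a reply, honoring opt-out requests immediately."""
    if OPT_OUT_PATTERN.search(text):
        subscriptions.discard(user_id)  # stop all future push messages
        return "You are unsubscribed. Text 'start' at any time to resume."
    # ... normal intent handling would go here ...
    return "Sorry, I didn't understand that. Text 'help' to see what I can do."

# Example usage
subs = {"user-123"}
print(handle_message("user-123", "Please STOP messaging me", subs))
print(subs)  # set() -- the user was removed from the push list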

Designing a command and control CUI

The CUI often reminds people of the good old command line from the DOS and UNIX days. In fact, when chatbots first became popular, they were widely used by developers to perform highly technical tasks. For example, GitHub’s Hubot used commands internally to manage many operations. In that use case, the “conversation” was often developers typing commands directly into the chat window and the bot executing those commands. Does this mean that it is acceptable to build a chatbot that responds to predefined commands the user must memorize? The answer is yes in many cases.

Let’s look at a couple of use cases:

  • Chatbots for work and productivity can be more tolerant of commands. Their users are typically computer-savvy professionals who are used to commands at work. In fact, many power users have already mastered the command-line prompt and keyboard shortcuts to get around a slow GUI for repetitive tasks. So a highly efficient “command line” bot is probably preferred over a chatty bot in this scenario.
  • Younger users who grew up with computers have used text messaging their entire lives. They are more likely to value the efficiency of commands over a slow GUI.

If you design a command and control CUI, here are some specific design considerations:

  • Provide auto-complete or other forms of help to ensure that the user spells the commands correctly. Pay special attention to the auto-correct features on iOS and Android devices because they might alter the user input if your command is spelled unconventionally. A good example of built-in command help is Slack’s slash commands: the Slack application UI suggests the correct spelling and provides an explanation of each command as you type.
  • Program your chatbot to be tolerant of common misspellings and synonyms. For example, “help” can also be phrased as “How do I do this?”. This entails creating a list of synonyms, many of which might be regular expressions, and matching all of them against the user input at run time, as shown in the sketch after this list.
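
Here is a minimal sketch of that idea in Python. The intent names and patterns are illustrative assumptions rather than part of any particular platform; a real bot would grow the pattern lists from the phrases users actually type.

import re

# Map each intent to a list of regular expressions that should trigger it.
INTENT_PATTERNS = {
    "help":    [r"\bhelp\b", r"how do i\b", r"what can you do"],
    "weather": [r"\bweather\b", r"\bforecast\b", r"is it (raining|sunny)"],
}

def match_intent(text):
    """Return the first intent whose patterns match the user input, or None."""
    normalized = text.lower().strip()
    for intent, patterns in INTENT_PATTERNS.items():
        for pattern in patterns:
            if re.search(pattern, normalized):
                return intent
    return None

print(match_intent("How do I do this?"))         # help
print(match_intent("Is it raining in Austin?"))  # weather
print(match_intent("asdf"))                      # None -> show a help message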

Designing a short conversation for slot filling

The opposite of the command line is a chatbot that can hold a natural conversation with the user. However, given the current state of natural language AI, it is largely impossible to have a free-ranging conversation with users. And you don’t need to. For a chatbot to be useful, it often requires only highly scripted conversation. For example, to ask your chatbot about weather, you might say:

What's the weather in Austin, Texas, tomorrow?

Notice that here you have:

  • Intent: weather report
  • Location: Austin, Texas
  • Time: tomorrow

The chatbot can now query the weather. But at times, the user does not give all the information at once. If the user starts with the intent alone, the chatbot should be able to ask the user to complete other required parameters, which are called slots. Here is what the conversation might look like:


Human: What is the weather?
Bot: I will look up weather for you. Do you want to check your local weather or somewhere else?
Human: Somewhere else
Bot: Okay, where is it?
Human: Austin Texas
Bot: Thanks. Do you want to know the weather right now or do you want a forecast?
Human: Forecast
Bot: Okay, when?
Human: Tomorrow
Bot: The weather tomorrow in Austin, Texas, is sunny with a high of 80 degrees and a low ...

You get the idea. This type of highly scripted conversation can happen in many scenarios. In general, when you ask the bot to perform a task, the bot should be able to have a short conversation to complete all the required slots.
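
Before reaching for a full natural language service, it helps to see how simple slot filling is at its core. Here is a minimal, framework-agnostic sketch in Python; the slot names, prompts, and final reply are assumptions for this example, not the API of any specific tool.

# Required slots for the "weather" intent, with the question to ask if missing.
REQUIRED_SLOTS = {
    "location": "Okay, where do you want the weather for?",
    "time":     "Do you want the weather right now, or a forecast for later?",
}

def next_prompt(filled_slots):
    """Return the question for the first missing slot, or None if all are filled."""
    for slot, prompt in REQUIRED_SLOTS.items():
        if slot not in filled_slots:
            return prompt
    return None

def handle_weather_turn(filled_slots):
    """One turn of the conversation: ask for a missing slot or give the answer."""
    prompt = next_prompt(filled_slots)
    if prompt:
        return prompt
    # All slots are filled -- call a (hypothetical) weather back end here.
    return f"Looking up the {filled_slots['time']} weather in {filled_slots['location']}..."

# Simulated conversation: the user starts with the intent alone.
slots = {}
print(handle_weather_turn(slots))   # asks for the location
slots["location"] = "Austin, Texas"
print(handle_weather_turn(slots))   # asks for the time
slots["time"] = "tomorrow"
print(handle_weather_turn(slots))   # answers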

Tools are available to help you carry those conversations. For example, the IBM Watson Assistant service can support slot filling within a conversation script.

Designing for interactive conversation

Commands and slot-filling conversations are effective ways to work around the lack of true conversational AI. But there is an even more direct approach: display buttons after messages that require specific responses. This way, you clearly communicate the anticipated response to the user. Here’s an example of buttons inside a Facebook Messenger chat session.

Screen shot of buttons in a Facebook Messenger chat session.

Without those buttons, the chatbot typically gives numbered or keyword options (for example, “Reply 1 to read more and 2 to see all stories”). This type of interaction can often feel robotic. Regardless, when the chatbot asks the user to select from a numbered list, it should be able to handle user input such as 1, one, first, the first choice, and other synonyms.

Remember that the user might not use the buttons or the numbered list you provide. The user can skip the buttons and type something else altogether. A common example is when the user sees a set of buttons, does not know how to respond, and types help or menu. Your chatbot should be prepared for that scenario by parsing any free-form response to determine whether the user made one of the requested selections or wants to start a new task altogether.
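
A sketch of that fallback logic in Python follows; the synonym table and the detect_new_intent stub are illustrative assumptions, not part of any messaging platform.

# Accepted ways a user might refer to choice 1, 2, or 3 in a numbered list.
CHOICE_SYNONYMS = {
    "1": 1, "one": 1, "first": 1, "the first choice": 1,
    "2": 2, "two": 2, "second": 2,
    "3": 3, "three": 3, "third": 3,
}

def detect_new_intent(text):
    """Stub for the bot's regular intent matching (for example, 'help' or 'menu')."""
    word = text.strip().lower()
    return word if word in ("help", "menu") else None

def handle_list_reply(text, options):
    """Interpret a free-form reply to a numbered list of options."""
    normalized = text.strip().lower().rstrip(".!")
    choice = CHOICE_SYNONYMS.get(normalized)
    if choice is not None and choice <= len(options):
        return f"You picked: {options[choice - 1]}"
    intent = detect_new_intent(text)
    if intent:
        return f"Switching to the '{intent}' task."
    return "Sorry, I didn't catch that. Reply with 1, 2, or 3, or type 'menu'."

options = ["Read more", "See all stories", "Settings"]
print(handle_list_reply("the first choice", options))  # You picked: Read more
print(handle_list_reply("menu", options))              # starts the 'menu' task
print(handle_list_reply("maybe later", options))       # asks the user again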

Of course, if you have buttons, you can also have other interactive elements. For example, Facebook Messenger supports a carousel that can be horizontally scrolled. It is a great way to present a list of items without clogging up the entire window with a very long list.

Screen shot showing different options

Anticipate multimodal user inputs

A great feature of modern messaging applications is the ability to send images, audio, video, emojis, location, and other inputs. Facebook Messenger is a good example of such rich chat features.

Message entry field at the bottom of a Facebook message

Naturally, some users are more inclined to communicate by voice, while others might like to send pictures. And the question “where are you?” is best answered with a location. Facebook Messenger passes all of those user-generated assets to your chatbot, but it is up to your chatbot to figure out their meaning (for example, through speech-to-text, image recognition, OCR, or mapping latitude and longitude to an address).

Note: Different messaging applications support different kinds of user-generated multimedia input, and they have different ways to send this data to chatbots. Therefore, it is very hard to create a generic cross-platform chatbot that works consistently across all messaging apps.
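
For example, a Messenger webhook delivers each attachment with a type and a payload. The sketch below shows how a bot might dispatch on that type; the recognize_image and reverse_geocode helpers are hypothetical stand-ins for real vision and geocoding services, and other platforms will use different payload shapes.

def recognize_image(url):
    """Hypothetical image-recognition call (for example, a vision API)."""
    return "a bouquet of roses"

def reverse_geocode(lat, lng):
    """Hypothetical mapping of coordinates to a street address."""
    return "701 Congress Ave, Austin, TX"

def handle_attachment(attachment):
    """Dispatch on the attachment type sent by the messaging platform."""
    kind = attachment.get("type")
    payload = attachment.get("payload", {})
    if kind == "image":
        return f"Nice photo! It looks like {recognize_image(payload.get('url'))}."
    if kind == "location":
        coords = payload.get("coordinates", {})
        address = reverse_geocode(coords.get("lat"), coords.get("long"))
        return f"Got it, you are near {address}."
    if kind == "audio":
        return "I received your voice message and will transcribe it."
    return "Sorry, I can't handle that kind of attachment yet."

# Example attachment in the shape Messenger sends to a webhook
attachment = {"type": "location",
              "payload": {"coordinates": {"lat": 30.2747, "long": -97.7404}}}
print(handle_attachment(attachment))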

Rude users and writers as UI designers

When you program the chatbot, you need to anticipate all kinds of things the user might say. Sometimes, the user will intentionally be rude to the chatbot just to push the limit and see how the chatbot reacts. You should prepare a number of witty responses to handle rude comments.

In general, for all of your responses, you should detect the user’s intent first and then vary the responses for that intent. Nothing feels more robotic than seeing the exact same response from your chatbot time and time again. Conversation management tools such as API.ai and the IBM Watson Conversation or Dialog services can help you randomly select from a list of prepared answers in each scenario.
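
The underlying idea is simple enough to sketch without any service. Here is a minimal Python version; the intents and phrasings are made up for illustration.

import random

# Several phrasings per intent, so returning users don't see identical replies.
RESPONSES = {
    "greeting": [
        "Hi there! What can I do for you today?",
        "Hello! How can I help?",
        "Hey! What do you need?",
    ],
    "rude": [
        "Let's keep it friendly -- I'm here to help.",
        "I'll pretend I didn't read that. What can I do for you?",
    ],
}

def respond(intent):
    """Pick one of the prepared answers for the detected intent at random."""
    choices = RESPONSES.get(intent)
    if not choices:
        return "Sorry, I didn't understand that."
    return random.choice(choices)

print(respond("greeting"))
print(respond("rude"))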

The need for witty responses and personality for chatbots has given rise to the “writer as UI designer” movement. It is reported that Silicon Valley companies are hiring English majors and poets to improve their nascent chatbots.

Case study: Lessons from WeChat

WeChat is the largest messaging platform in China, with over 700 million monthly active users. In 2013, it pioneered an extremely successful bot program called “public accounts.” What has (or has not) worked on WeChat in terms of chatbot interactions?

WeChat currently supports two types of public account bots: subscription and service. Both started out as chatbots but have slowly evolved to focus less on “artificial intelligence chat.” Both types of public accounts support the CUI.

Subscription accounts are used by content publishers to message their subscribers about new content. Typically, once a day the publisher sends a list of the day’s new articles to all of its subscribers.

The user can also text back to get specific articles or perform specific actions, for example:

  • An article might say “text 42 to download the PowerPoint slides mentioned in the article.”
  • The user might text “toc” to get a list of recently published articles.
  • The user might text “contact” to get a web link to contact the account managers.

Notice that those are all very simple commands or trigger keywords. They are designed to lead you to web pages or downloads quickly. Very few subscription accounts, if any, attempt to have a lengthy natural language conversation with the user.
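
Mechanically, these subscription-account keywords amount to little more than a lookup table from trigger text to a URL or file. The following sketch is hypothetical; the keywords and example.com links are illustrative only.

# Hypothetical mapping from trigger keyword to the resource it unlocks.
TRIGGERS = {
    "42":      "https://example.com/downloads/slides-42.pptx",
    "toc":     "https://example.com/articles/recent",
    "contact": "https://example.com/contact",
}

def handle_keyword(text):
    """Reply with the linked resource for a known trigger, or a usage hint otherwise."""
    key = text.strip().lower()
    if key in TRIGGERS:
        return f"Here you go: {TRIGGERS[key]}"
    return "Text 'toc' for recent articles or 'contact' to reach us."

print(handle_keyword("42"))       # link to the slides
print(handle_keyword("weather"))  # falls back to a usage hint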

Service accounts are used by customer service organizations to interact with their customers on WeChat. Examples include airlines, hotels, and e-commerce shops. Early on, WeChat recognized that automated AI chatbots are not capable of handling customer support on their own. Therefore, much like Facebook pages, each service account can designate one or more human user accounts as “support humans.”

The screen captures show actual customer service chat sessions from an airline and a credit card company. Although the text is in Chinese, you can imagine what is going on: again, the interaction consists of simple trigger words and commands.

Screen shots of customer service chat windows

For some customer service accounts, the account owners do try to carry on longer conversations with users. Even then, the conversation is automated only at the start; after the bot figures out the user’s intent, the chat is quickly handed off either to a relevant web page (for example, to rebook a plane ticket) or to a human agent associated with the account.

The key takeaway points from WeChat’s highly evolved bot ecosystem are:

  • Different bot applications require different styles of conversational UI. For many bots, simple commands or trigger words that lead to a web view will suffice.
  • Longer bot conversations are suited for highly scripted use cases, such as customer support for specific products. Even then, it is wise to transfer the user to a web app or human agent as needed.
  • The value of messaging bots often lies in the integration with the platform so that the bot can seamlessly access the user’s identity, payment information, and other information.

Summary

In this tutorial, I discussed key issues facing conversational design in chatbots. I listed ways to make your chatbots successful and illustrated those points with a look at the WeChat messaging platform. I suggested three types of interaction: command and control, slot filling, and interactive conversation. I also touched on multimodal user inputs and how to provide varied, creative responses.