IBM Watson Explorer helps organizations scale expertise across the organization by giving users access to the most relevant information. The cognitive assistant in the new Watson Explorer Deep Analytics Edition lets knowledge workers focus on acting on insights from their data, instead of spending valuable time hunting for information across multiple data sources.

This blog post explains how the cognitive assistant scenario is realized with the machine learning-based capabilities of Watson Explorer Deep Analytics Edition.

Background

In most organizations, a knowledge worker typically reads a document, evaluates its content, and decides on the next action. Many knowledge workers spend a significant number of hours hunting for the right information. Delivering the relevant information in minutes lets them concentrate on their core work and can transform business processes.

For example, Laura is a customer service officer at a food retail company. She receives various voice-of-the-customer (VoC) comments, such as questions and complaints. Some VoC comments are critical for the company. If a customer complains, “I found some hair inside the bag.”, she should contact the store manager to prevent the issue from recurring. Reviewing VoC also takes many person-hours, and if the volume of incoming VoC data is huge, Laura will quickly be overwhelmed. Moreover, how to react to each comment depends on the staff member’s knowledge and experience, which makes it a huge challenge to keep the customer experience consistent.

Figure 1: Laura needs some help

Watson Explorer Deep Analytics Edition introduces two machine learning-based capabilities that assist knowledge workers and help them work more efficiently.

Document Classification

First, document classification assigns incoming VoC data to possible predefined actions, such as “contact client” or “give feedback to the store”. Laura uses the classification result as guidance when making the final decision. Classification should reduce the time needed to process VoC data, and because the same model suggests actions for every comment, decisions also become more consistent than before.

Search Relevancy

Second, the machine learning-based search relevancy tuning capability finds documents that are similar to the incoming document. Laura refers to those documents to support her decisions.
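To make the idea of “finding similar documents” concrete, here is a minimal Python sketch. It is not the Watson Explorer ranker, which uses machine learning-based relevancy tuning; it swaps in plain term-frequency cosine similarity as a much simpler stand-in. The function names and the third past comment are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(query, documents, k=2):
    """Return the k (score, document) pairs most similar to the query."""
    query_vec = Counter(tokens(query))
    scored = [(cosine(query_vec, Counter(tokens(doc))), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


past_voc = [
    "The straw was peeled off from the juice pack.",
    "I found some hair inside the bag.",
    "The lemon tea tasted strange.",  # hypothetical past comment
]
for score, doc in most_similar("The straw came off the juice pack", past_voc):
    print(f"{score:.2f}  {doc}")
```

Because common words such as “the” also contribute to the score, a plain word-count approach is easily misled; this is one reason the sketch is only an illustration of the concept.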

Figure 2: Watson Explorer DAE gives advice to reduce cognitive burden

 

Document Classification with Watson Explorer Deep Analytics

The following section describes, step by step, how to configure and try the document classification capability in Watson Explorer Deep Analytics Edition.

Data

Suppose the company has historical data of VoC engagements (Figure 3). Each record contains the product information (claim_product_line, claim_product), the customer information (client_segment, client_location, client_sex, client_age), and the content (body). The past decisions are stored in the “label” column, which contains both the category of the problem and the action taken. For example, Laura categorized the VoC stating “The straw was peeled off from the juice pack” as “package_container”. She also added “feedback_store”, because she gave feedback to the store and asked it to check the other juice packs. Some records also have “contact_client” in their label, for several reasons. For example, contamination is a serious problem, and the company needs to contact the client to offer a reimbursement or replacement. Similarly, if the client was a “Golden Card Member” (a loyal customer), the company needed to give the client special treatment.

Figure 3: Document Classifier Training Data

Watson Explorer document classification uses supervised multi-labeling, so a document can be assigned to more than one class. In the training data, the expected labels of each record must be represented as a JSON array.
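As an illustration of that format, the following Python sketch writes training records with the standard csv and json modules. The column names come from the fields listed above, but their order, the date value, and the helper name write_training_csv are assumptions; check them against voc_label_training.csv before relying on this.

```python
import csv
import io
import json

# Column names follow the fields described for Figure 3; their order is an assumption.
FIELDS = [
    "date", "claim_product_line", "claim_product", "client_segment",
    "client_location", "client_sex", "client_age", "body", "label",
]


def write_training_csv(records, out):
    """Write records as CSV, encoding each record's list of labels as a JSON array."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Multi-label answer field, e.g. ["package_container", "feedback_store"]
        row["label"] = json.dumps(record["label"])
        writer.writerow(row)


# One hypothetical record modeled on the juice-pack example above;
# the date value and its format are placeholders.
records = [{
    "date": "2018-01-15",
    "claim_product_line": "Tea",
    "claim_product": "lemon tea",
    "client_segment": "Not Member",
    "client_location": "Manhattan",
    "client_sex": "Male",
    "client_age": "40",
    "body": "The straw was peeled off from the juice pack.",
    "label": ["package_container", "feedback_store"],
}]

buf = io.StringIO()
write_training_csv(records, buf)
print(buf.getvalue())
```

The csv module quotes the label field and doubles its inner quotes, which is the usual CSV escaping; whether the Admin GUI’s CSV parser expects exactly this form is an assumption worth verifying on a small file first.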

 

Create Document Classifiers

A document classification model can be created in either the Admin GUI or the Content Miner application. The following steps create one in the Admin GUI. As training data, use the attached “voc_label_training.csv” (rename the file so that it has a .csv extension).

  1. Open Admin GUI
  2. Click “Resources” tab
  3. Click “Add classifier” button
  4. On “Create Classifier” page, input “VoC Classifier” as the name. Select “Supervised Multi-labeling” for “Classifier Type”. Click “Next”
  5. On “Add a dataset to your collection” page, upload “voc_label_training.csv”
  6. On “Configure CSV parser” window, confirm “UTF-8” is selected for “Character set of the csv file”. Also, check “Comma” for “Delimiter of the columns”. Check “Use header”. Click “Next”
  7. On “Select columns to import”, select “Date” as “Type” for “date” column. Click “Next”
  8. On “Import your files” window, click “Start import now” and wait until “Status” is “Completed”. “Record count” becomes “200”. Click “Save”
  9. On “Supervised Multi-labeling Setting” page, select “label” for “Answer Field”. For “Predicted Field”, “label_predicted” is filled. For “Collection Template”, “label_classifier_template” is filled. Click “Next”
  10. On “Configure collection fields”, select “body” for “Body field”. Select “Document URI” for “Title field”. Select “date” for “Date field”. Click “Next”
  11. On “Enrich your collection” page, confirm “Part of Speech” is checked for “Annotators”. Click “Next”
  12. On the “Specify the facets for analysis” page, use the default setting. Click “Next”
  13. On the “Confirm” page, click “Save”
  14. The “VoC Classifier” detail page opens. Click “New model”
  15. The “Create model and start training” window opens. Click “Divide database by ratios”, then click “Create”. Wait until the model is created.
  16. Click the “Deploy” button.
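The “Divide database by ratios” option in step 15 presumably splits the 200 imported records into training and test portions. The GUI does this for you; purely to illustrate the idea, here is a generic ratio-based random split in Python. The 80/20 ratio, the seed, and the function name are hypothetical and do not describe what Watson Explorer does internally.

```python
import random


def split_by_ratio(records, train_ratio=0.8, seed=42):
    """Shuffle a copy of `records` and split it into (train, test) lists.

    train_ratio=0.8 and seed=42 are hypothetical defaults for illustration.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# 200 records, as reported by the import in step 8
train, test = split_by_ratio(range(200))
print(len(train), len(test))  # 160 40
```

Holding out a test portion is what makes it possible to measure the model on records it has not seen during training.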

 

Create Collection

To apply the document classifier, you need a collection. Create one using “voc_nolabel.csv” as test data (rename the file so that it has a .csv extension). This data does not include “label” information; instead, the document classifier adds “label” as its classification result while the text index is built.

  1. Open Admin GUI
  2. Click “Add collection”
  3. On the “Collection Template” page, select “label_classifier_template”. Click “Next”
  4. On “Create Collection” page, input “VoC Classifier” as a name. Click “Next”
  5. On “Add a dataset to your collection”, upload “voc_nolabel.csv”
  6. On “Configure CSV parser” window, confirm “UTF-8” is selected for “Character set of the csv file”. Also, check “Comma” for “Delimiter of the columns”. Check “Use header”. Click “Next”
  7. On “Select columns to import”, select “Date” as “Type” for “date” column. Click “Next”
  8. On “Import your files” window, click “Start import now” and wait until “Status” is “Completed”. “Record count” becomes “463”. Click “Save”
  9. On “Configure collection fields”, select “body” for “Body field”. Select “Document URI” for “Title field”. Select “date” for “Date field”. Click “Next”
  10. On “Enrich your collection” page, confirm “Part of Speech” is checked for “Annotators”. Also, confirm “VoC Classifier” is selected for “Classifiers”. Click “Next”
  11. On “Specify the facets for analysis” page, use the default setting. Click “Next”
  12. On “Confirm” page, click “Save”
  13. Indexing starts. Wait until “Indexing status” becomes “Finished”
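Before uploading, a quick local check can catch the two things these steps depend on: the test file must not carry a “label” column, and its record count should match what the import reports (463 for voc_nolabel.csv). The following Python sketch is a hypothetical helper, not part of Watson Explorer, and the sample text is made up; it uses csv.DictReader, so a quoted body containing a line break still counts as one record.

```python
import csv
import io


def inspect_unlabeled_csv(text):
    """Return (header, record_count) for CSV text that must not contain a label column."""
    reader = csv.DictReader(io.StringIO(text))
    header = list(reader.fieldnames or [])
    if "label" in header:
        raise ValueError("test data should not contain a 'label' column")
    # DictReader yields one dict per record, even if a quoted field spans lines.
    record_count = sum(1 for _ in reader)
    return header, record_count


# Hypothetical sample; the body field contains an embedded line break.
sample = (
    "claim_product,client_segment,body\n"
    'lemon tea,Not Member,"I found some hair\ninside the bag."\n'
)
print(inspect_unlabeled_csv(sample))  # (['claim_product', 'client_segment', 'body'], 1)
```

For the real file, read it with open("voc_nolabel.csv", encoding="utf-8", newline="") and pass its contents to the function.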

 

Use Document Classification

The created model can be used as one of the collection’s annotators, so the document classification result can be used as a facet for content mining. The result is displayed with each document as colored tags (Figure 4). As you can see, some documents have more than one tag.

Figure 4: The result of document classification on Content Miner
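Because a document can carry several tags, a facet count over classification results presumably counts such a document once under each of its classes. The Python sketch below assumes each document is a dict whose “label” entry is a list of predicted classes (a hypothetical shape, not the exact Content Miner data model) and reuses the classes returned in the three real-time NLP examples later in this post.

```python
from collections import Counter


def facet_counts(documents, field="label"):
    """Count documents per class; a multi-labeled document counts once for each class."""
    counts = Counter()
    for doc in documents:
        counts.update(set(doc.get(field, [])))  # set() avoids double-counting a repeated tag
    return counts


# Classes returned in the three real-time NLP examples later in this post
documents = [
    {"label": ["feedback_maker", "contact_client", "contamination_tampering"]},
    {"label": ["feedback_store", "package_container"]},
    {"label": ["feedback_store", "package_container", "contact_client"]},
]
for label, count in sorted(facet_counts(documents).items()):
    print(f"{label}: {count}")
```

The class counts add up to more than the number of documents, which is expected with multi-labeling.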

Watson Explorer is built on REST APIs. The real-time Natural Language Processing (NLP) API analyzes a document on the fly with the collection’s annotators, and its output includes the document classification result. Watson Explorer also provides a web interface for testing its REST APIs:
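The same calls can also be scripted. The Python sketch below uses only the standard library to build the two requests used in the steps that follow. Several details are assumptions rather than documented facts: that the API accepts HTTP Basic authentication with the same user ID and password as the “Authorize” dialog, that the list response is a JSON array of objects with “id” and “name” keys, and that the analyze endpoint takes the document as its JSON request body. The host name and helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Hypothetical host name; replace it with your Watson Explorer server.
BASE_URL = "https://wex.example.com"


def auth_header(user, password):
    """Build an HTTP Basic Authorization header (assumed auth scheme)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def list_collections_request(user, password):
    """Request for GET /api/v1/collections."""
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/collections",
        headers=auth_header(user, password),
        method="GET",
    )


def find_collection_id(collections, name):
    """Return the id of the collection called `name`.

    Assumes the list response is a JSON array of {"id": ..., "name": ...} objects.
    """
    for collection in collections:
        if collection.get("name") == name:
            return collection["id"]
    raise KeyError(f"collection not found: {name}")


def analyze_request(user, password, collection_id, document):
    """Request for POST /api/v1/collections/{collectionId}/analyze.

    Assumes the document is sent as the JSON request body.
    """
    headers = {"Content-Type": "application/json", **auth_header(user, password)}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/collections/{collection_id}/analyze",
        data=json.dumps(document).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request):
    """Send a request and decode the JSON response (requires a reachable server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Build (but do not send) the first analyze request from this post.
doc = {
    "fields": {
        "claim_product_line": "Tea",
        "claim_product": "lemon tea",
        "client_segment": "Not Member",
        "client_location": "Manhattan",
        "client_sex": "Male",
        "client_age": "20",
        "body": "I found some hair inside the bag.",
    },
    "metadata": {},
}
req = analyze_request("user", "password", "YOUR_COLLECTION_ID", doc)
print(req.get_method(), req.full_url)
```

Against a live server, send(list_collections_request(...)) would return the collection list, and find_collection_id would then supply the ID for analyze_request. Servers with self-signed certificates may also need an SSL context, which this sketch leaves out.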

  1. Open https://<your WEX host>/docs/
  2. Click the “Authorize” button. The “Available Authorization” window appears. Enter your user ID and password, then click “Authorize”
  3. Find the “Collection” section. Click “[GET] /api/v1/collections”
  4. Click “Try it out”. An “Execute” button appears; click it
  5. A list of collections is returned. Find the “id” of the “VoC Classifier” collection (Figure 5)
Figure 5: Result of “list collection”
  6. Find the “NLP” section. Click “[POST] /api/v1/collections/{collectionId}/analyze”
  7. Click “Try it out”
  8. Enter the collection ID for “collectionId”
  9. Enter the following JSON as “document” (Figure 6)
{
  "fields": {
    "claim_product_line": "Tea",
    "claim_product": "lemon tea",
    "client_segment": "Not Member",
    "client_location": "Manhattan",
    "client_sex": "Male",
    "client_age": "20",
    "body": "I found some hair inside the bag."
  },
  "metadata": {}
}

Figure 6
  10. Click the “Execute” button
  11. The real-time NLP result is returned. It contains the document classification result along with the probability of each class. In this example, the predicted classes are “feedback_maker”, “contact_client”, and “contamination_tampering”.
    Figure 7
  12. Try another example:
{
  "fields": {
    "claim_product_line": "Tea",
    "claim_product": "lemon tea",
    "client_segment": "Not Member",
    "client_location": "Manhattan",
    "client_sex": "Male",
    "client_age": "40",
    "body": "The straw was peeled off from the juice pack."
  },
  "metadata": {}
}

  13. This time, the result contains “feedback_store” and “package_container”.
Figure 8
  14. Finally, try one more example in which “client_segment” is “Golden Card Member” instead of “Not Member”. All other fields are the same.
{
  "fields": {
    "claim_product_line": "Tea",
    "claim_product": "lemon tea",
    "client_segment": "Golden Card Member",
    "client_location": "Manhattan",
    "client_sex": "Male",
    "client_age": "40",
    "body": "The straw was peeled off from the juice pack."
  },
  "metadata": {}
}

 

The real-time NLP result is returned. This time, “contact_client” is returned in addition to “feedback_store” and “package_container”. Because only “client_segment” differs from the previous request, this shows that document classification also takes the customer information fields into account, not just the body text.

Figure 9
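To see the effect of the customer fields programmatically, you can compare the classes returned for two documents. The Python sketch below uses the class names reported above for the “Not Member” and “Golden Card Member” requests; the helper names are hypothetical, and it assumes the classes have already been extracted from the analyze response, whose exact JSON layout is not reproduced in this post. The optional threshold helper likewise assumes a hypothetical {class: probability} mapping.

```python
def predicted_labels(scores, threshold=0.5):
    """Keep the classes whose probability is at least `threshold`.

    `scores` is a hypothetical {class_name: probability} mapping extracted
    from the analyze response; 0.5 is an arbitrary example cutoff.
    """
    return {label for label, p in scores.items() if p >= threshold}


def compare_predictions(before, after):
    """Return (added, removed) classes between two predictions, sorted for stable output."""
    before, after = set(before), set(after)
    return sorted(after - before), sorted(before - after)


# Classes reported in this post for the two juice-pack requests
not_member = {"feedback_store", "package_container"}
golden_card = {"feedback_store", "package_container", "contact_client"}

added, removed = compare_predictions(not_member, golden_card)
print("added:", added)      # added: ['contact_client']
print("removed:", removed)  # removed: []
```

Comparing label sets rather than raw probabilities keeps the check independent of where the cutoff for a “predicted” class is drawn.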

Summary

Watson Explorer Deep Analytics Edition introduces new machine learning and cognitive advice capabilities. These capabilities have already been successfully tested with various businesses across the globe. For example, by deploying Watson Explorer, a Japanese insurance company achieved 90% accuracy in coding medical terms and treatments during claim assessment, increased claim processing efficiency by 30%, and reduced mistakenly unpaid claims by 20%. The document classification capability is also available in the free trial of Watson Explorer Community Edition.

Download the Watson Explorer free trial to experience the new Watson Explorer.

3 comments on "Create actionable insights with IBM Watson Explorer Deep Analytics edition"

  1. Thanks, Yutaka Moriya. The article is very helpful. Could you please write a similar article explaining the usage of annotators and rankers?

  2. Thank you for your feedback.
    This is an article about Ranker by the Watson Explorer development team.

    Do you still rely on keyword search? Find similar documents easily with Watson Explorer’s machine learning powered Ranker
    https://developer.ibm.com/dwblog/2018/watson-keyword-search-machine-learning-ranker/

    We will continue to write articles to introduce WEX advanced features.

    • Thanks a lot!! 🙂 But the other article is not as friendly as yours. Will try it out. Waiting to see more articles from your side.
