Improve Watson Discovery results using API-based relevancy training

Summary

Developers use the IBM® Watson™ Discovery service to rapidly add a cognitive, search, and content analytics engine to applications. With that engine, they can identify patterns, trends, and insights from unstructured data that can drive better decision making. Sometimes, you want to improve the search results by providing more training details. Relevancy training is a feature in Watson Discovery that provides additional training for more accurate search results. This code pattern shows how you can use the relevancy training APIs to improve search results in Watson Discovery.

Description

Developers use the IBM Watson Discovery service to rapidly add a cognitive, search, and content analytics engine to applications. With that engine, they can identify patterns, trends, and insights from unstructured data that drive better decision making. With Watson Discovery, you can ingest (convert, enrich, clean, and normalize), store, and query data to extract actionable insights. To perform searches and queries, you need content that is ingested and persisted in collections. You can learn more about developing applications with Watson Discovery by studying the cognitive discovery reference architecture.

Relevancy training is a powerful capability in Watson Discovery that can improve search accuracy if the right approach is taken. You can train Watson Discovery to improve the relevance of query results for your particular organization or subject area. When you provide a Watson Discovery instance with training data, the service uses Watson machine learning techniques to find signals in your content and questions. The service then reorders query results to display the most relevant results at the top. As you add more training data, the service instance becomes more accurate and sophisticated in the ordering of the results it returns.

Relevancy training is optional. If the results of your queries meet your needs, no further training is necessary. For an overview of building use cases for training, see the blog post “How to get the most out of relevancy training.”

Relevancy training in Watson Discovery can be done in two ways: with the Watson Discovery tooling or programmatically, using APIs.

If your Watson Discovery instance has a fairly large number of questions that need relevancy training, the tooling method can take much longer than the programmatic (API-based) method. Also, with APIs, you do not need to stay connected to the Watson Discovery instance through a browser.

This code pattern shows how relevancy training can be achieved using APIs.

Flow

Improve Discovery relevancy training flow diagram

  1. The client application sends a natural language query to Watson Discovery for each question that needs relevancy training.
  2. Watson Discovery returns a set of documents for each natural language query.
  3. The client application saves queries and corresponding documents in a TSV file on a local machine.
  4. The user assigns relevancy scores to documents and saves the file.
  5. The application accesses the file with updated relevancy scores.
  6. The client application invokes APIs to update the training data of the Watson Discovery collection with the updated relevancy scores.
  7. The client queries again to get improved results.
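
Steps 3 through 6 of the flow can be sketched roughly as follows. This is a minimal illustration, not the code from the pattern's repository: the function names, the TSV column names (query, document_id, relevance), and the sample IDs are all assumptions made for this example. The payload shape (natural_language_query plus a list of examples with document_id and relevance) is modeled on the request body of the Watson Discovery v1 training data API, so check the API reference for your service version before relying on it. The sketch stops short of any network call.

```python
import csv
import io


def write_candidates(rows, fh):
    """Step 3: save (query, document_id) pairs with an empty relevance column."""
    writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "document_id", "relevance"])
    for query, document_id in rows:
        writer.writerow([query, document_id, ""])


def read_scored(fh):
    """Step 5: read the file back, grouping scored rows by query.

    Rows whose relevance cell was left empty are skipped.
    """
    grouped = {}
    for row in csv.DictReader(fh, delimiter="\t"):
        score = (row["relevance"] or "").strip()
        if not score:
            continue
        grouped.setdefault(row["query"], []).append(
            {"document_id": row["document_id"], "relevance": int(score)}
        )
    return grouped


def build_training_payloads(grouped):
    """Step 6 (preparation): one request body per query (assumed shape)."""
    return [
        {"natural_language_query": query, "examples": examples}
        for query, examples in grouped.items()
    ]


if __name__ == "__main__":
    # Hypothetical query results; real document IDs come from Watson Discovery.
    buffer = io.StringIO()
    write_candidates([("how do I reset my password", "doc-1")], buffer)
    print(buffer.getvalue())

    # Simulates the file after the user has filled in a score (step 4).
    scored = io.StringIO(
        "query\tdocument_id\trelevance\n"
        "how do I reset my password\tdoc-1\t10\n"
    )
    print(build_training_payloads(read_scored(scored)))
```

Each payload would then be sent to the collection's training data endpoint, either over HTTP or through an SDK. Keeping the file handling separate from the API calls lets you validate the scored file before anything is uploaded.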

Instructions

Find the detailed steps for this pattern in the readme file. The steps show you how to:

  1. Create a Watson Discovery service instance on IBM Cloud.
  2. Clone the repository and get the code.
  3. Annotate your documents.
  4. Perform relevancy training for a large set of questions.