An introduction to Watson natural language processing

This article is part of the Get started with natural language processing learning path.

Level | Topic | Type
100 | An introduction to Watson natural language processing | Article
101 | Look deeper into the Syntax API feature within Watson Natural Language Understanding | Article
201 | Visualize unstructured data using Watson Natural Language Understanding | Code pattern
301 | Discover hidden Facebook usage insights | Code pattern

Introduction

Image showing importing and analyzing documents

As shown in the previous demo, with IBM® Watson™ natural language processing features, you can efficiently analyze and parse large amounts of text input to produce actionable insights. Give Watson the URL of a popular news site, and Watson can ingest text from the site and analyze it within seconds, much faster than a human. The text is analyzed for categories, concepts, emotion, entities, relations, sentiment, and more, all of which you can customize. The information extracted by this service enables you to find more meaning in text, understand trends, and recommend similar content from large amounts of data.

This article explains how IBM Watson can help you use natural language processing services to develop increasingly smart applications. While this article focuses mostly on the Watson Natural Language Understanding and Watson Knowledge Studio services, you can learn about Watson Discovery from the Watson Discovery learning path.

Take a look at a quick demo to see the abilities of the Watson Natural Language Understanding service.

Watson Natural Language Understanding-specific terms

Natural language processing is a subfield of linguistics, computer science, and artificial intelligence that studies the interaction between computers and human languages, and more specifically how to program computers to process and analyze large amounts of natural language data. This section explains terms that are specific to Watson Natural Language Understanding.

Natural Language Understanding: A subtopic of natural language processing in AI that deals with the way machines interpret text and understand meanings based on the context.

Features: The possible classifications that Watson Natural Language Understanding can output from the given text input. These include categories, concepts, emotions, keywords, metadata, relations, semantic roles, and sentiment.

Entities: People, companies, locations, and other classifications that Watson identifies in the text. A full list of entity types and subtypes can be found in the Natural Language Understanding documentation.

Categories: A classification hierarchy of up to five levels that Watson can identify from the input text. The following table gives an example for a “fax machine.”

Level 1 | Level 2 | Level 3 | Level 4 | Level 5
technology and computing | hardware | computer peripherals | printers, copiers, and fax | fax machines
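
For reference, when you enable the categories feature, the service returns each category as a slash-delimited label with a confidence score. The following sketch shows the general shape of a categories result for the fax machine example; the label text and score shown here are illustrative, not actual service output.

"categories": [
  {
    "label": "/technology and computing/hardware/computer peripherals/printers, copiers and fax/fax machines",
    "score": 0.94
  }
]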

Concepts: High-level concepts that are not necessarily directly referenced in the text.

Metadata: A set of data that describes and gives information about other data. For example, for a given input URL, the metadata fields can include publication date, title, and author.
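
As a rough illustration, a metadata result for an article URL might look like the following sketch; the field names shown are typical, but the values are made up and the full set of fields is described in the service documentation.

"metadata": {
  "title": "Example article title",
  "authors": [
    { "name": "Jane Doe" }
  ],
  "publication_date": "2020-01-15T00:00:00"
}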

Relations: The recognition of when two entities are related, and identification of the type of relation.
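
For example, a relations result identifies the related entities, the sentence in which they appear, and the relation type, along with a confidence score. The following is an illustrative sketch of that shape; the relation type, text, and score are hypothetical examples, not actual output.

"relations": [
  {
    "type": "employedBy",
    "sentence": "John Smith works for IBM.",
    "score": 0.91,
    "arguments": [
      { "text": "John Smith", "entities": [ { "type": "Person", "text": "John Smith" } ] },
      { "text": "IBM", "entities": [ { "type": "Organization", "text": "IBM" } ] }
    ]
  }
]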

Watson Knowledge Studio

Financial analysis gif

After you understand the basics of Watson Natural Language Understanding, you see that the power of Watson lies in the accuracy of its results. Watson is great out of the box, but is not trained on every industry’s domain-specific knowledge. Therefore, you might need to teach Watson specific linguistic nuances, relations, and meanings specific to your industry. For example, if you are working with a start-up that is assessing credit risk, you not only need to teach Watson the name of your start-up so that it recognizes it as an entity type of “Organization,” but also the nuances of financial terminology such as credit, crash, and interest. Each of these words might have different meanings based on their context, and the more annotations you can train Watson with, the more accurate your model and your results will be.

The following figure gives a visual representation of the steps needed to train Watson on your domain knowledge using Watson Knowledge Studio.

Steps needed to train Watson

Watson Knowledge Studio-specific terms

Annotation: The process of selecting specific text from a document and associating it with predefined entities. For example, “IBM” would be associated with the entity type “Organization,” as shown in the following figure.

Annotation example

Annotation set: A set of documents that you assign to a human annotator to annotate.

Watson Knowledge Studio: A tool that lets you build a machine learning model to identify entities and relationships without any coding expertise. It lets you annotate documents and train Watson in your domain knowledge.

Ground truth: The set of vetted data consisting of annotations added by human annotators that is used to adapt a machine learning model to a particular domain.

Machine learning model: A component that identifies entities and entity relationships according to a statistical model that is based on ground truth. The model applies past experience, such as training data, to determine or predict the correct outcome of future experiences based on characteristics of the data. These past experiences are captured in the form of a model by calculating feature scores for each candidate answer or evidence and combining that with known outcomes. This is sometimes referred to as a machine learning annotator.

Train model: The process of setting up a Watson instance with components that enable the system to function in a particular domain (for example, corpus content, training data that generates machine learning models, programmatic algorithms, or other ground truth components) and then making improvements and updates to these components based on accuracy analysis.

Use cases

Various use cases for natural language processing include:

  • Automated summarization for early identification of trends: Summarize news articles or social media to extract information about a specific product or company and identify trends in customer sentiment.

  • Virtual assistant chatbots such as Watson Assistant, Siri, Alexa, or Google Assistant: These popular assistants can convert audio to text, and then analyze that text using natural language processing to understand context. They can answer your questions in real time. Watson natural language processing can enrich customer input by understanding your domain language and adding sentiment and emotion analysis.

  • Email auto-complete: When your email application predicts the end of your sentence, it is extracting context as you type and estimating which words are likely to follow the ones you have already written. Watson Natural Language Understanding features that let you quickly extract context can help you implement auto-complete or auto-correct features.

Accessing IBM Watson natural language processing

There are various ways that you can access Watson natural language processing services to improve the way you do business. Here are a few:

Watson Assistant and Watson natural language processing services are closely related and often used together. You can try Watson Assistant and build a chatbot for free on IBM Cloud.

This informational video gives you an overview of the features of IBM Watson Natural Language Understanding.

When you are ready to create your own custom model, this video takes you through the Watson Knowledge Studio features and teaches you how to annotate, train, and deploy a machine learning model to improve the accuracy of Watson Natural Language Understanding and Watson Discovery.

There’s a learning path on Watson Discovery if you want to learn more about this service.

SDKs

For programmatic access, the Watson SDKs provide support for a large number of languages, including Node.js, Python, and Java.

APIs

For more information on the APIs, see the Watson Natural Language Understanding API documentation.

Code sample

The following Node.js code sample shows how to authorize and analyze text for emotion, sentiment, entities, and keywords.

At the top of the file, we import the Watson Natural Language Understanding Node.js SDK and the IAM authenticator module to ensure that we have access to the Watson service from IBM Cloud.

Next, we pass in our API key, which is generated on IBM Cloud when you create the Watson Natural Language Understanding service, to create a service instance in our Node.js code. We also pass in the instance URL, which is found on the credentials tab of the IBM Cloud service, as shown in the following figure.

Adding credentials

We know that Watson can parse a website, so we pass in the URL www.nytimes.com. We request the entities and keywords features and tell Watson that we want only the top two keyword and entity results by setting the limit to 2. We also enable emotion and sentiment analysis for those features so that Watson outputs those scores as well.

Finally, we use the analyze function of the Watson service to parse the nytimes.com website and output the top two keywords and entities that were found.

Analyze output function

Go ahead and take the following code, open a code editor, create a new file named nlu.js, and then run npm install ibm-watson in the directory that contains that file. This installs the ibm-watson package, which lets you access the Watson Natural Language Understanding API. Next, enter your apiKey and url in the first few lines of the function.


// Import the Watson Natural Language Understanding SDK and the IAM authenticator
const NaturalLanguageUnderstandingV1 = require('ibm-watson/natural-language-understanding/v1');
const { IamAuthenticator } = require('ibm-watson/auth');

async function analyze() {
  // Create a service instance with the API key and service URL from IBM Cloud
  const naturalLanguageUnderstanding = new NaturalLanguageUnderstandingV1({
    version: '2019-07-12',
    authenticator: new IamAuthenticator({
      apikey: '<apiKey>',
    }),
    url: 'https://api.us-south.natural-language-understanding.watson.cloud.ibm.com/instances/<instanceId>',
  });

  // Analyze the website for the top two entities and keywords,
  // including emotion and sentiment scores for each result
  const analyzeParams = {
    'url': 'www.nytimes.com',
    'features': {
      'entities': {
        'emotion': true,
        'sentiment': true,
        'limit': 2,
      },
      'keywords': {
        'emotion': true,
        'sentiment': true,
        'limit': 2,
      },
    },
  };

  // Call the service and print the JSON response, or log any error
  naturalLanguageUnderstanding.analyze(analyzeParams)
    .then(analysisResults => {
      console.log(JSON.stringify(analysisResults, null, 2));
    })
    .catch(err => {
      console.log('error:', err);
    });
}

analyze();

Go ahead and run the code with node nlu.js. The following shows a sample output of running the script.

Sample output
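
If you don't have the service set up yet, the following abbreviated sketch gives a sense of the kind of analysis the script prints; the keyword, entity, and score values are illustrative only, the real output depends on the page content at the time you run it, and the SDK typically wraps this analysis in a larger response object.

{
  "language": "en",
  "retrieved_url": "https://www.nytimes.com/",
  "keywords": [
    {
      "text": "example keyword",
      "relevance": 0.97,
      "sentiment": { "score": 0.12, "label": "positive" },
      "emotion": { "joy": 0.31, "sadness": 0.10, "fear": 0.05, "disgust": 0.02, "anger": 0.03 }
    }
  ],
  "entities": [
    {
      "type": "Organization",
      "text": "example organization",
      "relevance": 0.95,
      "sentiment": { "score": 0.0, "label": "neutral" },
      "emotion": { "joy": 0.22, "sadness": 0.15, "fear": 0.08, "disgust": 0.03, "anger": 0.04 }
    }
  ]
}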

The previous code shows you the bare minimum to get you up and running with Watson. To see more advanced uses of Watson Natural Language Processing, see our IBM Developer code patterns.

Summary

This article provided an overview of Watson natural language processing services. It explained how IBM Watson can help you use natural language processing services to develop increasingly smart applications, focusing on using the Watson Natural Language Understanding and Watson Knowledge Studio services.

The article is part of the Get started with natural language processing learning path, which guides you through building fully featured web apps that are built with Watson natural language processing services. To continue, look deeper into the Syntax API feature within Watson Natural Language Understanding.