Watson Natural Language Understanding extracts people, places, organizations and relationships from text to help you identify who, where and what is being talked about so that you can build knowledge bases and guide question-answering tasks. In this second of a three-post series, we’ll take a deep dive into how you can use these enrichments.
If you’d like to try out a text or article analysis, you can enter a URL or paste text into the interactive demo found here.
In my first post “Is Artificial Intelligence Scary?” I covered how to identify emotions, sentiments and keywords in unstructured text. In my post “How to get the most from text enrichments in Watson Natural Language Understanding” I covered metadata, concepts, and categories. Today we’re looking at people, places, organizations and relationships.
Entities, Relations, and Semantic Roles for NLU
For identifying specific people, places, and organizations, and the relationships they have with other things, Natural Language Understanding provides a set of features: Entities, Relations and Semantic Roles.
These features are great for answering who, what, and where we’re talking about. Let’s take a look at what we learn when these engines are applied to this article “Science for Social Good.”
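As a rough sketch, all three features can be requested together in one analyze call. The snippet below only builds the JSON request body and does not contact the service. The helper name `build_analyze_request` and the example URL are hypothetical, and the `limit` values are arbitrary choices for illustration.

```python
import json


def build_analyze_request(url, entity_limit=10, role_limit=10):
    """Build a request body asking for entities, relations and semantic roles.

    Hypothetical helper for illustration; it only assembles a dict.
    """
    return {
        "url": url,
        "features": {
            "entities": {"limit": entity_limit},
            "relations": {},
            "semantic_roles": {"limit": role_limit},
        },
    }


# Placeholder URL, not the real location of the article.
body = build_analyze_request("https://example.com/science-for-social-good")
print(json.dumps(body, indent=2))
```

An empty options object, as used here for `relations`, simply asks for that feature with its default settings.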
Natural Language Understanding – Entities
Natural Language Understanding Entities are used to extract specific names of people, locations and organizations from text. Unlike the Keywords feature discussed in my first NLU post, which identifies generally interesting phrases from the article, Entities have specific types associated with them and represent specific things that people have names for. From Entities, NLU can detect that IBM is a company, the United States is a location, and as we’ll see below, that Thomas J. Watson, Sr. was a person.
The disambiguation feature will provide more information about the entity than can be found in the text. When there is information available, the Entities response will return a “disambiguation” section that links directly to the DBpedia page about the person, place or thing, as well as a canonical name for the entity.
When possible, the Entities feature will augment supertypes (such as company, person, job_title, organization, healthcondition) with subtypes, which can further enhance the understanding of the content. In the instance below, CompanyFounder is a subtype of the supertype Person for Thomas J. Watson Sr.
“text”: “Thomas J. Watson Sr.”,
“name”: “Thomas J. Watson”,
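To show how these fields might be read in practice, here is a minimal sketch that walks an Entities result and pulls out the text, type, canonical name and subtypes. The sample dictionary is hand-written from the values quoted in this post, not captured API output, and the function name `summarize_entities` is hypothetical.

```python
def summarize_entities(response):
    """Return (text, type, canonical name, subtypes) for each entity."""
    rows = []
    for entity in response.get("entities", []):
        # The disambiguation section is only present when information is available.
        disambiguation = entity.get("disambiguation", {})
        rows.append((
            entity["text"],
            entity.get("type"),
            disambiguation.get("name"),
            disambiguation.get("subtype", []),
        ))
    return rows


# Hand-written sample based on the fragments quoted in this post.
sample_entities = {
    "entities": [
        {
            "type": "Person",
            "text": "Thomas J. Watson Sr.",
            "disambiguation": {
                "name": "Thomas J. Watson",
                "subtype": ["CompanyFounder"],
            },
        }
    ]
}

for text, etype, name, subtypes in summarize_entities(sample_entities):
    print(f"{text}: {etype} ({', '.join(subtypes)}), canonical name {name}")
```

Using `.get` with defaults keeps the loop from failing on entities that come back without a disambiguation section.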
Natural Language Understanding – Relations
The Relations feature of Natural Language Understanding has a pre-defined set of relationships between different entity types. You can also use custom models developed with Watson Knowledge Studio to define entity and relation types that are specific to your company, organization, industry or domain. We’ll look at that in the next post.
In this instance, we find the relation PartOfMany:
“sentence”: “In this program we will work closely with organizations (NGOs, public sector agencies, social enterprises) that are on the forefront of big societal challenges to learn and take inspiration from the problems they are tackling.”,
From this article, Relations has picked out that NGOs are part of many organizations as shown in the collective list of partners. The specificity of Relations can help map many different relationships between known and unknown entities.
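A small sketch of how such a result could be turned into simple (relation, first argument, second argument) tuples follows. The sample dictionary is hand-written: the relation type and sentence come from this post, while the two argument texts are inferred from the description above rather than copied from real output. The function name `relation_pairs` is hypothetical.

```python
def relation_pairs(response):
    """Return (relation type, first argument text, second argument text) tuples."""
    pairs = []
    for relation in response.get("relations", []):
        args = [arg.get("text") for arg in relation.get("arguments", [])]
        # Only keep relations that link exactly two arguments.
        if len(args) == 2:
            pairs.append((relation.get("type"), args[0], args[1]))
    return pairs


# Hand-written sample; argument texts are assumptions for illustration.
sample_relations = {
    "relations": [
        {
            "type": "PartOfMany",
            "sentence": (
                "In this program we will work closely with organizations "
                "(NGOs, public sector agencies, social enterprises) that are "
                "on the forefront of big societal challenges to learn and take "
                "inspiration from the problems they are tackling."
            ),
            "arguments": [
                {"text": "NGOs"},
                {"text": "organizations"},
            ],
        }
    ]
}

for rel_type, first, second in relation_pairs(sample_relations):
    print(f"{first} --{rel_type}--> {second}")
```

Flattening relations into tuples like this is one convenient starting point for loading them into a graph or knowledge base.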
Natural Language Understanding – Semantic Roles
For more loosely defined relationships, Semantic Roles can help. The subject-action-object triples found by the API can describe what a certain noun is usually found doing, what sort of properties are usually ascribed to the subject, or even collect what someone has said on a certain topic. Semantic Roles can collect information about what entities are doing in articles across a corpus.
Semantic Roles focuses on a specific verb as an action, and determines which phrases in the sentence are the most likely subject and object for the action.
“text”: “IBM Health Corps”
“sentence”: “Some of our recent initiatives include: P-Tech, where we are reinventing education by bringing together the best elements of high school, college and the professional world; Teacher Advisor, powered by IBM Watson, to support teachers in improving teaching and student achievement; The Jefferson Project, which deploys IoT and data analytics to protect fresh waters; and IBM Health Corps which helps partner organizations expand health access and improve outcomes with analytics and cognitive technologies.”,
“text”: “partner organizations expand health access and improve outcomes with analytics and cognitive technologies”
In this semantic roles sample from the article, the system has picked out that IBM Health Corps “helps” partner organizations. Gathering more free-form information like this across a corpus can help extract useful facts to draw conclusions in many domains.
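As a sketch of how triples like this could be gathered across a corpus, the snippet below groups actions and objects by subject over a list of results. The sample dictionary is hand-written from the fragments quoted in this post, not real API output, and the function name `actions_by_subject` is hypothetical.

```python
from collections import defaultdict


def actions_by_subject(responses):
    """Group (action, object) pairs by subject text across many results."""
    grouped = defaultdict(list)
    for response in responses:
        for role in response.get("semantic_roles", []):
            subject = role.get("subject", {}).get("text")
            action = role.get("action", {}).get("text")
            obj = role.get("object", {}).get("text")
            # Skip incomplete triples that lack a subject or an action.
            if subject and action:
                grouped[subject].append((action, obj))
    return dict(grouped)


# Hand-written sample based on the fragments quoted in this post.
sample_roles = {
    "semantic_roles": [
        {
            "subject": {"text": "IBM Health Corps"},
            "action": {"text": "helps"},
            "object": {
                "text": (
                    "partner organizations expand health access and improve "
                    "outcomes with analytics and cognitive technologies"
                )
            },
        }
    ]
}

for subject, pairs in actions_by_subject([sample_roles]).items():
    for action, obj in pairs:
        print(f"{subject} | {action} | {obj}")
```

Passing one result per article to `actions_by_subject` gives a quick view of what each subject is described as doing across the whole collection.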