In this post we’ll take a look at how metadata, category and concept enrichments offered by Natural Language Understanding can extract meaning from unstructured data. You can read my previous two posts on enrichments here (sentiment and emotion) and here (entities, relations, and semantic roles).

If you’d like to try out a text or article analysis, you can enter a URL or paste text into the online demo found here.

Metadata

Natural Language Understanding’s Metadata feature helps you gather structured information and organize your data from large web crawls. The structured information collected from articles includes the publication date, title, author, and RSS or ATOM feeds.
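To make the request side concrete, here is a minimal sketch of a request body that asks for one or more enrichments on a URL. It assumes the service accepts a JSON body with a `url` field and a `features` object keyed by feature name, with an empty object meaning default options. The helper name `build_analyze_request` is hypothetical, and nothing is sent over the network.

```python
import json


def build_analyze_request(url, features):
    """Build a JSON-serializable body requesting the named features for one URL.

    Assumption: each feature is switched on by an empty options object.
    """
    return {"url": url, "features": {name: {} for name in features}}


article_url = (
    "https://www.ibm.com/blogs/watson/2017/06/"
    "tribeca-film-festival-inspiring-digital-storytelling-with-watson/"
)
body = build_analyze_request(article_url, ["metadata"])
print(json.dumps(body, indent=2))
```

Passing `["metadata", "concepts", "categories"]` instead would request all three enrichments covered in this post in a single body.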

Let’s take a look at the results for the Metadata on this article.

{
  "retrieved_url": "https://www.ibm.com/blogs/watson/2017/06/tribeca-film-festival-inspiring-digital-storytelling-with-watson/",
  "metadata": {
    "title": "Teaming up with Tribeca Film Festival to inspire digital storytelling with Watson - Watson",
    "publication_date": "2017-06-28T00:00:00",
    "image": "https://www.ibm.com/blogs/watson/wp-content/uploads/2017/06/Tribeca-Banner-Blog.jpg",
    "authors": [
      {
        "name": "Rob High"
      }
    ]
  },
  "language": "en"
}

Metadata such as the title, author, feeds, and publication date is extracted from the web page. RSS feeds come with a confidence flag, which is set to “NO” when the system isn’t certain whether an RSS or ATOM feed exists. This dampens the ‘noise’ that might come from spurious results.

You can use the Metadata feature to:

  • Collect information about the story in one place through RSS feeds
  • Aggregate articles by a certain author
  • Organize your information using publication dates and titles
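As a rough illustration of the second and third uses, the sketch below groups responses by author and orders each author's articles by publication date. It assumes each response has the same shape as the Metadata result shown earlier; the function name `index_by_author` is made up for this example.

```python
from collections import defaultdict
from datetime import datetime


def index_by_author(responses):
    """Map each author name to their articles, oldest first."""
    by_author = defaultdict(list)
    for response in responses:
        meta = response.get("metadata", {})
        published = meta.get("publication_date")
        article = {
            "title": meta.get("title"),
            "url": response.get("retrieved_url"),
            # Dates in the sample are ISO 8601 strings without a time zone.
            "published": datetime.fromisoformat(published) if published else None,
        }
        for author in meta.get("authors", []):
            by_author[author["name"]].append(article)
    for articles in by_author.values():
        # Articles without a date sort first.
        articles.sort(key=lambda a: a["published"] or datetime.min)
    return dict(by_author)


sample = {
    "retrieved_url": "https://www.ibm.com/blogs/watson/2017/06/tribeca-film-festival-inspiring-digital-storytelling-with-watson/",
    "metadata": {
        "title": "Teaming up with Tribeca Film Festival to inspire digital storytelling with Watson - Watson",
        "publication_date": "2017-06-28T00:00:00",
        "authors": [{"name": "Rob High"}],
    },
    "language": "en",
}

index = index_by_author([sample])
for author, articles in index.items():
    print(author, [a["title"] for a in articles])
```

With a larger crawl, the same index makes it easy to list everything a given author has published in date order.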

Concepts

Natural Language Understanding’s Concepts feature helps you grasp the important concepts that are explicitly mentioned or implied in a given document, each scored for relevance on a scale of 0 to 1.

The system picks Concepts, mapped to DBpedia resources, which can provide more information on related topics than what’s in the article.

The Concepts feature is useful as the basis of a recommendation engine, or as a jumping-off point for further research about related topics.

Let’s take a look at the results for the Concepts when applied to the article:

{
  "concepts": [
    {
      "text": "Choreography",
      "relevance": 0.957914,
      "dbpedia_resource": "http://dbpedia.org/resource/Choreography"
    },
    {
      "text": "Tribeca Film Festival",
      "relevance": 0.779128,
      "dbpedia_resource": "http://dbpedia.org/resource/Tribeca_Film_Festival"
    },
    {
      "text": "Dance",
      "relevance": 0.744608,
      "dbpedia_resource": "http://dbpedia.org/resource/Dance"
    },
    {
      "text": "Virtual reality",
      "relevance": 0.743941,
      "dbpedia_resource": "http://dbpedia.org/resource/Virtual_reality"
    },
    {
      "text": "Film",
      "relevance": 0.621209,
      "dbpedia_resource": "http://dbpedia.org/resource/Film"
    },
    {
      "text": "Film festival",
      "relevance": 0.615689,
      "dbpedia_resource": "http://dbpedia.org/resource/Film_festival"
    },
    {
      "text": "TriBeCa",
      "relevance": 0.61553,
      "dbpedia_resource": "http://dbpedia.org/resource/TriBeCa"
    },
    {
      "text": "Entertainment",
      "relevance": 0.570181,
      "dbpedia_resource": "http://dbpedia.org/resource/Entertainment"
    }
  ]
}

Concepts such as choreography, dance, film festival, and entertainment capture relevant information in the article, even though they’re not its specific topic. These can be used to recommend articles related to similar concepts that the user might find interesting, or to collect articles related to the concept “film festival”. Other use cases include:

The Concepts feature assigns each concept a relevance score to indicate how relevant it is to the input document. The user can then access DBpedia for more information…
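To show how relevance scores could feed a simple recommendation step, here is a minimal sketch that keeps concepts above a threshold and scores two articles by the concepts they share. The threshold of 0.7, the scoring rule (summing the smaller relevance of each shared concept), and the second article's concepts are all assumptions made up for illustration; only the first set of values comes from the result above.

```python
def top_concepts(response, min_relevance=0.7):
    """Return {concept text: relevance} for concepts at or above the threshold."""
    return {
        concept["text"]: concept["relevance"]
        for concept in response.get("concepts", [])
        if concept["relevance"] >= min_relevance
    }


def shared_concept_score(concepts_a, concepts_b):
    """Sum the smaller relevance of every concept both articles share."""
    shared = concepts_a.keys() & concepts_b.keys()
    return sum(min(concepts_a[name], concepts_b[name]) for name in shared)


# Relevance values taken from the Concepts result for the Tribeca article.
tribeca = {
    "concepts": [
        {"text": "Choreography", "relevance": 0.957914},
        {"text": "Tribeca Film Festival", "relevance": 0.779128},
        {"text": "Dance", "relevance": 0.744608},
        {"text": "Virtual reality", "relevance": 0.743941},
        {"text": "Film", "relevance": 0.621209},
    ]
}

# Hypothetical second article, invented for this example.
other = {
    "concepts": [
        {"text": "Dance", "relevance": 0.9},
        {"text": "Film", "relevance": 0.8},
    ]
}

tribeca_top = top_concepts(tribeca)
other_top = top_concepts(other)
print(sorted(tribeca_top))
print(shared_concept_score(tribeca_top, other_top))
```

Only "Dance" survives the threshold in both articles here, so it alone contributes to the score; lowering `min_relevance` would let "Film" count as well.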

Categories

Natural Language Understanding’s Categories feature shows where an article sits within a hierarchy of categories. This can help you build systems to sort articles and track topics of interest. You can also use it for finding user demographics, predictive advertising, and organizing news feeds.

Let’s take a look at the results for the Categories on the article:

{
  "categories": [
    {
      "score": 0.875743,
      "label": "/art and entertainment/shows and events/festival"
    },
    {
      "score": 0.648747,
      "label": "/technology and computing"
    },
    {
      "score": 0.467648,
      "label": "/art and entertainment/movies/film festivals and awards"
    }
  ]
}

The top categories for the article are /art and entertainment/shows and events/festival and /technology and computing, followed by /art and entertainment/movies/film festivals and awards. This hierarchy gives a sense of where the article fits in a global scope.

  • Awards – matches the article content; very fine-grained
  • Arts and entertainment – also gets high-level categories

The Categories feature also returns a score from 0 to 1 for each category, indicating the confidence that the category applies to the article.
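Because each label is a slash-separated path, sorting articles by category mostly comes down to splitting the label into levels. The sketch below does that and groups the results by top-level category; the helper names and the 0.5 score cutoff are assumptions chosen for this example.

```python
def split_label(label):
    """Turn '/a/b/c' into ['a', 'b', 'c'], ignoring the leading slash."""
    return [part for part in label.split("/") if part]


def group_by_top_level(categories, min_score=0.0):
    """Group (levels, score) pairs under their top-level category."""
    grouped = {}
    for category in categories:
        if category["score"] < min_score:
            continue
        levels = split_label(category["label"])
        grouped.setdefault(levels[0], []).append((levels, category["score"]))
    return grouped


# Values taken from the Categories result for the article.
categories = [
    {"score": 0.875743, "label": "/art and entertainment/shows and events/festival"},
    {"score": 0.648747, "label": "/technology and computing"},
    {"score": 0.467648, "label": "/art and entertainment/movies/film festivals and awards"},
]

grouped = group_by_top_level(categories, min_score=0.5)
for top_level, entries in grouped.items():
    for levels, score in entries:
        print(top_level, "->", " > ".join(levels), score)
```

With the cutoff at 0.5, the lower-confidence film festivals and awards label is dropped; calling `group_by_top_level(categories)` without a cutoff keeps all three.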

Try out your own text or article analysis at https://natural-language-understanding-demo.mybluemix.net/

Learn more about Natural Language Understanding
