IBM Watson Natural Language Understanding (NLU) is a cloud-based Watson service that enables developers to easily and quickly extract and analyze metadata from unstructured text.
In this post, we’ll look at how we can customize the results from IBM Watson Natural Language Understanding by adding human annotations using Watson Knowledge Studio, and how that impacts what NLU returns.
If you haven’t already, sign up for free access to Watson Knowledge Studio.
I’ve taken the text from this post, https://www.ibm.com/blogs/think/author/aleksandramoj/, and saved it as a txt file. I’ll use it as the baseline data to annotate for a new entity type, which I’ll name Social_issues.
After uploading my sample data (the blog post), I’m going to teach Watson how to identify entities called “social issues.” But first, I have to tell it what I mean by “social issues.”
I’ll use the Human Annotation tool to begin this process. That means I’ll act as the trainer that tells the system what words need to be tagged as social issues in the document.
Once we’ve annotated which terms in the document are relevant to social issues, NLU will be able to recognize those exact terms, and it should also be able to identify related “social issues” terms in other documents.
We’ll test that theory in a moment.
Now I am ready to annotate:
Select the entity you want to train, which for us is Social_issues.
Next, I tag the items that I think refer to social issues. After I’ve tagged all of the content that I think is relevant, the text looks like this:
Since I’ve finished tagging my terms, I’m going to accept the changes by saving the file, closing it and setting its status to ‘accept’. This means I’m satisfied with the annotations, and they are now considered the ‘ground truth’.
I’ll then move to the annotator component tab and select machine learning.
Since this is a small data set of one text file and we have no test set or blind set, I’ll select the file and set the training set to 100%.
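To make the 100% setting concrete, here is a minimal Python sketch of what dividing documents into training, test and blind sets by percentage could look like. It is only an illustration of the idea, not how Watson Knowledge Studio does it internally; the function name `split_documents` and the file name `blog_post.txt` are hypothetical.

```python
import random


def split_documents(docs, train_pct, test_pct, blind_pct, seed=0):
    """Split docs into (training, test, blind) lists by percentage.

    Hypothetical helper for illustration only; not part of any Watson SDK.
    """
    if train_pct + test_pct + blind_pct != 100:
        raise ValueError("percentages must sum to 100")
    shuffled = list(docs)
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_pct / 100)
    n_test = round(len(shuffled) * test_pct / 100)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    blind = shuffled[n_train + n_test:]
    return train, test, blind


# One document, everything assigned to training, as in this walkthrough.
train, test, blind = split_documents(["blog_post.txt"], 100, 0, 0)
print(train, test, blind)
```

With a single file and a 100/0/0 split, the test and blind sets are empty, so the model is trained on everything I annotated and there is nothing held back to score it against.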
I then select Next -> Train and let Watson Knowledge Studio do its thing:
Now that I’ve trained Watson on the terms I consider social issues, let’s see what it learned. I’m using a new blog post, https://www.ibm.com/blogs/think/2016/12/watson-cancer-care/. I applied the model ID of my deployed custom entity model.
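If you would rather make this request from code, the following Python sketch builds the kind of JSON body the NLU analyze endpoint (`/v1/analyze`) accepts, with the custom model passed under `features.entities.model`. It only constructs and prints the payload; it does not call the service. The helper name `build_analyze_request` and the placeholder `YOUR_MODEL_ID` are assumptions for illustration, and a real call would also need your service credentials and a `version` query parameter.

```python
import json


def build_analyze_request(url, model_id):
    """Build an NLU analyze request body that uses a custom entity model.

    Hypothetical helper: it only assembles the JSON body, nothing is sent.
    """
    return {
        "url": url,
        "features": {
            # Ask for entities, using the custom model instead of the default.
            "entities": {"model": model_id},
        },
    }


payload = build_analyze_request(
    "https://www.ibm.com/blogs/think/2016/12/watson-cancer-care/",
    "YOUR_MODEL_ID",  # placeholder: replace with your deployed model ID
)
print(json.dumps(payload, indent=2))
```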
Even though the training sample was limited to a single blog post, when I tested the new URL in Bluemix using the entities feature and my custom entity model, I got back the following:
"text": "cancer care",
"text": "medical practice",
"text": "treatment possible",
"text": "treatment for",
Watson NLU identified “cancer care”, “treatment possible”, “medical practice” and “treatment for” as social issues even though none of those terms were tagged in the original training!
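To pull those terms out of a response programmatically, something like the Python sketch below could work. The sample response is hand-built from the four "text" values shown above; the "type" values are an assumption based on the entity name Social_issues, and a real response carries additional fields that are omitted here. The helper name `texts_for_type` is hypothetical.

```python
def texts_for_type(response, entity_type):
    """Return the text of every entity in the response with the given type."""
    return [
        entity["text"]
        for entity in response.get("entities", [])
        if entity.get("type") == entity_type
    ]


# Hand-built sample: texts come from the output above, "type" is assumed.
sample_response = {
    "entities": [
        {"type": "Social_issues", "text": "cancer care"},
        {"type": "Social_issues", "text": "medical practice"},
        {"type": "Social_issues", "text": "treatment possible"},
        {"type": "Social_issues", "text": "treatment for"},
    ]
}

found = texts_for_type(sample_response, "Social_issues")
print(found)
```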
So, even with a small data set (one blog post), you can see the power of teaching Watson the language of your domain for advanced text analytics.