Watson Discovery can extract insights from natural language text, such as customer feedback in online reviews, and a custom annotation model lets you extract more targeted insights from your dataset. The “Get customer insights from product reviews” code pattern extracts insights from a public dataset of fine-food reviews and shows the benefit of a custom model with a simple type system. In this post, I explain how I developed the custom model for this use case.
Watson Discovery is a cloud-native insight engine that lets you rapidly build applications to explore and gather hidden insights from huge volumes of textual data written in natural language. Discovery has a powerful analytics engine that enhances the data it processes with a generic set of enrichments, including entities, concepts, sentiment, and categories. If you want to tailor the extracted entities and relations to your data or your domain, you can add a custom machine learning annotator to your Discovery pipeline. You can then slice and dice your data on the facets that the custom annotator adds.
Watson Knowledge Studio is an IBM Cloud service that helps you build an information extraction solution that is specific to your needs and data. Subject matter experts can use it to build a custom annotator without the need for any coding or a background in data science.
Here are two sample reviews. See the public dataset for many more.
“Quaker Soft Baked Oatmeal Cookies with raisins are a delicious treat, great for any time of day. For example: at breakfast, I had one with a large banana and a cup of coffee and felt I’d had a relatively ‘healthy’ start to the day. The next day at lunch, following a tuna sandwich, I had one with a glass of milk and was satisfied enough to not need a snack before dinner at 6:30. The following night, after dinner, I had one with the remainder of my glass of wine. (Delicious!) And again, didn’t feel the need to snack later in the evening.
Each cookie is individually packaged, and their texture is soft and moist, with just the right amount of sweetness. Natural flavors used in the making are Cinnamon and All Spice. These flavorings give the cookies a real old-fashioned, homemade taste. Nutritionally, the cookies have 170 calories each, 1.5g saturated fat, 150mg sodium, and 12g sugar. They also have 2g of protein and contain 25g of fiber. While the calorie count may seem a bit high for one cookie, they are good sized, and 1 cookie per serving is certainly enough to satisfy.
Because of their great taste and texture, kids will probably enjoy them also. If you like oatmeal raisin cookies, give these a try!”
“I absolutely LOVE the BBQ popchips. They are my favorite but when I saw they had Chili Lime I thought that would be even better. Not so much. I purchased an entire box of 12 of the 3oz bags and wish I would have just stuck to my BBQ.”
Create your custom machine learning annotator
With the out-of-the-box Discovery enrichments, you can see entities such as people, companies, organizations, cities, and more (see the Discovery documentation for a complete list). While these enrichments are helpful, they reveal little about the product-specific information in the reviews.
Suppose you want to extract information such as products, brands, or the product features that customers like, so that you can analyze the data along these dimensions and act on the results. With Watson Knowledge Studio, you can develop a custom annotator that extracts mentions of the entity types and relation types that interest you. Domain adaptation is an iterative process that involves the following steps, all of which you can perform in Watson Knowledge Studio:
Step 1: Define your type system
Review the documents and decide on the type system, that is, the entity types and relation types to extract from the documents. For this example, we decided to extract the following from the reviews:
- Entity Types:
- Product (for example, cookies, pop chips, and coffee)
- Brand (for example, Starbucks and Quaker)
- Attribute (for example, flavors such as “BBQ” for chips, “oatmeal raisin” for cookies, and “vanilla” for coffee)
Note: For this example, we limited ourselves to the labeled features of a product and did not capture the opinions of the reviewers themselves.
- Relation types:
- brandOf, from a Brand entity to a Product entity (for example, Quaker brandOf cookies)
- propertyOf, from an Attribute entity to a Product entity (for example, “oatmeal raisin” propertyOf cookies)
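To make the type system concrete, here is a minimal Python sketch of the entity and relation types above, with a check that a relation links the right kinds of entities in the right direction. The names `ENTITY_TYPES`, `RelationType`, and `is_valid_relation` are hypothetical; this is an illustrative in-memory model, not the Watson Knowledge Studio type-system file format.

```python
from dataclasses import dataclass

# Hypothetical in-memory model of the type system (not the WKS file format).
ENTITY_TYPES = {"Product", "Brand", "Attribute"}


@dataclass(frozen=True)
class RelationType:
    name: str
    source: str  # entity type of the first argument
    target: str  # entity type of the second argument


RELATION_TYPES = {
    "brandOf": RelationType("brandOf", source="Brand", target="Product"),
    "propertyOf": RelationType("propertyOf", source="Attribute", target="Product"),
}


def is_valid_relation(name: str, source_type: str, target_type: str) -> bool:
    """Return True if a relation called `name` may link these two entity types."""
    rel = RELATION_TYPES.get(name)
    return rel is not None and (rel.source, rel.target) == (source_type, target_type)


print(is_valid_relation("brandOf", "Brand", "Product"))       # True
print(is_valid_relation("brandOf", "Product", "Brand"))       # False: wrong direction
print(is_valid_relation("propertyOf", "Attribute", "Brand"))  # False: wrong target type
```

Making the direction explicit (source and target) mirrors the “from Brand to Product” reading of the relation definitions and catches reversed relations early.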
Step 2: Upload sample documents
Upload a set of sample documents that contain examples of the entity types and relation types you defined. The machine learning annotator is trained on these documents.
Step 3: Annotate sample documents
In this step, you annotate every occurrence of the entity and relation types defined in your type system across all of your sample documents. This is a manual process, but anyone who understands the dataset can do it in the simple Watson Knowledge Studio (WKS) graphical interface.
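As a rough picture of what an annotation captures, the sketch below records entity mentions as character spans with a type, plus one relation between mention IDs, for the first sentence of the second sample review. It then checks that each span matches the text and that each relation respects the type system from Step 1. The record layout and the helper names `make_mention` and `check` are hypothetical; the annotation tool manages its own storage format.

```python
# Hypothetical annotation record (not the WKS export format) for one sentence
# from the second sample review. Offsets are character positions in `text`.
text = "I absolutely LOVE the BBQ popchips."


def make_mention(surface, entity_type):
    """Locate the first occurrence of `surface` and record its span and type."""
    begin = text.find(surface)
    if begin < 0:
        raise ValueError(f"{surface!r} not found in text")
    return {"text": surface, "type": entity_type, "begin": begin, "end": begin + len(surface)}


mentions = {
    "m1": make_mention("BBQ", "Attribute"),
    "m2": make_mention("popchips", "Product"),
}

# A relation links two mention IDs: (relation type, first argument, second argument).
relations = [("propertyOf", "m1", "m2")]

# Allowed (first argument type, second argument type) per relation type, from Step 1.
ALLOWED = {"brandOf": ("Brand", "Product"), "propertyOf": ("Attribute", "Product")}


def check(doc_text, doc_mentions, doc_relations):
    """List problems: spans that don't match the text, or relations whose
    argument types don't fit the type system."""
    problems = []
    for mention_id, m in doc_mentions.items():
        if doc_text[m["begin"]:m["end"]] != m["text"]:
            problems.append(f"{mention_id}: span does not match text")
    for rel, a, b in doc_relations:
        types = (doc_mentions[a]["type"], doc_mentions[b]["type"])
        if ALLOWED.get(rel) != types:
            problems.append(f"{rel}({a}, {b}): argument types {types} not allowed")
    return problems


print(mentions["m1"])  # {'text': 'BBQ', 'type': 'Attribute', 'begin': 22, 'end': 25}
print(check(text, mentions, relations))  # []
```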
Step 4: Train and test the annotator
After you finish annotating your sample documents, you can train the annotator. Watson Knowledge Studio reports performance statistics, such as precision, recall, and F1 score, that you can use to evaluate and improve the annotator.
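WKS computes these statistics for you. To show how precision, recall, and F1 relate, here is a minimal sketch that compares hypothetical gold annotations with hypothetical model output, assuming a prediction counts as correct only when its span and type both match exactly. The exact matching rules WKS applies may differ.

```python
def prf(gold: set, predicted: set) -> tuple:
    """Precision, recall, and F1 for exact-match mentions.

    Each mention is a (begin, end, type) tuple; a prediction is correct only
    if both its span and its type match a gold mention."""
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical gold annotations vs. model output for one test sentence.
gold = {(22, 25, "Attribute"), (26, 34, "Product")}
predicted = {(22, 25, "Attribute"), (26, 34, "Brand")}  # "popchips" mistyped as Brand
print(prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

A low recall for one type usually points to too few training examples of it, while low precision often points to inconsistent annotations.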
Deploy your custom annotator to Watson Discovery
After you build the annotator and its performance is satisfactory, you can deploy it to your Discovery instance from the WKS interface. Deployment provides a model ID, which you need to configure your Discovery collection to use the annotator. Then, when documents are ingested, Discovery adds the custom entities (Product, Brand, and Attribute in this example) and custom relations (brandOf and propertyOf) to the documents as additional enrichments, so you can analyze your dataset along these dimensions as well.
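The model ID ends up in the collection’s enrichment configuration. The sketch below builds such an entry as a Python dictionary, assuming the layout of the Discovery v1 configuration format, where a `natural_language_understanding` enrichment takes a `model` option for entities and relations. Treat the field names as an assumption to verify against the documentation for your Discovery version. The function name and `"YOUR_MODEL_ID"` are hypothetical placeholders.

```python
import json


def nlu_enrichment_with_custom_model(model_id: str, source_field: str = "text") -> dict:
    """Build an enrichment entry that routes entity and relation extraction
    to a custom WKS model. Field names are ASSUMED from the Discovery v1
    configuration format; check them against your service's documentation."""
    return {
        "enrichment": "natural_language_understanding",
        "source_field": source_field,
        "destination_field": f"enriched_{source_field}",
        "options": {
            "features": {
                "entities": {"model": model_id},
                "relations": {"model": model_id},
            }
        },
    }


# "YOUR_MODEL_ID" stands in for the ID that WKS shows after deployment.
print(json.dumps(nlu_enrichment_with_custom_model("YOUR_MODEL_ID"), indent=2))
```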
Benefits of using a custom model with Discovery
Finally, what additional capabilities does a custom model give you? Here are some examples of analyses that you can perform with a custom model but not with the out-of-the-box Discovery service. They are possible because the model extracts custom Product, Brand, and Attribute entities and the relations between them:
- Most reviewed products
- Top brands for a particular product
- Positive and negative sentiment for a particular product or brand
- Most reviewed flavors for a particular product
- Trends or anomalies by product or brand
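Discovery can compute aggregations like these on the server side. To show the underlying logic, the sketch below does the counting locally over a small, hypothetical set of per-review results, where each review lists its custom entities as (text, type) pairs and its relations as (type, first argument, second argument). The data layout and the function names `most_reviewed_products` and `related_counts` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical, simplified per-review output of the custom model.
reviews = [
    {"entities": [("Quaker", "Brand"), ("cookies", "Product"), ("oatmeal raisin", "Attribute")],
     "relations": [("brandOf", "Quaker", "cookies"), ("propertyOf", "oatmeal raisin", "cookies")]},
    {"entities": [("BBQ", "Attribute"), ("popchips", "Product"), ("Chili Lime", "Attribute")],
     "relations": [("propertyOf", "BBQ", "popchips"), ("propertyOf", "Chili Lime", "popchips")]},
    {"entities": [("cookies", "Product"), ("oatmeal raisin", "Attribute")],
     "relations": [("propertyOf", "oatmeal raisin", "cookies")]},
]


def most_reviewed_products(reviews):
    """Count how many reviews mention each Product (at most once per review)."""
    counts = Counter()
    for review in reviews:
        counts.update({text for text, etype in review["entities"] if etype == "Product"})
    return counts.most_common()


def related_counts(reviews, relation, product):
    """Count first arguments of `relation` whose second argument is `product`:
    brands for brandOf, attributes such as flavors for propertyOf."""
    counts = Counter()
    for review in reviews:
        for rel, first, second in review["relations"]:
            if rel == relation and second == product:
                counts[first] += 1
    return counts.most_common()


print(most_reviewed_products(reviews))                    # [('cookies', 2), ('popchips', 1)]
print(related_counts(reviews, "brandOf", "cookies"))      # [('Quaker', 1)]
print(related_counts(reviews, "propertyOf", "cookies"))   # [('oatmeal raisin', 2)]
```

Grouping by the relation’s second argument is what makes “top brands for a product” and “most reviewed flavors for a product” possible; entity counts alone cannot tell which brand or flavor belongs to which product.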
Three annotators created this custom model by annotating 150 documents in one day. You can improve the model further by examining the performance statistics that WKS provides and taking suitable steps, such as adding more training documents or making the annotations more consistent, and then retraining the model.
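One way to gauge annotation consistency is to compare annotators pairwise on the same document. The sketch below uses exact-match F1 between two annotators’ mention sets, which reduces to 2·|A∩B| / (|A| + |B|) and does not depend on which annotator is treated as the reference. The annotator names and mentions are hypothetical, and WKS has its own inter-annotator agreement tooling that may compute agreement differently.

```python
from itertools import combinations


def f1_agreement(a: set, b: set) -> float:
    """Exact-match F1 between two annotators' mention sets (symmetric in a and b)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def pairwise_agreement(annotations: dict) -> dict:
    """Agreement score for every pair of annotators, keyed by (name, name)."""
    return {
        (x, y): f1_agreement(annotations[x], annotations[y])
        for x, y in combinations(sorted(annotations), 2)
    }


# Hypothetical (begin, end, type) mentions from three annotators on one sentence.
annotations = {
    "annotator_1": {(22, 25, "Attribute"), (26, 34, "Product")},
    "annotator_2": {(22, 25, "Attribute"), (26, 34, "Product")},
    "annotator_3": {(26, 34, "Product")},  # missed the BBQ flavor
}

for pair, score in pairwise_agreement(annotations).items():
    print(pair, round(score, 2))
```

A consistently low score for one annotator suggests revisiting the annotation guidelines with that person before retraining.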