General-purpose natural language processing techniques cannot extract or interpret data in a domain- or industry-specific way. The same data (entities) can carry different meanings in different domains. A good answer to this problem is IBM Watson Knowledge Studio.

Consider an example where we need to extract the entities present in a commercial SMS.

In such a commercial SMS, the entities of interest are usually:

    1. what the offer is
    2. who is providing it (the merchant)
    3. the offer name (if present in the SMS)
    4. the offer validity period

Secondary information such as the merchant's phone number, website link, etc. may also be captured if required. Consider the sample SMS text to be analyzed below:

“DUNKI DONUTS is now open at Girgaum Chowpatty. spl offer on your favorite Donuts. Buy 3 & Get 3 FREE. Valid till 15 Feb 2017. T&C”

Let us first use the Natural Language Understanding (NLU) API to extract the entities from the SMS.


curl -u "59c0fafc-8406-4c11-b5bf-6e29578a4346":"UuVqNnQDcCbE" "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27&text=DUNKI%20DONUTS%20is%20now%20open%20at%20Girgaum%20Chowpatty.%20spl%20offer%20on%20your%20favorite%20Donuts.%20Buy%203%20%26%20Get%203%20FREE.%20Valid%20till%2015%20Feb%202017.%20T%26C&features=entities"

<Output:{ "language": "en", "entities": [ { "type": "Company", "text": "DUNKI DONUTS", "relevance": 0.976076, "count": 1 }, { "type": "GeographicFeature", "text": "Girgaum Chowpatty", "relevance": 0.65276, "count": 1 } ] }>

The API captures the company (merchant) and the location, which are generic entities, but it fails to extract the offer details as we require.

To get domain-specific entities we need an alternative to plain Natural Language Understanding. As mentioned at the beginning, the solution to such a business requirement is Watson Knowledge Studio (WKS).

Let us see the output of the NLU API when it uses a WKS model. I will explain the process of building the model and integrating it with NLU later.


curl -u "59c0fafc-8406-4c11-b5bf-6e29578a4346":"UuVqNnQDcCbE" "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2017-02-27&text=DUNKI%20DONUTS%20is%20now%20open%20at%20Girgaum%20Chowpatty.%20spl%20offer%20on%20your%20favorite%20Donuts.%20Buy%203%20%26%20Get%203%20FREE.%20Valid%20till%2015%20Feb%202017.%20T%26C&features=entities&entities.model=10:8a91f680-4eb0-4c7b-b37e-193bb124bc18"
<Output:{ "language": "en", "entities": [ { "type": "Merchant", "text": "DUNKI DONUTS", "count": 1 }, { "type": "Location", "text": "Girgaum", "count": 1 }, { "type": "Offer", "text": "Get 3 FREE", "count": 1 }, { "type": "Offer_Period", "text": "Valid till 15 Feb 2017", "count": 1 }, { "type": "Term_and_Conditions", "text": "T&C", "count": 1 } ] }>

Looking at the entities extracted in the JSON response, we now get the domain-specific entities such as the offer, offer period, and merchant. The model I used was trained and evaluated on a few sample SMS messages.

Once you have built your WKS model and provisioned the NLU service, replace the username/password in the command with your NLU service credentials, and replace the WKS model id (entities.model) with your own model id. The SMS text has to be URL encoded because it is passed as a URL query string in the curl command.
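If you are scripting this rather than hand-encoding the text, Python's standard library can do the URL encoding. A minimal sketch using the sample SMS text:

```python
from urllib.parse import quote

sms = ("DUNKI DONUTS is now open at Girgaum Chowpatty. "
       "spl offer on your favorite Donuts. Buy 3 & Get 3 FREE. "
       "Valid till 15 Feb 2017. T&C")

# quote() percent-encodes reserved characters (spaces -> %20, & -> %26, ...)
encoded = quote(sms)
print(encoded)
```

The result matches the text= parameter used in the curl commands above.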

Now I will briefly describe how to use Watson Knowledge Studio to develop a machine learning model specific to a domain or use case. This model is then integrated with Natural Language Understanding to extract the domain-specific entities.

WKS

    1. We build a Type System specific to the business domain/use case.
    2. We create a Dictionary to be used as a pre-annotator. It acts as a bootstrapper for the training process.
    3. We follow a human annotation process to annotate entities and relationships.
    4. We create a machine learning model and train it until we are confident in the model. The machine learning model is built without writing any code.
    5. The corpus documents from the Documents tab can be exported and, if required, imported into a new WKS project.
    6. The machine learning model is deployed to the Natural Language Understanding service.

NLU

    1. Using the Natural Language Understanding service, we extract the entities from new documents (SMS messages).
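The NLU request shown in the curl commands can also be assembled programmatically. The sketch below builds the analyze URL with a custom model; the model id is a placeholder you must replace with your own, and the URL would then be fetched with HTTP basic auth using your NLU credentials (e.g. with the requests library).

```python
from urllib.parse import urlencode, quote

base_url = ("https://gateway.watsonplatform.net/"
            "natural-language-understanding/api/v1/analyze")

params = {
    "version": "2017-02-27",
    "text": ("DUNKI DONUTS is now open at Girgaum Chowpatty. "
             "spl offer on your favorite Donuts. Buy 3 & Get 3 FREE. "
             "Valid till 15 Feb 2017. T&C"),
    "features": "entities",
    # Placeholder -- substitute the model id of your deployed WKS model
    "entities.model": "YOUR_WKS_MODEL_ID",
}

# quote_via=quote gives %20 for spaces, matching the curl commands above
url = base_url + "?" + urlencode(params, quote_via=quote)
print(url)
```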

This use case can be found in the GitHub repository linked below:
https://github.com/ragudiko/wks-nlu-sms-analysis

You can use the simple Java client provided in the GitHub repository instead of the curl command to extract entities.
You can register at https://console.bluemix.net/ to access the NLU API.

You can try the free version of WKS at https://www.ibm.com/in-en/marketplace/supervised-machine-learning
