Introduction

This article shows how to use a machine-learning model deployed to the Watson Discovery service (WDS) to enrich the insights you get from your data, so that they include the recognition of entities and relations that are relevant to your domain.

As an illustration, this tutorial provides step-by-step instructions for creating, developing and deploying a machine-learning annotator using Watson Knowledge Studio (WKS). It also shows how to integrate the machine-learning model with the Watson Discovery service on Bluemix.

Model creation using Watson Knowledge Studio

Our intention here is not to demonstrate the full end-to-end development of a machine-learning annotator, from the curation of knowledge all the way to deployment, that a complete cycle of domain adaptation involves. Training annotators is a complex, iterative process with multiple steps.

The steps described here cover three stages:
  • creating the standard, or ground truth, on which the machine-learning model is trained;
  • developing and evaluating the annotator;
  • deploying the annotator to the Watson Discovery service.

Step 1: Project creation

You need to be able to log in to a WKS instance. If you do not have a WKS account, create one. You can sign up for a 30-day free trial here:

https://www.ibm.com/us-en/marketplace/supervised-machine-learning/purchase#product-header-top

In the “Create New Project” pop-up window, enter the name of the new project. The “Description” field is optional. In the “Select a language” drop-down field, choose “English”. For the tokenizer, keep the default option, the machine-learning-based tokenizer.

Step 2: Prepare document content

Depending on your business requirements, the information you want to extract from your data set can differ. Make sure that your training documents are truly representative of the content that is of interest to your domain.

For example, the statement “NCR, which counts IBM founder Thomas Watson as one of its early employees” contains several relevant mentions and relationships. Different human annotators may annotate them differently. Some annotators may relate “Thomas Watson”, a PERSON entity, to IBM through an “employedBy” relationship, while for others “founderOf” seems more relevant.

Here we use the content “NCR, which counts IBM founder Thomas Watson as one of its early employees” from the following source web document:

http://www.gmanetwork.com/news/scitech/content/189511/tech-firm-ncr-sees-growth-for-atms/story/

Step 3: Creating the type system

The set of entity types and relation types is called a type system. In many cases you might start with an existing, more generic type system. You would then adapt it to your domain, based on the application you want to build from the extracted information.

Analysing the source content above, we can identify three entity types and two relation types:
  • entity types: ORGANIZATION, GPE (geo-political entity) and PERSON;
  • relation types: employedBy and founderOf.

In Watson Knowledge Studio, you can create a type system from scratch or import an existing one. It is strongly recommended to start from the KLUE (Knowledge from Language Understanding and Extraction) type system. Follow these steps to create the type system from scratch:

    a) Click “Add Entity Type” and add the following entity types:

    1. ORGANIZATION
    2. GPE
    3. PERSON

    Optionally, you can also define roles and subtypes for an entity type:

    • Roles qualify a mention by the context in which it occurs.
    • Subtypes are only one level deep. If you need a deeper hierarchy, you have to flatten it into separate entity types.

    b) Type the entity type “ORGANIZATION” and press Enter. A new entry for ORGANIZATION appears. Add the remaining entity types, GPE and PERSON, in the same way.
    c) The Entity Types tab should now list ORGANIZATION, GPE and PERSON.

    d) Add the new relation types:
    – employedBy (PERSON, ORGANIZATION, GPE)
    – founderOf (GPE, ORGANIZATION)

    1. Click the “Relation Types” tab.
    2. Click the “Add Relation Type” button.
    3. Enter “founderOf” as the relation type name. Choose GPE as the first entity type and ORGANIZATION as the second entity type.
    4. Similarly, add the relation type “employedBy”. Choose ORGANIZATION, PERSON and GPE as both the first and the second entity types.

    e) Once the relation types have been added, the Relation Types tab should list founderOf and employedBy.

    As a rule of thumb, about ten entity types and ten relation types suit a well-contained application whose human annotators are not highly specialized in the field. Type systems of 10 to 50 types are suitable when the annotators are subject matter experts.
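The following minimal Python sketch makes the structure of this type system concrete. It models the entity types and the argument constraints of each relation type, and checks whether a candidate relation is allowed. The `TypeSystem` and `RelationType` classes are hypothetical illustrations. They are not the format WKS uses to store or import type systems.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class RelationType:
    """A relation type plus the entity types allowed for each argument."""
    name: str
    first_args: frozenset[str]
    second_args: frozenset[str]


@dataclass
class TypeSystem:
    """Hypothetical in-memory model of a WKS-style type system."""
    entity_types: set[str] = field(default_factory=set)
    relation_types: dict[str, RelationType] = field(default_factory=dict)

    def add_entity_type(self, name: str) -> None:
        self.entity_types.add(name)

    def add_relation_type(self, name: str, first_args, second_args) -> None:
        # Reject relations that refer to entity types not defined yet.
        unknown = (set(first_args) | set(second_args)) - self.entity_types
        if unknown:
            raise ValueError(f"undefined entity types: {sorted(unknown)}")
        self.relation_types[name] = RelationType(
            name, frozenset(first_args), frozenset(second_args)
        )

    def allows(self, relation: str, first_type: str, second_type: str) -> bool:
        """True if `relation` may link a first_type mention to a second_type mention."""
        rel = self.relation_types.get(relation)
        return (
            rel is not None
            and first_type in rel.first_args
            and second_type in rel.second_args
        )


# The type system defined in Step 3.
ts = TypeSystem()
for entity_type in ("ORGANIZATION", "GPE", "PERSON"):
    ts.add_entity_type(entity_type)
ts.add_relation_type("founderOf", {"GPE"}, {"ORGANIZATION"})
ts.add_relation_type(
    "employedBy",
    {"ORGANIZATION", "PERSON", "GPE"},
    {"ORGANIZATION", "PERSON", "GPE"},
)

print(ts.allows("founderOf", "GPE", "ORGANIZATION"))     # True
print(ts.allows("founderOf", "PERSON", "ORGANIZATION"))  # False
```

The argument constraints shown here are exactly the first and second entity types chosen in the Relation Types tab. This is why a relation such as founderOf can only be drawn between mentions of the matching types.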

Step 4: Upload Documents

Comma-separated values (CSV) is one of the formats that can be imported as a document set. The CSV file must be UTF-8 encoded and have two columns:
  • column 1: the document filename;
  • column 2: the document body.
You can import only one CSV file at a time.

In this example, the CSV file contains a single row: a document filename in the first column and the NCR sentence from Step 2 in the second.
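As a minimal sketch, the two-column layout can be produced with Python's standard csv module. The filename `ncr_doc` is hypothetical. Writing no header row is also an assumption of this sketch.

```python
import csv
import io


def write_document_csv(documents, stream):
    """Write (filename, body) pairs as two CSV columns, one document per row.

    No header row is written; that is an assumption of this sketch.
    """
    writer = csv.writer(stream)
    for filename, body in documents:
        writer.writerow([filename, body])


# Hypothetical filename; the body is the sample sentence from Step 2.
docs = [
    ("ncr_doc", "NCR, which counts IBM founder Thomas Watson as one of its early employees"),
]

# Preview in memory. For the real upload, open the file with
# open("documents.csv", "w", encoding="utf-8", newline="") instead.
buffer = io.StringIO()
write_document_csv(docs, buffer)
print(buffer.getvalue())
```

The sample sentence contains a comma, so the csv module wraps the body in double quotes. This is one reason to use a CSV library rather than joining strings by hand.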

Follow these steps:

  1. For your project, go to the “Documents” tab.
  2. Click the icon for importing documents.
  3. In the “Add Documents to the Corpus” window, drag and drop the CSV file.
  4. Click Import to add the documents to your project.

Step 4.1: Creating and assigning annotation sets

Once the documents have been uploaded, you need to create annotation sets so that multiple human annotators can annotate them. To view inter-annotator agreement scores, you must assign at least two human annotators. You must also specify a percentage of documents that overlap between the sets. Follow these steps:

  1. Open the Documents page for your project.
  2. Click the icon to create annotation sets.
  3. For the overlap value, specify the percentage of documents that you want to include in every annotation set. You can add sets by clicking the ‘+’ icon. For each set:
     a) Select a user name from the list of human annotators.
     b) Name the annotation set.
  4. Click Generate.

In this example, the result is a single annotation set containing one document.

This step matters most when you have large document sets to annotate and need two or more human annotators working on multiple sets. We are using the trial version of WKS, so it is not possible to have two human annotators working on the same set.
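A small Python sketch can illustrate the overlap setting. A share of the documents goes into every annotation set, so that agreement can later be measured on them. The remaining documents are spread across the annotators. The function `make_annotation_sets`, the round-robin distribution and the annotator names are all hypothetical. WKS's actual assignment logic may differ.

```python
import math


def make_annotation_sets(doc_ids, annotators, overlap_pct):
    """Hypothetical split: overlapping docs go to every set, the rest round-robin."""
    if len(annotators) < 1:
        raise ValueError("at least one annotator is required")
    n_overlap = math.ceil(len(doc_ids) * overlap_pct / 100)
    overlap, rest = doc_ids[:n_overlap], doc_ids[n_overlap:]
    # Every annotator starts with the shared (overlapping) documents.
    sets = {name: list(overlap) for name in annotators}
    # Distribute the remaining documents one annotator at a time.
    for i, doc in enumerate(rest):
        sets[annotators[i % len(annotators)]].append(doc)
    return sets


docs = [f"doc{i}" for i in range(10)]
sets = make_annotation_sets(docs, ["annotator_a", "annotator_b"], overlap_pct=20)
for name, docs_in_set in sets.items():
    print(name, docs_in_set)
```

With 10 documents and 20% overlap, doc0 and doc1 land in both sets. Inter-annotator agreement can only be computed on those shared documents. With a single annotator, as in the trial setup above, there is nothing to compare.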

Step 5: Creating a dictionary

This step is optional when building a machine-learning model. A dictionary is always associated with an entity type. You can import a CSV file or add entries manually, as follows:

  1. Open the Dictionary page for your project.
  2. Create an empty dictionary using the Add button and enter the dictionary name.
  3. Add a new entry using the “Add Entry” button. An entry lists surface forms, that is, the different words and phrases that refer to the same thing. We have identified “IBM”, which is associated with the “ORGANIZATION” entity type. Apply the steps below to add an entry for “IBM”:
    • Enter “International Business Machines” under Surface Forms.
    • Choose “Noun” from the Part of Speech drop-down and save the entry.

Step 5.1: Creating a dictionary annotator

A dictionary annotator is needed to bootstrap, or pre-annotate, the documents in the training corpus. Pre-annotation simplifies the human annotators' job because dictionary matches are already marked in the project's documents. Follow these steps to create the dictionary pre-annotator:

  1. Open the Annotator Component tab.
  2. From the listed annotator components, choose the dictionary annotator for pre-annotation and click “Create this type of Pre-Annotator”.
  3. On the “Create Annotator” page, click the Edit button for the “ORGANIZATION” entity type and select the dictionary you created in Step 5. Then open the “Create” drop-down in the top-right corner and choose “Create & Run”.
  4. The next page lets you choose whether to run the pre-annotator on a document set or an annotation set. Choose the document set. Importantly, never run a pre-annotator on documents that human annotators have already annotated, because the annotations added by the human annotators will be removed.
  5. Once it runs successfully, the dictionary annotator appears as a separate entry under the project's “Annotator Component” tab. Click Details for more information.
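Conceptually, a dictionary pre-annotator marks every occurrence of a dictionary entry's surface forms with the entry's entity type. The Python sketch below shows that idea using the “IBM” entry from Step 5. The `DictionaryEntry` class and the `pre_annotate` function are hypothetical. The matching is exact, case-sensitive and whole-word only, which is a simplification of what the WKS dictionary annotator may do.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """Hypothetical dictionary entry: a lemma, its surface forms and an entity type."""
    lemma: str
    surface_forms: tuple
    part_of_speech: str
    entity_type: str


def pre_annotate(text, entries):
    """Return sorted (start, end, matched_text, entity_type) tuples.

    Matching is exact, case-sensitive and limited to whole words.
    """
    annotations = []
    for entry in entries:
        for form in entry.surface_forms:
            pattern = r"\b" + re.escape(form) + r"\b"
            for match in re.finditer(pattern, text):
                annotations.append(
                    (match.start(), match.end(), match.group(), entry.entity_type)
                )
    return sorted(annotations)


ibm = DictionaryEntry(
    lemma="IBM",
    surface_forms=("IBM", "International Business Machines"),
    part_of_speech="Noun",
    entity_type="ORGANIZATION",
)
text = "NCR, which counts IBM founder Thomas Watson as one of its early employees"
print(pre_annotate(text, [ibm]))  # [(18, 21, 'IBM', 'ORGANIZATION')]
```

Human annotators then only need to confirm or correct these pre-marked mentions. This is also why rerunning a pre-annotator over human work is risky.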
Step 6: Creating a human annotation task

An annotation task in WKS lets human annotators annotate text in isolated workspaces. It also ensures that only approved annotations are promoted to ground truth. Because the WKS machine-learning annotator is a statistically trained model, it needs good training data.

Follow these steps to perform the human annotation task:

  1. Open the Human Annotation tab.
  2. Click the “Add Task” button, enter a title and click “Create”.
  3. In the “Add annotation sets to task” window, select the annotation set you created and click the “Create Task” button in the right corner.
  4. Once created, the task appears as a separate entry under the project's “Human Annotation” tab.
  5. Open the task. Its page offers several ways to see how consistently multiple human annotators annotated the same documents. It also lets you set the inter-annotator agreement threshold on the IAA Settings tab. Because we use a single human annotator here, there is no need to explore these options. Click the “Annotate” button.
  6. Clicking Annotate opens the Ground Truth Editor, where you can set preferences for colors and keyboard shortcuts. We used the default colors for the entity types (PERSON and ORGANIZATION).
  7. To annotate relations, open the relations page by clicking Relations on the left. Select the mention that belongs to the first entity type, then the mention that belongs to the second entity type. Finally, select the relation on the right to annotate it.
  8. Once you have annotated the mentions, co-references and relations in a document, mark the document as ‘Complete’ and save it.
  9. Close the current document with the ‘x’ button to the right of the status drop-down box, and annotate the remaining documents.
  10. Once you have annotated all the documents in a document set and marked them as complete, the status of the set changes from “In Progress” to “Submitted”. Accept the set so that its annotations are promoted to ground truth.
  11. Refresh the page to see the set's updated, completed status.
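WKS reports inter-annotator agreement when two or more annotators share documents. This article does not cover the exact matching rules it applies. To show the idea, the Python sketch below uses a simple exact-match F1 score between two annotators' mention sets. The annotations are hypothetical character offsets into the sample sentence.

```python
def agreement_f1(annotations_a, annotations_b):
    """Exact-match F1 agreement between two annotators' annotation sets.

    Each annotation is a hashable tuple, e.g. (start, end, entity_type) for a
    mention. With exact matching, F1 reduces to 2 * |A & B| / (|A| + |B|).
    """
    a, b = set(annotations_a), set(annotations_b)
    if not a and not b:
        return 1.0  # both annotators marked nothing: treat as full agreement
    return 2 * len(a & b) / (len(a) + len(b))


# Two hypothetical annotators on the sample sentence. They disagree only on
# the type of "Thomas Watson" (characters 30 to 43).
annotator_1 = {(0, 3, "ORGANIZATION"), (18, 21, "ORGANIZATION"), (30, 43, "PERSON")}
annotator_2 = {(0, 3, "ORGANIZATION"), (18, 21, "ORGANIZATION"), (30, 43, "ORGANIZATION")}
print(round(agreement_f1(annotator_1, annotator_2), 3))  # 0.667
```

Relations can be scored the same way by using tuples such as (first_mention, second_mention, relation_type). A single type disagreement already lowers agreement noticeably on a short document.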

Step 7: Creating the machine-learning model

A machine-learning annotator identifies entities and entity relationships according to a statistical model that is trained on the ground truth. Follow these steps:

  1. Open the Annotator Component tab.
  2. From the listed annotator components, choose the Machine Learning annotator and click “Create this type of Annotator”.
  3. On the “Create Annotator” page, click All to include all document sets. Keep the default ratio of documents assigned to the training, test and blind sets, and click the “Next” button in the top-right corner.
  4. Keep the default dictionary mapping and click the “Train and Evaluate” button in the top-right corner.
  5. When training and evaluation finish, the annotator's page shows the evaluation results.
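Conceptually, the ratio in step 3 divides the ground-truth documents into three sets:
  • a training set used to fit the model;
  • a test set used to evaluate it during development;
  • a blind set typically held back for a later, unbiased check.

Here is a minimal Python sketch of such a split. The 70/23/7 default ratio, the shuffling and the `split_documents` name are assumptions for illustration. Use the ratio shown on your “Create Annotator” page.

```python
import random


def split_documents(doc_ids, ratios=(0.70, 0.23, 0.07), seed=42):
    """Shuffle documents and split them into training, test and blind sets.

    The 70/23/7 default is an assumption; use the ratio shown in WKS.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    docs = list(doc_ids)
    random.Random(seed).shuffle(docs)  # fixed seed -> reproducible split
    n_test = round(len(docs) * ratios[1])
    n_blind = round(len(docs) * ratios[2])
    test = docs[:n_test]
    blind = docs[n_test:n_test + n_blind]
    training = docs[n_test + n_blind:]  # training gets the remainder
    return training, test, blind


training, test, blind = split_documents([f"doc{i}" for i in range(100)])
print(len(training), len(test), len(blind))  # 70 23 7
```

Giving the training set the remainder guarantees that every document lands in exactly one set, even when the ratios do not divide the document count evenly.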

Step 8: Deploying the model to the Watson Discovery service on Bluemix

Once the WKS model is ready, you can integrate it with other Watson services; here we use Watson Discovery. Make sure you have provisioned a Watson Discovery service instance in Bluemix.

  a) Provisioning Watson Discovery in Bluemix

  • Log in at https://console.ng.bluemix.net/, go to the Catalog and search for “Discovery”.
  • Choose the “Discovery” service under Watson services and click “Create”. Note the service name and the development space; you will need them when deploying the model to the Discovery service.
  • Once the service is created, note its service credentials.

  b) Next, click “Details” for the machine-learning model on the project's “Annotator Component” tab.

  To get the model ID that is used to integrate with other Watson services, follow these steps:

  1. Click the “Take Snapshot” button, enter a description and click “OK”. Then click the “Deploy” button.
  2. The deploy dialog lets you deploy your WKS model to other Watson services. Choose Discovery and click “Next”.
  3. Bind the model to your Discovery service instance and click “Deploy”.
  4. After deploying, note the model ID that is displayed. Publishing and deployment might take a few minutes to complete before the model is available to your application.
  5. You can check the deployment status by clicking “WDS” in the status column of the table. It shows when the deployment has completed.
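Our understanding of the Discovery API version used here (2016-12-01) is as follows. A collection only uses the deployed model once the model ID is added to the enrichment options of the collection's configuration. The updated configuration is then sent back with a PUT to the configuration's URL. The Python sketch below only edits the configuration as a dictionary. Several details are assumptions taken from that older API and are not confirmed by this article:
  • the `alchemy_language` enrichment name;
  • the `model` option key;
  • the trimmed sample configuration.
Check them against the configuration your instance actually returns.

```python
import copy
import json


def add_wks_model(configuration, model_id, enrichment_name="alchemy_language"):
    """Return a copy of a Discovery configuration whose matching enrichments use model_id.

    The enrichment name and the "model" option reflect an assumption about the
    2016-12-01 API; check the configuration your instance actually returns.
    """
    updated = copy.deepcopy(configuration)  # leave the caller's dict untouched
    for enrichment in updated.get("enrichments", []):
        if enrichment.get("enrichment") == enrichment_name:
            enrichment.setdefault("options", {})["model"] = model_id
    return updated


# Hypothetical, trimmed configuration; a real one contains other sections too.
config = {
    "name": "wks-demo-config",
    "enrichments": [
        {
            "source_field": "text",
            "destination_field": "enriched_text",
            "enrichment": "alchemy_language",
            "options": {"extract": ["entity", "relation"]},
        }
    ],
}

print(json.dumps(add_wks_model(config, "{model_id}"), indent=2))
```

Working on a deep copy keeps the retrieved configuration intact. You can then compare the old and new versions before sending the update.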

Deploying a machine-learning annotator to IBM Watson Discovery

We used the same content, “NCR, which counts IBM founder Thomas Watson as one of its early employees”, from the source web document http://www.gmanetwork.com/news/scitech/content/189511/tech-firm-ncr-sees-growth-for-atms/story/ and saved it as “NCR DOC.html”.

You can retrieve the full results with generic curl commands. For example, the following command retrieves a Discovery configuration:

    curl -u "{username}":"{password}" "https://gateway.watsonplatform.net/discovery/api/v1/environments/{environment_id}/configurations/{configuration_id}?version=2016-12-01"
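If you prefer calling the REST API from code, the Python sketch below builds the same GET request as the curl command using only the standard library. It does not send the request. The credentials and IDs are placeholders. The gateway URL is the one from this article and may no longer be in service.

```python
import base64
import urllib.parse
import urllib.request

# Endpoint from the curl command above (Bluemix-era gateway).
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"


def configuration_request(username, password, environment_id, configuration_id,
                          version="2016-12-01"):
    """Build (but do not send) a GET request equivalent to the curl command."""
    path = "/environments/{}/configurations/{}".format(
        urllib.parse.quote(environment_id, safe=""),
        urllib.parse.quote(configuration_id, safe=""),
    )
    url = BASE_URL + path + "?" + urllib.parse.urlencode({"version": version})
    # curl -u user:password sends HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url, headers={"Authorization": "Basic " + token}, method="GET"
    )


req = configuration_request("my-username", "my-password", "my-env-id", "my-config-id")
print(req.full_url)
# To send it: with urllib.request.urlopen(req) as resp: data = json.load(resp)
```

Building the URL with `urlencode` and `quote` avoids the quoting pitfalls of pasting a query string into a shell.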

The attached PDF, https://developer.ibm.com/in/wp-content/uploads/sites/115/2017/05/Curl-command-Results.pdf, contains the results of the curl commands used to deploy the WKS model to WDS.

Summary

Knowledge Studio can be used by natural language experts, data scientists and subject matter experts alike. It is natively integrated with other Watson offerings such as Watson Explorer, Natural Language Understanding and Watson Discovery. This lets you share and reuse domain-specific annotators across different Watson solutions.

