Overview

Skill Level: Intermediate

The first part of the series explains how to export CA Studio annotations from the WCA pipeline using UIMA, and how to write them to any backend or third-party system using custom Java code.

Ingredients

  • Familiarity with Apache UIMA, annotations, the Watson Content Analytics Admin Console, and Watson Content Analytics Studio.
  • An Eclipse installation with the UIMA Eclipse plugin installed (see Eclipse Installation with UIMA plugin).
  • Good understanding of Java and/or a backend system.
  • Refer to the Content Analytics Studio tutorial for step-by-step creation of annotations.
  • This part (Part-I) of the tutorial covers extracting the annotation values.
  • Part-II covers custom index/facets and deployment to a WCA collection.

Step-by-step

  1. CA Studio - Content Analytics Studio Annotations

    The basic steps to create custom annotations using CA Studio are:

    1. Create a new Content Analytics Studio Project.


    2. Add a UIMA pipeline to the project.


    3. Set up the UIMA pipeline: select the document language, and add lexical analysis and parsing rules to the annotator.

    For this example, com.ibm.GettingStarted.en.CAPS has been created, which identifies words written in CAPITAL letters in the text.

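    The CAPS annotation is defined with CA Studio rules rather than code. Purely to illustrate what such an annotation carries (a begin offset, an end offset, and the covered text), here is a plain-Java approximation that treats a "capital word" as two or more uppercase letters. The class name CapsSpanFinder and the regular expression are assumptions for illustration; this is not how CA Studio evaluates its rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration only: approximates a "CAPS" annotation with a regex
 * and reports UIMA-style spans (begin inclusive, end exclusive).
 */
public class CapsSpanFinder {

    // Assumption: a capital word is two or more uppercase letters between word boundaries.
    private static final Pattern CAPS_WORD = Pattern.compile("\\b\\p{Lu}{2,}\\b");

    /** Returns entries of the form "begin-end:text". */
    public static List<String> findCapsSpans(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = CAPS_WORD.matcher(text);
        while (m.find()) {
            spans.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(findCapsSpans("Deploy the UIMA pipeline to WCA today"));
    }
}
```

    Running main prints [11-15:UIMA, 28-31:WCA]. As with UIMA annotations, the end offset is exclusive, so the covered text is text.substring(begin, end).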

  2. Custom Java Annotations - Creating Java Project

    Let's start with a Java project for the custom Java annotations, built on UIMA and the Watson library (see Ingredients for the Eclipse setup).

    1. Right-click in the left pane and select New > Java Project.


    2. Add the UIMA nature to the Java project.

    Right-click the project and select Add UIMA Nature. This option appears only if the UIMA plugin was installed successfully, as described in Ingredients.


    Once the nature is added, the project structure changes.


  3. Custom Java Annotations - Custom Annotator Java Class

    Creating Custom Annotator Java Class

    1. Create a simple Java class: right-click the project and select New > Class.


    2. Copy the following libraries into the lib folder and add the jars to the project's build path:

    a - WatsonLibraryRoutinesUima2_7.jar

    b - UIMA core: uima-core.jar


    3. Make CustomAnnotator.java extend the UIMA class JCasAnnotator_ImplBase. This base class connects the collection pipeline to our custom Java code.

    4. Add the unimplemented method process(JCas). It is the entry point of the class: once deployed, the pipeline calls it once for every document it processes.

    Note: WatsonLibraryRoutinesUima2_7.jar can be obtained from the software group.
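    A minimal sketch of steps 3 and 4, assuming only uima-core.jar on the build path. It uses core Apache UIMA classes; the Watson library calls are left out because their signatures are not given here. The lastResults list and the println are illustrative placeholders for your own export logic.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.text.AnnotationIndex;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

/**
 * Sketch of a custom annotator. UIMA calls process(JCas) once for every
 * document (CAS) that flows through the pipeline.
 */
public class CustomAnnotator extends JCasAnnotator_ImplBase {

    // Illustrative only: values from the most recent document, kept for inspection in tests.
    private final List<String> lastResults = new ArrayList<>();

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        lastResults.clear();
        // All annotations in the CAS, including the built-in uima.tcas.DocumentAnnotation
        // that spans the whole document text.
        AnnotationIndex<Annotation> index = jcas.getAnnotationIndex();
        FSIterator<Annotation> it = index.iterator();
        while (it.hasNext()) {
            Annotation a = it.next();
            String line = a.getType().getName() + " [" + a.getBegin() + "," + a.getEnd() + "] "
                    + a.getCoveredText();
            lastResults.add(line);
            // Placeholder: replace with the call to your backend / third-party system.
            System.out.println(line);
        }
    }

    public List<String> getLastResults() {
        return lastResults;
    }
}
```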

  4. Custom Java Annotations - Extract Collection MetaData and Annotation Values

    Add code to extract the metadata and the annotation values.

    1. ICADocumentDetails is a Watson library class that provides the metadata added by the WCA collection, e.g. DocumentID and Source.


    2. UIMAFunctions is the Watson library class that returns all the annotations created in CA Studio once they are deployed to the collection. It also contains other useful methods for extracting features.


    Note: The Watson library offers many other useful methods worth exploring.
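    The signatures of UIMAFunctions and ICADocumentDetails are not given here, so the following sketch uses only the core UIMA API as a stand-in for step 2: it looks up a type by name and collects the covered text of each annotation of that type. AnnotationExtractor is a hypothetical name, and the type name you pass is an assumption; check the exact name in the type system exported from CA Studio.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.Type;
import org.apache.uima.cas.text.AnnotationIndex;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

/** Hypothetical helper (core UIMA only): covered text of all annotations of one type. */
public final class AnnotationExtractor {

    private AnnotationExtractor() {
    }

    /**
     * Returns the covered text of each annotation of typeName (subtypes included),
     * in annotation index order, i.e. by ascending begin offset. Returns an empty
     * list if the type is not declared in the CAS's type system.
     */
    public static List<String> coveredTexts(JCas jcas, String typeName) {
        Type type = jcas.getTypeSystem().getType(typeName);
        if (type == null) {
            return Collections.emptyList();
        }
        List<String> values = new ArrayList<>();
        AnnotationIndex<Annotation> index = jcas.getAnnotationIndex(type);
        FSIterator<Annotation> it = index.iterator();
        while (it.hasNext()) {
            values.add(it.next().getCoveredText());
        }
        return values;
    }
}
```

    Inside process(JCas) it could be called as AnnotationExtractor.coveredTexts(jcas, "com.ibm.GettingStarted.en.CAPS"), assuming that is the UIMA type name CA Studio generated for the CAPS annotation.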

  5. Custom Java Annotations - Unit Testing - I

    As part of unit testing, you can test the code that extracts the annotation values from within Eclipse.

    1. Go to WCA Studio.
    2. Open the document to be analyzed.
    3. Right-click the document and analyze it using the annoconfig annotator configuration file.
    4. Right-click the analyzed document (it must already have been annotated by the annotator), click ‘Save As XMI’, select the ‘Also Save the Type System’ check box, and provide the path to save to.

    5. This step generates one XMI file and one TypeSystem XML file.


    TypeSystem XML: describes the annotation types (and their features) used in the document.

    XMI: contains the document text, its metadata, and the annotations found in the document.
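    Because your code looks annotations up by type name, it can help to list the names the saved TypeSystem XML declares. A UIMA type system descriptor lists each type as a typeDescription element whose direct name child holds the type name; this sketch uses only the JDK's DOM parser. TypeSystemLister is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical utility: lists the type names declared in a UIMA type system descriptor. */
public class TypeSystemLister {

    public static List<String> typeNames(File typeSystemXml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(typeSystemXml);

        List<String> names = new ArrayList<>();
        // Match by local name in any namespace (UIMA uses http://uima.apache.org/resourceSpecifier).
        NodeList types = doc.getElementsByTagNameNS("*", "typeDescription");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            // Only the direct <name> child is the type name; feature names are nested deeper.
            for (Node child = type.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE && "name".equals(child.getLocalName())) {
                    names.add(child.getTextContent().trim());
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : typeNames(new File(args[0]))) {
            System.out.println(name);
        }
    }
}
```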


  6. Custom Java Annotations - Unit Testing - II

    1. Copy both files from the previous step into the Eclipse project, and add the TestAnnotator code.


    2. Create a main class that runs TestAnnotator.java, passing the names of the XMI and XML files as parameters.


    3. When you run this code, the results are printed to the console.
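    One possible shape for such a main class, assuming uima-core.jar on the classpath. It rebuilds a CAS from the two saved files with core UIMA calls (parseTypeSystemDescription, CasCreationUtils.createCas, XmiCasDeserializer.deserialize) and prints every annotation. TestAnnotatorMain is a hypothetical name and does not reproduce the tutorial's TestAnnotator class, whose code is not shown here.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

import org.apache.uima.UIMAFramework;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.cas.impl.XmiCasDeserializer;
import org.apache.uima.cas.text.AnnotationIndex;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;
import org.apache.uima.resource.metadata.TypeSystemDescription;
import org.apache.uima.util.CasCreationUtils;
import org.apache.uima.util.XMLInputSource;

/** Hypothetical offline test: rebuilds the CAS saved from CA Studio and prints its annotations. */
public class TestAnnotatorMain {

    /** Loads the XMI into a CAS whose types come from the saved TypeSystem XML. */
    public static JCas loadJCas(File xmiFile, File typeSystemFile) throws Exception {
        TypeSystemDescription tsd = UIMAFramework.getXMLParser()
                .parseTypeSystemDescription(new XMLInputSource(typeSystemFile));
        // No type priorities or extra index definitions: default indexes only.
        CAS cas = CasCreationUtils.createCas(tsd, null, null);
        try (InputStream in = new FileInputStream(xmiFile)) {
            // lenient = true: types in the XMI that the type system lacks are ignored.
            XmiCasDeserializer.deserialize(in, cas, true);
        }
        return cas.getJCas();
    }

    /** Formats every annotation as "type [begin,end] coveredText". */
    public static List<String> describeAnnotations(JCas jcas) {
        List<String> lines = new ArrayList<>();
        AnnotationIndex<Annotation> index = jcas.getAnnotationIndex();
        FSIterator<Annotation> it = index.iterator();
        while (it.hasNext()) {
            Annotation a = it.next();
            lines.add(a.getType().getName() + " [" + a.getBegin() + "," + a.getEnd() + "] "
                    + a.getCoveredText());
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: TestAnnotatorMain <document.xmi> <TypeSystem.xml>");
            return;
        }
        JCas jcas = loadJCas(new File(args[0]), new File(args[1]));
        // Your annotator's process(jcas) could be invoked here instead of just printing.
        for (String line : describeAnnotations(jcas)) {
            System.out.println(line);
        }
    }
}
```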


    By following the steps above, you should be able to extract all the annotations from the WCA server and export them to a file system or database.
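    For the file-system case, here is a JDK-only sketch that appends one CSV row per annotation value, quoting fields as described in RFC 4180. The class name and the column layout (document ID, type, begin, end, text) are assumptions; in the deployed annotator the document ID would come from the Watson library's ICADocumentDetails, whose API is not shown here.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical exporter: appends one CSV row per annotation value to a file. */
public class CsvAnnotationExporter {

    private final Path target;

    public CsvAnnotationExporter(Path target) {
        this.target = target;
    }

    /** RFC 4180 quoting: wrap in double quotes and double any embedded quote. */
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Appends a row (docId, type, begin, end, text), creating the file if it does not exist. */
    public void append(String docId, String type, int begin, int end, String text) throws IOException {
        String row = quote(docId) + "," + quote(type) + "," + begin + "," + end + "," + quote(text) + "\n";
        try (Writer w = Files.newBufferedWriter(target, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(row);
        }
    }
}
```

    Opening the file per row keeps the sketch simple but is slow for large collections. If your collection runs several annotator instances in parallel, appending to one shared file needs coordination, or each instance can write to its own file; for a database, each row would become an insert through your JDBC driver instead.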

    Note: Comment out the metadata code before running the test; the metadata is only available on the server.

1 comment on"Extract CA Studio Annotations using UIMA and Watson Library, add Custom Index/Facets and deploy to WCA Collection - Part-I"

  1. KumarApurva August 29, 2017

    It’s crisp and contains all the required steps. Great work.
