CA Studio - Content Analytics Studio Annotations
The basic steps to create custom annotations using CA Studio are:
1. Create a new Content Analytics Studio project.
2. Add a UIMA pipeline to the project.
3. Set up the UIMA pipeline by selecting the document language and adding lexical analysis and parsing rules to the annotator.
For this example, the annotation com.ibm.GettingStarted.en.CAPS has been created, which identifies the CAPITAL words in the text.
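To make concrete what a CAPS-style annotation marks, here is a minimal plain-Java sketch. It does not use CA Studio rules or the UIMA API; it only mimics the result with a regular expression. The class name CapsSketch and the definition of a "CAPITAL word" (a run of two or more uppercase letters) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapsSketch {
    // Assumption: a "CAPITAL word" is a whole word of two or more uppercase letters.
    private static final Pattern CAPS = Pattern.compile("\\b\\p{Lu}{2,}\\b");

    /** Returns one "begin-end:text" entry per match, similar to an annotation span. */
    public static List<String> findCaps(String text) {
        List<String> spans = new ArrayList<>();
        Matcher m = CAPS.matcher(text);
        while (m.find()) {
            // start() is inclusive and end() is exclusive, like annotation offsets.
            spans.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Prints 0-3:IBM and 10-14:UIMA; "Watson" is skipped (only one capital).
        for (String span : findCaps("IBM ships UIMA tools for Watson.")) {
            System.out.println(span);
        }
    }
}
```

Each entry carries a begin offset, an end offset and the covered text, which is the same kind of information the Java code later reads from the annotations.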
Custom Java Annotations - Creating Java Project
Let's start with the Java project for custom Java annotations using the UIMA/Watson library (please refer to the prerequisites to get Eclipse).
1. Right-click in the left pane and select New -> Java Project.
2. Add the UIMA nature to the Java project.
Right-click the project and select Add UIMA Nature. You will see this option only if the UIMA plugin has been added successfully as per the above step.
Once the nature is added, the project structure changes.
Custom Java Annotations - Custom Annotator Java Class
Creating Custom Annotator Java Class
1. Create a simple Java class: right-click the project and select New -> Class.
2. Copy the required libraries to the lib folder and add these jars to the build path of the project.
b- UIMA core – uima-core.jar
3. Make CustomAnnotator.java extend the JCasAnnotator_ImplBase class of UIMA. This provides the connectivity between the collection pipeline and our custom Java code.
4. Add the unimplemented method process(JCas). It works like the main method of the class: after deployment, the pipeline calls it to process each document.
Note: WatsonLibraryRoutinesUima2_7.jar can be obtained from the software group.
Custom Java Annotations - Extract Collection MetaData and Annotation Values
Add code to extract the metadata and the annotation values.
1. ICADocumentDetails is a Watson library class that provides the metadata added by the WCA collection, e.g. DocumentID and Source.
2. UIMAFunctions is a Watson library class that can return all the annotations created in the studio after deployment to the collection. This class also contains further useful methods for extracting the features.
Note: The Watson library offers many other useful methods that can be explored.
Custom Java Annotations - Unit Testing - I
As part of unit testing, you can test the code that gets the annotation values from within Eclipse.
1. Go to the WCA Studio.
2. Open the document to be analyzed.
3. Right-click the document and analyze it using the annoconfig annotator configuration file.
4. Right-click the analyzed/annotated document (it should have been annotated using the annotator) and click ‘Save As XMI’. Also select the check box ‘Also Save the Type System’ and provide the path to save to.
5. This step generates one XMI and one type system XML (TS-XML).
TS-XML – contains the annotation information of the document.
XMI – contains the metadata and data information of the document.
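To get a feel for what a saved XMI file holds, the following sketch reads one with the JDK's XML parser only. It is not the UIMA API and not the TestAnnotator code. The class name XmiPeek and the sample XMI are hypothetical: the sample is hand-written in the shape UIMA usually writes, and a file saved from the studio will contain more elements and attributes. The sketch also assumes a single view (one Sofa element).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmiPeek {
    // Hypothetical, hand-written sample; not produced by the studio.
    public static final String SAMPLE =
        "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\""
      + " xmlns:cas=\"http:///uima/cas.ecore\""
      + " xmlns:en=\"http:///com/ibm/GettingStarted/en.ecore\" xmi:version=\"2.0\">"
      + "<cas:Sofa xmi:id=\"1\" sofaNum=\"1\" sofaID=\"_InitialView\""
      + " mimeType=\"text\" sofaString=\"IBM ships UIMA tools.\"/>"
      + "<en:CAPS xmi:id=\"10\" sofa=\"1\" begin=\"0\" end=\"3\"/>"
      + "<en:CAPS xmi:id=\"11\" sofa=\"1\" begin=\"10\" end=\"14\"/>"
      + "</xmi:XMI>";

    /** Returns "Type[begin,end]=coveredText" for each element that has begin/end offsets. */
    public static List<String> listAnnotations(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));
            NodeList children = doc.getDocumentElement().getChildNodes();

            // Pass 1: the Sofa element carries the document text (single view assumed).
            String text = "";
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node instanceof Element && "Sofa".equals(node.getLocalName())) {
                    text = ((Element) node).getAttribute("sofaString");
                }
            }

            // Pass 2: any element with begin and end attributes is treated as an annotation.
            List<String> result = new ArrayList<>();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (!(node instanceof Element)) {
                    continue;
                }
                Element e = (Element) node;
                if (e.hasAttribute("begin") && e.hasAttribute("end")) {
                    int begin = Integer.parseInt(e.getAttribute("begin"));
                    int end = Integer.parseInt(e.getAttribute("end"));
                    result.add(e.getLocalName() + "[" + begin + "," + end + "]="
                            + text.substring(begin, end));
                }
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read XMI", ex);
        }
    }

    public static void main(String[] args) {
        listAnnotations(SAMPLE).forEach(System.out::println);
    }
}
```

For the sample above this lists the two CAPS spans with their covered text. In the real unit test the XMI would be loaded through UIMA together with the type system XML; this sketch is only a quick way to inspect the file.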
Custom Java Annotations - Unit Testing - II
1. Copy both files from the above step to the Eclipse project and include the TestAnnotator code.
2. Create a main class that executes TestAnnotator.java, passing the names of the XMI and XML files as parameters.
3. After this code is executed, the results are printed in the console.
By following the above steps, you should be able to extract all the annotations from the WCA server and export them to a file system or database.
Note: Comment out the metadata code before running the test. The metadata code is meant to be used on the server.
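The main class from step 2 could start out like the sketch below. It only checks that both file names were passed and lists the types declared in the type system XML, using the JDK's XML parser. The class name TestLauncher is hypothetical, and the call into TestAnnotator is left as a comment because its signature is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class TestLauncher {
    /** Collects the <name> of each <typeDescription> in a type system descriptor. */
    public static List<String> typeNames(InputStream typeSystemXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(typeSystemXml);
            NodeList types = doc.getElementsByTagNameNS("*", "typeDescription");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < types.getLength(); i++) {
                // Direct children only: features nested deeper have their own <name>.
                NodeList children = types.item(i).getChildNodes();
                for (int j = 0; j < children.getLength(); j++) {
                    Node child = children.item(j);
                    if (child.getNodeType() == Node.ELEMENT_NODE
                            && "name".equals(child.getLocalName())) {
                        names.add(child.getTextContent().trim());
                    }
                }
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not read type system XML", ex);
        }
    }

    /** Convenience overload for a descriptor held in memory. */
    public static List<String> typeNames(String xml) {
        return typeNames(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: java TestLauncher <document.xmi> <typesystem.xml>");
            return;
        }
        File xmi = new File(args[0]);
        File typeSystem = new File(args[1]);
        if (!xmi.isFile() || !typeSystem.isFile()) {
            System.err.println("Both files must exist in the project.");
            return;
        }
        try (InputStream in = new FileInputStream(typeSystem)) {
            System.out.println("Types declared: " + typeNames(in));
        }
        // The call into TestAnnotator would go here, passing both file names.
    }
}
```

Checking the arguments and listing the declared types first makes it easier to spot a wrong file name or a missing type before the annotator test itself runs.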