By Neha Setia, Vishal Chahal, Manjula Hosurmath | Published September 14, 2018 - Updated September 14, 2018
Artificial Intelligence, Data Science, Python, Cloud
Microsoft Word documents are common in almost every business. They hold information as raw text, tables, and images, and that information includes facts important to the business. This code pattern addresses the problem of extracting knowledge from the text and tables in domain-specific Word documents. A knowledge graph built from the extracted knowledge makes it queryable. The approach combines the best of both worlds: training and a rules-based approach to extracting knowledge from documents.
One of the biggest challenges in the industry today is making machines understand the data in documents the way humans grasp a document's context and intent by reading it. The first step toward this goal is to convert the unstructured information (free-floating text and table text) into a semi-structured format and then process it further. That's where graphs play a major role: they give shape and structure to the unstructured information in the documents. This code pattern looks at the problem of extracting knowledge from text and tables in domain-specific Word documents. A domain-specific knowledge graph is built from the extracted knowledge, which makes that knowledge queryable. You can use this code pattern to shape your analysis and use the data for further processing to get better insights.
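To make the idea of a queryable graph concrete, here is a minimal sketch in plain Python. It is not the pattern's actual implementation: the function names (`rows_to_triples`, `build_graph`, `query`) and the sample rows are hypothetical, and the rows are assumed to have already been extracted from a document table into a semi-structured form.

```python
from collections import defaultdict

def rows_to_triples(rows, key_column):
    """Turn semi-structured table rows into (subject, predicate, object) triples.

    The value in `key_column` becomes the subject; every other non-empty
    column becomes a predicate/object pair.
    """
    triples = []
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, column, value))
    return triples

def build_graph(triples):
    """Index triples as graph[subject][predicate] -> list of objects."""
    graph = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        graph[subject][predicate].append(obj)
    return graph

def query(graph, subject, predicate):
    """Return all objects linked to `subject` by `predicate` (empty if none)."""
    return list(graph.get(subject, {}).get(predicate, []))

# Hypothetical rows, as they might look after table extraction.
sample_rows = [
    {"Drug": "Aspirin", "Dosage": "75 mg", "Route": "Oral"},
    {"Drug": "Insulin", "Dosage": "10 units", "Route": ""},
]

sample_triples = rows_to_triples(sample_rows, key_column="Drug")
sample_graph = build_graph(sample_triples)
aspirin_dosage = query(sample_graph, "Aspirin", "Dosage")
```

Storing the data as subject-predicate-object triples keeps the structure uniform, so facts from tables and facts from free text can land in the same graph and be queried the same way.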
The code pattern demonstrates a way to derive insights from a document containing raw text and information in tables using IBM Cloud, IBM Watson services, the Python package Mammoth, the Python Natural Language Toolkit (NLTK), and IBM Watson Studio.
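As a rough illustration of the table-extraction step, the sketch below parses tables out of HTML using only the Python standard library. It assumes the Word document has already been converted to HTML; Mammoth provides `mammoth.convert_to_html` for that, but whether the pattern uses it exactly this way is an assumption. The class and function names and the sample HTML are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML string as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # list of tables; each table is a list of rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

def extract_tables(html):
    """Return all tables found in `html`."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables

def table_to_records(table):
    """Treat the first row as the header and map each later row to a dict."""
    if not table:
        return []
    header, *body = table
    return [dict(zip(header, row)) for row in body]

# Assumed usage with Mammoth (not run here; "report.docx" is a placeholder):
#   with open("report.docx", "rb") as f:
#       html = mammoth.convert_to_html(f).value
sample_html = (
    "<p>Intro text</p>"
    "<table>"
    "<tr><th>Drug</th><th>Dosage</th></tr>"
    "<tr><td><p>Aspirin</p></td><td><p>75 mg</p></td></tr>"
    "</table>"
)

sample_tables = extract_tables(sample_html)
sample_records = table_to_records(sample_tables[0])
```

Records in this shape are semi-structured: each row is a dictionary keyed by column header, which is a convenient input for rules that turn rows into graph nodes and edges.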
With this code pattern, you get:
Find the detailed steps for this pattern in the README. Those steps will show you how to: