
Learn how to identify information in document images

Manually processing large numbers of applications is tedious, time-consuming, and error-prone. Cognitive technology can help address this challenge: combined with image recognition techniques, cognitive services can be used to automate application processing. The process involves identifying the application form document, extracting text from it, and building the intelligence to identify the relevant information in that text.

Part 1 of this composite code pattern covers the image recognition solution. Part 2 deals with extracting text from documents and identifying the relevant information in them. We will use Python, Jupyter Notebooks, the Python NLTK, the Watson Natural Language Understanding API, and IBM Cloud Object Storage.

This code pattern covers classifying images to separate out the application form documents, extracting text from those documents, identifying entities (information) in them, and using configuration files to determine what each application form is for. After completing this pattern, you will have learned how to extract text using OCR, extract entities from documents, use a configuration file to build a configurable, layered classification grammar, and combine grammatical classification with regex patterns from a configuration file to extract information.
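To make the configuration-driven idea concrete, here is a minimal sketch of the regex half of that approach, using only the Python standard library. It is not the code pattern's actual implementation: the `CONFIG` layout, the form types, the keywords, the regex patterns, and the helper names `classify_form` and `extract_entities` are all assumptions made up for illustration, and the layered NLTK grammar step is left out.

```python
import re

# Hypothetical configuration (an assumption, not the pattern's real format).
# In practice this could live in a JSON file in IBM Cloud Object Storage.
CONFIG = {
    # Keywords that hint at what the application form is for.
    "form_types": {
        "loan_application": ["loan", "borrower"],
        "insurance_claim": ["claim", "policy"],
    },
    # Named regex patterns for the entities to pull out of the OCR text.
    "entities": {
        "email": r"[\w.+-]+@[\w-]+\.[\w.-]+",
        "date": r"\b\d{2}/\d{2}/\d{4}\b",
        "phone": r"\b\d{3}-\d{3}-\d{4}\b",
    },
}


def classify_form(text, config):
    """Return the form type whose keywords occur most often, or None."""
    words = re.findall(r"[a-z]+", text.lower())
    scores = {
        form_type: sum(words.count(keyword) for keyword in keywords)
        for form_type, keywords in config["form_types"].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_entities(text, config):
    """Apply each configured regex and collect all matches per entity name."""
    return {
        name: re.findall(pattern, text)
        for name, pattern in config["entities"].items()
    }


# Stand-in for text produced by the OCR step.
sample_text = (
    "Loan application. Borrower: Jane Doe. Date: 03/15/2018. "
    "Contact: jane.doe@example.com, 555-123-4567."
)
form_type = classify_form(sample_text, CONFIG)
entities = extract_entities(sample_text, CONFIG)
print(form_type)
print(entities)
```

Because the keywords and patterns live in data rather than in code, supporting a new form type or a new entity in this sketch means editing the configuration only.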

Check out Part 2 and give it a try.