
Learn to build a mobile application to snap and translate text in images

According to Ethnologue, there are 7,097 known living languages. Just 23 of them account for more than half of the world’s population, and nearly two-thirds of all languages come from Africa and Asia. While no one can learn them all, an AI-powered language translator can definitely augment human language translation capabilities.


Snap and translate text in images is a code pattern that helps you build a mobile application that recognizes text in images, either captured with the camera or uploaded from your phone’s photo album, translates the recognized text, and extracts its emotion and sentiment.

We used Apache Cordova, an open source hybrid mobile application development framework, to build the mobile application. The server application, built in Node.js, is deployed on the IBM Cloud Kubernetes Service and uses Tesseract OCR, IBM Watson Language Translator, and IBM Watson Natural Language Understanding.

Tesseract OCR is an open source OCR engine that can recognize more than 100 languages out of the box and can also be trained to recognize other languages.
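As an illustration (not the pattern’s exact server code), recognizing text from an image in Node.js with the tesseract.js library might look like the sketch below. The image path is a placeholder, and the small ISO-code mapping is our own assumption for the example:

```javascript
// Illustrative mapping from a few ISO 639-1 codes to Tesseract's
// ISO 639-2 trained-data names (Tesseract ships 100+ of these).
const TESSERACT_LANGS = { en: 'eng', fr: 'fra', de: 'deu', es: 'spa' };

function toTesseractLang(isoCode) {
  return TESSERACT_LANGS[isoCode] || 'eng'; // fall back to English
}

// Recognize text in an image file; resolves to the extracted string.
async function recognizeText(imagePath, isoCode = 'en') {
  const Tesseract = require('tesseract.js'); // loaded lazily
  const { data } = await Tesseract.recognize(imagePath, toTesseractLang(isoCode));
  return data.text;
}

// Example: recognizeText('./menu.png', 'fr').then(console.log);
```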

IBM Watson Language Translator can identify the language of a text and translate it from one language to another. It uses neural machine translation, a deep learning-based method that offers improved accuracy, faster speeds, and support for new languages. Most of the provided translation models in Language Translator can be customized to learn custom terms and phrases. You can customize a model either with a forced glossary or with a corpus that contains parallel sentences.

  • Forced glossary is used to force certain terms and phrases to be translated in a specific way. For example, you can force the model to always use “brevet” when translating “patent” from English to French.
  • Parallel corpus is used to provide more translations for the base model to learn from. This helps to adapt the base model to a specific domain. The resulting custom model translates text depending on the model’s combined understanding of the parallel corpus and the base model.
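A basic translation call with the ibm-watson Node.js SDK might look like this sketch. The API key and service URL are placeholders read from environment variables, and `buildTranslateParams` is a hypothetical helper we added for illustration:

```javascript
// Build the payload for LanguageTranslatorV3.translate(): the SDK expects
// an array of strings plus a model ID such as "en-fr".
function buildTranslateParams(text, source, target) {
  return { text: [text], modelId: `${source}-${target}` };
}

// Translate via the Watson Language Translator v3 SDK (credentials assumed
// to be set in the environment).
async function translateText(text, source, target) {
  const LanguageTranslatorV3 = require('ibm-watson/language-translator/v3');
  const { IamAuthenticator } = require('ibm-watson/auth');
  const translator = new LanguageTranslatorV3({
    version: '2018-05-01',
    authenticator: new IamAuthenticator({ apikey: process.env.WATSON_LT_APIKEY }),
    serviceUrl: process.env.WATSON_LT_URL,
  });
  const { result } = await translator.translate(
    buildTranslateParams(text, source, target)
  );
  return result.translations[0].translation;
}
```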

IBM Watson Natural Language Understanding can analyze semantic features of text input like categories, concepts, emotions, entities, keywords, metadata, relations, semantic roles, and sentiment. You can also extend Natural Language Understanding with custom models that can identify custom entities and relations unique to your domain using Watson Knowledge Studio.
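Requesting sentiment and emotion from Natural Language Understanding with the same SDK could be sketched as follows; again, the credentials are placeholders and `buildAnalyzeParams` is a hypothetical helper for this example:

```javascript
// Feature selection for NLU's analyze() call: request sentiment and emotion
// for the whole document.
function buildAnalyzeParams(text) {
  return { text, features: { sentiment: {}, emotion: {} } };
}

// Analyze text with Watson Natural Language Understanding (credentials
// assumed to be set in the environment).
async function analyzeText(text) {
  const NaturalLanguageUnderstandingV1 = require('ibm-watson/natural-language-understanding/v1');
  const { IamAuthenticator } = require('ibm-watson/auth');
  const nlu = new NaturalLanguageUnderstandingV1({
    version: '2021-08-01',
    authenticator: new IamAuthenticator({ apikey: process.env.WATSON_NLU_APIKEY }),
    serviceUrl: process.env.WATSON_NLU_URL,
  });
  const { result } = await nlu.analyze(buildAnalyzeParams(text));
  return { sentiment: result.sentiment, emotion: result.emotion };
}
```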

Text recognized from the image by Tesseract OCR is passed to Language Translator for translation, and the sentiment and emotion of the translated text are extracted using Watson Natural Language Understanding. Then, the responses from Tesseract OCR, Language Translator, and Natural Language Understanding are displayed to the user.
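The overall flow can be sketched as a single pipeline. The three service calls below are stubs standing in for Tesseract OCR, Language Translator, and Natural Language Understanding; the function names and return shapes are our own assumptions for the example:

```javascript
// Stubs standing in for the three services (illustrative only).
async function recognizeText(imagePath) { return 'Bonjour le monde'; }
async function translateText(text, modelId) { return 'Hello world'; }
async function analyzeText(text) {
  return { sentiment: { document: { label: 'positive' } } };
}

// End-to-end sketch: OCR -> translation -> sentiment/emotion, returning
// everything the mobile client displays to the user.
async function snapAndTranslate(imagePath, modelId) {
  const recognized = await recognizeText(imagePath);
  const translated = await translateText(recognized, modelId);
  const analysis = await analyzeText(translated);
  return { recognized, translated, analysis };
}
```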

Try it out and share your feedback with us!