Imagine how blind people might feel if they could identify an object in front of them without touching it or asking others for help. Impossible? Not with today’s technology.
Technology can have a huge impact on someone who believes their dreams are impossible. Visual recognition and artificial intelligence can aid the blind with a mobile application to help them to become more independent.
Using Watson and other technologies, we created a mobile application to aid blind people that can:
- Identify an object placed in front of the mobile device.
- Identify the object in one of four languages: Arabic, English, French, or Japanese.
Let’s walk through the framework and services we used to build and test the application. Don’t worry, we’ve included the link to the source code below so you can try it out. But for now, stay with us as we take you through the app.
The application is Swift-based. To build it, you need:
- Storyboard visual editor, to build the application GUI. It shows the pages of the application and the connections between the pages. You can add views such as buttons, table views, and text views. Storyboard enables you to visualize the appearance and flow of the user interface on one canvas.
- A Mac with Xcode, an integrated development environment (IDE) for macOS containing a suite of software development tools developed by Apple.
We used these three libraries and services throughout the application:
- Core ML, a machine learning framework used across Apple products that lets us integrate machine learning models into our application. The main motivation for using Core ML is that the model runs on the device itself, so analysis is faster because the data never leaves the device. We use Core ML in our application for the visual recognition feature that identifies objects.
- AV Foundation, a framework for working with audiovisual assets, controlling device cameras, processing audio, and configuring system audio interactions on Apple operating systems. In our application, AV Foundation provides the text-to-speech function that reads the identified object out loud.
- Watson Language Translator service, which translates the text when the user wants to switch languages.
Building the application
There are two ways to build a Swift mobile application: build the whole application programmatically, or use storyboards. We used storyboards because they’re easy to use and let us develop the application rapidly.
We leveraged Apple’s machine learning library, Core ML, to do in-app, real-time machine learning with a predefined machine learning model. We used Core ML instead of Watson Visual Recognition because running the model on the device gives real-time results with much lower latency. Core ML also works offline, unlike Watson Visual Recognition, so the user doesn’t need an internet connection to use the app.
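The in-app recognition step can be sketched with Apple’s Vision framework driving a Core ML model. This is a minimal illustration, not the app’s exact code: `MobileNetV2` stands in for whatever bundled .mlmodel the app actually ships with, and the generated class name depends on that model.

```swift
import UIKit
import CoreML
import Vision

// Classify an image on-device and hand back the top label.
// "MobileNetV2" is a placeholder for the app's bundled model.
func classify(image: UIImage, completion: @escaping (String) -> Void) {
    guard let cgImage = image.cgImage,
          let coreMLModel = try? MobileNetV2(configuration: MLModelConfiguration()).model,
          let visionModel = try? VNCoreMLModel(for: coreMLModel) else { return }

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // The highest-confidence label is what the app speaks and translates.
        guard let best = (request.results as? [VNClassificationObservation])?.first else { return }
        completion(best.identifier)
    }

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    // Run off the main thread; Vision inference can take a moment.
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}
```

Because everything here runs on the device, the camera frame never has to be uploaded anywhere, which is what keeps the latency low and lets the feature work offline.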
We also used the AV Foundation library to do the text-to-speech function. This library takes the text generated by the visual recognition and converts it to voice to read to the application user.
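The text-to-speech step boils down to a few lines with `AVSpeechSynthesizer`. A minimal sketch, assuming the label text and language code are passed in; the example label is illustrative:

```swift
import AVFoundation

// Keep a reference to the synthesizer so speech isn't cut off
// when the function returns.
let synthesizer = AVSpeechSynthesizer()

// Read the recognized label out loud in the selected language.
func speak(_ text: String, languageCode: String) {
    let utterance = AVSpeechUtterance(string: text)
    // Match the voice to the chosen language, e.g. "ar-SA",
    // "en-US", "fr-FR", or "ja-JP".
    utterance.voice = AVSpeechSynthesisVoice(language: languageCode)
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    synthesizer.speak(utterance)
}

// Example: speak("bouteille d'eau", languageCode: "fr-FR")
```

Using the system synthesizer means the voices for all four languages come with the OS, so this step works offline as well.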
Finally, to demonstrate the visual recognition in different languages, we used the Watson Language Translator service. Once the object is recognized, the user can choose a language, and the service translates the identifying text into it. Again, the user can choose from four languages: Arabic, English, French, and Japanese.
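Under the hood, the translation is a single call to the Watson Language Translator v3 REST endpoint. Here’s a sketch of building that request directly with `URLSession` types; the service URL and API key are placeholders for your own service credentials, and the app could equally use the Watson Swift SDK.

```swift
import Foundation

// Build a POST request for Watson Language Translator v3.
// modelID picks the language pair, e.g. "en-ar", "en-fr", "en-ja".
func translationRequest(text: String, modelID: String,
                        serviceURL: URL, apiKey: String) -> URLRequest {
    var components = URLComponents(
        url: serviceURL.appendingPathComponent("v3/translate"),
        resolvingAgainstBaseURL: false)!
    // The v3 API requires a version date query parameter.
    components.queryItems = [URLQueryItem(name: "version", value: "2018-05-01")]

    var request = URLRequest(url: components.url!)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    // Watson accepts basic auth with the literal username "apikey".
    let credentials = Data("apikey:\(apiKey)".utf8).base64EncodedString()
    request.setValue("Basic \(credentials)", forHTTPHeaderField: "Authorization")

    let body: [String: Any] = ["text": [text], "model_id": modelID]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

Sending this request with `URLSession` returns a JSON payload whose `translations` array carries the translated label, which the app then hands to the text-to-speech step.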
Here’s a screenshot of the application recognizing a water bottle and providing the identifying text in four languages:
Try the application
After you install the prerequisites, you can download the application code from GitHub. Use Xcode to run the application and try it out on different objects!
There are several ways that services could enhance the application further, such as identifying an object’s color, size, or distance. If you think of ways to improve this application, please let us know either in the Comments section of this post or on GitHub. We hope to hear from you!