Computer vision is coming of age. Apps that use AI vision are becoming practical and commonplace thanks to a combination of innovations that have dovetailed to expand what's possible. And developing new object detection applications with deep learning is easier than you might think.
Not long ago, deep learning was a slow and impractical endeavor. To leap from a research field to a developer's tool, we needed more than faster CPUs and long training runs: we needed GPUs. With their massively parallel compute, GPUs provided the boost that was needed. For better accuracy in less time, make sure your AI platform has access to GPU power.
Deep learning with transfer learning
GPUs provided the performance, but advances in deep neural networks were an equally big step toward solving problems such as computer vision. The real key for most developers, though, is the ability to leverage the work of others. Writing deep learning frameworks and neural networks from scratch is not realistic for most of us. Thanks to the frameworks that are available and the models that can be reused, deep learning is easy to leverage: even an AI novice can do image classification and object detection. Not long ago, that was unrealistic. Today, you can use "transfer learning": take an existing image recognition model and retrain it with your own dataset.
Making it accessible
Getting these frameworks, models, and GPUs to work together can be a little tricky. Developers who need to focus on delivering a specific app don't have time to build AI platforms and become data scientists. This is where simplification is needed, and with a platform like IBM PowerAI Vision, developers can focus on implementing the app instead of implementing the platform. Loading a dataset with images and labeling the objects are tasks that anyone can do, with no code required. Using deep learning with GPU acceleration takes just a few clicks of a mouse. One more click, and the model is deployed as a REST endpoint so developers can do their thing and write their app with object detection.
When using computer vision in an app, choosing between object detection and image classification is a key decision. For example, if you have a picture of an animal, do you just need to know whether it is a dog or "not a dog"? Or do you want to locate the dogs in the picture and perhaps count them? Looking at the whole picture and choosing a label is image classification. Of course, it can be more complex than dog/not-dog (for example, breed identification). If you want to locate the dog (or whatever object) in the picture, then you want object detection.

Training a model for image classification requires example datasets for each label. Training a model for object detection requires a dataset in which every appearance of every target object in every image is identified. In PowerAI Vision, for example, you would select a dog label and draw a bounding box around each dog. After your dataset has enough images with enough labeled objects, you can train a model. If your model is not accurate enough, add to your dataset and train some more.

In the app itself, the main difference is how you use the location and count information. An object detection app is likely to use the location to highlight each detected object in some way, or it might simply count how many were detected.
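To make the difference concrete, here is a small sketch of how an app might consume object-detection output. The detection format used here (a label plus bounding-box corners) is an assumption modeled on typical detection APIs, not the exact PowerAI Vision schema:

```python
# Sketch: using object-detection results (label + bounding box) in an app.
# The field names below are illustrative assumptions, not a specific API.
detections = [
    {"label": "dog", "xmin": 34,  "ymin": 50, "xmax": 210, "ymax": 240},
    {"label": "dog", "xmin": 260, "ymin": 80, "xmax": 400, "ymax": 235},
    {"label": "cat", "xmin": 120, "ymin": 10, "xmax": 180, "ymax": 90},
]

def count_objects(detections, label):
    """Count how many detected objects carry the given label."""
    return sum(1 for d in detections if d["label"] == label)

def boxes_for(detections, label):
    """Return (xmin, ymin, xmax, ymax) tuples, e.g. to draw highlight boxes."""
    return [(d["xmin"], d["ymin"], d["xmax"], d["ymax"])
            for d in detections if d["label"] == label]

print(count_objects(detections, "dog"))  # 2
```

A classification result would give you only a single label for the whole image; the bounding boxes are what let the app highlight and count individual objects.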
The Coke bottle counting code pattern
The code pattern at https://github.com/IBM/powerai-vision-object-detection uses the example of locating and counting Coca-Cola bottles in images. You can walk through the code pattern to create a REST endpoint with Coke bottle detection. PowerAI Vision takes advantage of GPUs to accelerate your deep learning tasks. It has built-in deep learning models, so you can train and deploy an object detection model without experience in deep learning or computer vision. Given an image to analyze, the REST endpoint you created returns location information for each detected object. You can use functions like this to create a store inventory app, or apply the same techniques to your own dataset for a wide variety of object detection use cases that require recognizing, locating, and counting objects in images.
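An app consuming that endpoint might filter the results by confidence and count the bottles. The JSON shape below (a `classified` list with label, confidence, and bounding-box fields) is an assumption for illustration; check the code pattern for the actual response schema:

```python
# Sketch: counting confident detections from a detection endpoint's JSON.
# The response schema here is an illustrative assumption, not the documented API.
import json

sample_response = json.loads("""
{
  "classified": [
    {"label": "coca-cola", "confidence": 0.96, "xmin": 10,  "ymin": 40, "xmax": 60,  "ymax": 200},
    {"label": "coca-cola", "confidence": 0.91, "xmin": 75,  "ymin": 42, "xmax": 125, "ymax": 198},
    {"label": "coca-cola", "confidence": 0.42, "xmin": 300, "ymin": 50, "xmax": 340, "ymax": 180}
  ]
}
""")

def count_confident(response, label, threshold=0.8):
    """Count detections of `label` at or above a confidence threshold."""
    return sum(1 for d in response.get("classified", [])
               if d["label"] == label and d["confidence"] >= threshold)

print(count_confident(sample_response, "coca-cola"))  # 2
```

In a real inventory app you would POST an image to the deployed endpoint and feed the parsed JSON into a function like `count_confident`, tuning the threshold to trade off missed bottles against false positives.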
As of December 2017, the above code pattern refers to a Tech Preview of PowerAI Vision, which could be installed on Power Systems or used with a trial account in the cloud. Please give it a try and take what you learn to build great cognitive apps.