From days to minutes. That is the reduction in development time a data scientist can expect when using IBM PowerAI Vision.

One day, I was looking through our internal user groups and happened upon a group called Deep Learning. Within that group is a forum for sharing information and experiences on deep learning. I clicked on an entry entitled “Use cases for Caffe” and opened the thread. One post asked about experiences with Caffe and about methods of network structure design other than trial and error. I suggested that the poster try transfer learning using Inception V3. Transfer learning is well documented, and there are numerous examples of how to get going. In this case, I suggested one written by Franck Barillaud, IBM Senior Technical Staff Member.

After a few days, the poster came back with his experience. He was impressed by how simple it was to use transfer learning. He said it was simple enough for anyone, even without any experience in Python or TensorFlow. However, he was only able to reach a final accuracy of 81%, compared to 93% with a much simpler neural network that he had written using Caffe.

Ok, no problem, let’s try something else.

I asked the poster for permission to try his dataset with PowerAI Vision, a tool included in IBM PowerAI. PowerAI Vision is a deep learning development platform that application developers with limited knowledge of deep learning can use to train and deploy computer vision models for their applications. The results were unexpectedly amazing!

Not only did it take minutes to set up, we also got a higher accuracy of 94.5% with a loss of 0.12, compared with 93% on Caffe and 81% on Inception V3 using transfer learning.
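For readers less familiar with these two numbers, here is a minimal Python sketch of how classification accuracy and an average loss are commonly computed. It assumes the reported loss is a cross-entropy loss, which is not stated here; the function names and the toy probabilities are hypothetical and unrelated to the poster's dataset.

```python
import math

def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    correct = 0
    for p, y in zip(probs, labels):
        predicted = max(range(len(p)), key=lambda i: p[i])
        if predicted == y:
            correct += 1
    return correct / len(labels)

def cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp to eps so a zero probability does not raise a math error
        total += -math.log(max(p[y], eps))
    return total / len(labels)

# Toy example with 4 classes (hypothetical numbers, not the real dataset)
toy_probs = [
    [0.90, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.10],
    [0.40, 0.30, 0.20, 0.10],
]
toy_labels = [0, 1, 2]

print(accuracy(toy_probs, toy_labels))
print(cross_entropy(toy_probs, toy_labels))
```

A model can be right on most images and still have a noticeable loss if it is not very confident in its correct answers, which is why both numbers are usually reported together.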

The complete process took about 30 minutes:

  •    Dataset download and import: 15 minutes
  •    PowerAI Vision Task Definition: 5 minutes
  •    Training time: 10 minutes

Figure 1 shows some graphs of training loss and accuracy.

Figure 1: Training loss and accuracy graphs


This clearly shows the benefit of using PowerAI Vision for image classification. Model setup is a matter of point and click, which is good evidence that a user does not necessarily need deep data science skills, or days spent writing a neural network, to get decent results.

A complete description of the image recognition use case

The poster described his project as follows:

Inspect images of photoresist openings after they have been exposed and developed. The central opening (the bright core) measures approximately 20 microns in diameter. The outer disk measures approximately 130 microns:

Figure 2: Photoresist image


Classify each image into one of 4 classes:

  • No defect
  • Presence of a dark spot
  • Presence of a bright spot
  • Presence of a scratch

Each image represents a hole in a photoresist film. The goal is to make sure that the photoresist film is clear of defects (dark spots, bright spots or scratches) in the area between the 20-micron central opening and the 130-micron peripheral disk.
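The inspection area described above is an annulus: everything outside the central opening but still inside the outer disk. A minimal Python sketch of that geometric test follows. It assumes both stated sizes are diameters, and the center coordinates and the microns-per-pixel scale are hypothetical parameters that the poster did not specify.

```python
import math

INNER_DIAMETER_UM = 20.0   # central opening, from the project description
OUTER_DIAMETER_UM = 130.0  # outer disk, assumed here to be a diameter as well

def in_inspection_region(x_px, y_px, center_px, um_per_px):
    """Return True if a pixel lies between the central opening and the outer disk."""
    # Convert the pixel offset from the hole center into microns
    dx = (x_px - center_px[0]) * um_per_px
    dy = (y_px - center_px[1]) * um_per_px
    r_um = math.hypot(dx, dy)
    return INNER_DIAMETER_UM / 2 < r_um <= OUTER_DIAMETER_UM / 2

# Hypothetical setup: hole centered at pixel (100, 100), 1 micron per pixel
center = (100, 100)
print(in_inspection_region(100, 100, center, 1.0))  # center of the opening
print(in_inspection_region(130, 100, center, 1.0))  # 30 microns from center
print(in_inspection_region(170, 100, center, 1.0))  # 70 microns from center
```

A check like this could be used to ignore anything detected inside the opening or beyond the disk, so that only defects in the film area of interest count toward the classification.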

Figure 3 shows examples of images showing defects:

Figure 3: Photoresist images with defects


The potential cost savings from image recognition are huge, and it can be used in any industry that requires visual inspection. Adding image recognition can reduce human error and improve the quality of outcomes in industries such as manufacturing, healthcare, oil and gas, finance, and telecommunications. And with IBM PowerAI Vision, the requirement to hire experienced data scientists to develop image recognition applications may become a thing of the past.

To find out more about PowerAI and this technology preview, visit IBM PowerAI Vision Technology Previews.

Special thanks to Sebastien Gilbert of IBM for his technical contributions.

4 comments on "IBM PowerAI Vision speeds transfer learning with greater accuracy: A real world example"

  1. Great use case!

  2. SrinivasChitiveli December 08, 2017

    IBM PowerAI Vision makes data labeling, training and deployment a ‘Clicking’ job 🙂

  3. Gilbert Thomas April 16, 2018

    Great results, but what was the reason for the huge increase in accuracy? What does AI Vision do exactly to get that improvement?

    • Hi Gilbert, thanks for the question. Before answering, I re-ran the training on a newer version of IBM AIVision, and this time I was able to get 97.188% accuracy with a loss of 0.092.

      There are a few reasons I can think of. The first is the number of layers used in the original Caffe program (I believe it was fewer than 10). I’m certain that IBM AIVision has a greater number of convolutional and hidden layers. In theory, the more layers you have in your network, the more accurate your results can be (assuming the application was written correctly). The second reason is the tuning of hyper-parameters. In my most recent training runs, I was able to “tune” a parameter called “weight decay” and re-run the test with different values. Since training took no longer than 10 minutes per cycle, I was able to “play” around with this parameter, as well as others like the number of epochs and the learning rate, to come up with a model with better accuracy and lower loss in a shorter amount of time. Since AI Vision is a point and click tool, it was really easy to do. Hope this helps.
