Best practices for using custom classifiers in Watson Visual Recognition

This article is part of the Watson Visual Recognition learning path.

Level Topic Type
100A Introduction to computer vision Article
100B Introduction to Watson Visual Recognition Article
101 Create an iOS app that uses built-in and custom classifiers Code pattern
201 Build a custom visual recognition model and deploy to an iOS app Tutorial
202 Best practices for using custom classifiers in Watson Visual Recognition Article
301 Build an iOS game powered by Core ML and Watson Visual Recognition Code pattern

The IBM Watson Visual Recognition API is a powerful AI tool that identifies image content. The API comes with pretrained models that can accurately detect objects, scenes, colors, and foods, which can facilitate fast adoption and implementation. But the real power comes from the ability to train Watson to recognize custom classes.

With the power of custom classifiers, users helped California save water, performed infrastructure inspections with drones, and automated quality control inspections for automotive assembly lines. To get the most out of the Visual Recognition API, there are a number of techniques and optimizations that will help to maximize accuracy.

Preparation is key

A custom classifier’s accuracy is reliant on the quality of the training images that are provided. In the past, users who closely controlled their training processes observed greater than 98 percent accuracy. Accuracy, different from confidence score, is based on a ground truth for a particular classification problem and dataset.

As a best practice, create a ground truth to benchmark classifier accuracy against human classification. Beyond increased accuracy, automated recognition has the benefit of being reliable over time: it does not fatigue, and it can process images at scale much more quickly than a human can.

For a custom classifier to be considered accurate, it must not only correctly identify images that belong in a class, but also ensure that no images are incorrectly attributed to that class. These misattributions are called false positives.
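As a rough sketch, these counts can be tallied by comparing classifier output against a ground truth. The `predictions` and `ground_truth` dictionaries here are hypothetical stand-ins for your own labeled data, not part of the Watson API:

```python
def score_classifier(predictions, ground_truth):
    """Count true/false positives and negatives against a ground truth.

    Both arguments map image names to True (belongs in the class)
    or False (does not belong in the class).
    """
    tp = fp = tn = fn = 0
    for image, actual in ground_truth.items():
        predicted = predictions[image]
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1  # a false positive: incorrectly attributed to the class
        elif not predicted and not actual:
            tn += 1
        else:
            fn += 1  # a false negative: the class was missed
    accuracy = (tp + tn) / len(ground_truth)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn, "accuracy": accuracy}
```

Accuracy alone can hide problems: a classifier that flags everything as positive has no false negatives but many false positives, which is why both sets are tracked separately.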

Avoiding these false positives is the reason why two sets of images must be provided when training. The first set is “positive examples” of the class. The second set is “negative examples,” that is, images in which the object or condition does not exist.

Crucially, to increase the accuracy of the classifier, the negative examples should be as visually similar to the positive examples as possible while still not exhibiting the wanted condition for that class. This way, Watson learns which features of the image are important, which reduces false positives.

To create these sets of positive and negative examples, you need a list of manually classified images. It’s important that the training images that are provided have a high degree of confidence, so it’s best not to include any images that you have trouble manually classifying.

In terms of the training dataset size, it’s recommended to provide at least 50 positive examples and 50 negative examples. The more images that you provide, the more accurate the classifier is. However, after 5000 images there is likely to be little improvement. The recommendation is to train with 150 to 200 images per set, which provides a good balance between training time and accuracy.

When sourcing manually classified images, it’s best to find a wide range to maximize the experience that Watson is provided with. There are many factors to consider, including:

  • Lighting
  • Angle
  • Focus
  • Color
  • Shape
  • Distance from subject
  • Presence of other objects in the image

Note that Watson takes a holistic approach when being trained on each image. While it evaluates all of the elements listed above, it cannot be tasked to exclusively consider a specific element.

However, it’s not always feasible to build a training dataset with this level of variation. This issue can be addressed by automatically generating manipulated images. For example, images can be duplicated and automatically rotated, shifted, rescaled, zoomed, and horizontally flipped. Doing this can drastically increase your training dataset size with little effort.
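In practice you would use an image library such as Pillow for these manipulations; the following pure-Python sketch over a nested-list pixel grid just illustrates the kinds of transformations involved (zooming and rescaling are omitted for brevity):

```python
def flip_horizontal(pixels):
    """Mirror each row of the pixel grid left-to-right."""
    return [list(reversed(row)) for row in pixels]

def rotate_90(pixels):
    """Rotate the pixel grid 90 degrees clockwise."""
    # Reversing the rows and transposing turns columns into rows.
    return [list(row) for row in zip(*pixels[::-1])]

def shift_right(pixels, n, fill=0):
    """Shift the image n pixels to the right, padding with a fill value."""
    return [([fill] * n + row[:-n]) if n else list(row) for row in pixels]

def augment(pixels):
    """Generate manipulated copies of one training image."""
    return [flip_horizontal(pixels), rotate_90(pixels), shift_right(pixels, 1)]
```

Applying even these three transformations to every source image quadruples the dataset, which is where the “little effort” payoff comes from.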

If the condition you’re trying to recognize appears in only a small section of the image, Watson can struggle to learn the class without a very large training dataset. In this case, it’s best to slice the original into smaller images, classify each slice, and aggregate the results.
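A minimal sketch of that slicing step: compute crop boxes that tile the image, classify each slice, then aggregate (here, by taking the maximum per-tile score). The function names are illustrative, not part of the Watson API:

```python
def tile_boxes(width, height, tile_w, tile_h):
    """Return (left, top, right, bottom) crop boxes covering the image."""
    boxes = []
    for top in range(0, height, tile_h):
        for left in range(0, width, tile_w):
            # Edge tiles are clamped to the image boundary.
            boxes.append((left, top,
                          min(left + tile_w, width),
                          min(top + tile_h, height)))
    return boxes

def aggregate(tile_scores):
    """Flag the whole image based on its strongest tile."""
    return max(tile_scores)
```

Each box could then be passed to an image library’s crop function, and each cropped slice submitted to the classify endpoint as usual.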

With the manually classified images gathered and generated, you’re now ready to train the custom classifier. To be able to test for accuracy, these images must be split into two sets, one for training and one for testing. Do this for both the positive examples and the negative examples.

For best results, the manually classified images should be randomly assigned to the training and testing image sets. This ensures that image variation exists across both sets. Without this random assignment, the classifier might learn on too narrow a dataset and perform poorly when tested against visually dissimilar images. Methods for splitting and testing the dataset during this phase are complex topics in themselves. To begin, a split between 50:50 and 70:30 (training to testing) is sufficient.
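The random split can be done in a few lines; this sketch uses a fixed seed so the split is reproducible, and would be run once for the positive set and once for the negative set:

```python
import random

def split_dataset(images, train_fraction=0.7, seed=42):
    """Randomly split a list of labeled images into training and testing sets."""
    shuffled = list(images)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting is what spreads lighting, angle, and other variation across both sets.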

Classifying an image

To classify an image, submit it to the classify endpoint, replacing {apikey} with your API key and {url} with your service URL:

curl -X POST -u "apikey:{apikey}" -F "images_file=@fruitbowl.jpg" "{url}/v3/classify?version=2018-03-19"

The JSON response provides a list of class confidence scores for each image submitted.

    "images": [{
        "classifiers": [{
            "classes": [{
                    "class": "apple",
                    "score": 0.645656
                    "class": "fruit",
                    "score": 0.598688
                    "class": "food",
                    "score": 0.598688
                    "class": "orange",
                    "score": 0.5
                    "class": "vegetable",
                    "score": 0.28905
                    "class": "tree",
                    "score": 0.28905
            "classifier_id": "default",
            "name": "default"
        "image": "orange-apple-banana-isolated.jpg"

Each score is 0 to 1 and represents the confidence Watson has in the returned classification based on the training data for that classifier. The API classifies for all classes in the classifier, but you can adjust the threshold to return only results above a certain confidence score.

The custom classifier scores can be compared to one another to compare likelihoods. The cut-off point at which confidence scores are considered strong enough to put an image in a class depends on the use case. Weighing the benefits and risks of false positives and false negatives helps to guide where this threshold should be.
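As a sketch of that post-processing step, the classify response can be filtered before scores are compared. The threshold value here is an arbitrary example; the function name is illustrative, not part of the Watson SDK:

```python
import json

def classes_above_threshold(response_json, threshold=0.6):
    """Return (class, score) pairs at or above the confidence threshold."""
    results = []
    for image in json.loads(response_json)["images"]:
        for classifier in image["classifiers"]:
            for c in classifier["classes"]:
                if c["score"] >= threshold:
                    results.append((c["class"], c["score"]))
    return results
```

With the fruit-bowl response above, a threshold of 0.6 would keep only the "apple" class; lowering it to 0.5 would also admit "fruit", "food", and "orange".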

Examples of difficult use cases

While Watson Visual Recognition is highly flexible, there have been a number of recurring use cases where the API either struggles or requires significant pre- or post-work from the user.

  • Detecting details: Occasionally, users want to classify an image based on a small section of an image or details that are scattered within an image. Because Watson analyzes the entire image when training, it might struggle on classifications that depend on small details. Some users have adopted the strategy of breaking the image into pieces or zooming into relevant parts of an image. See this hail damage classification video as an example.
  • Emotion: Emotion classification is not a feature that is currently supported by Visual Recognition. Some users have attempted to do this through custom classifiers, but this is an edge case and the accuracy of this type of training cannot be estimated.

Examples of good and bad training images

Good images

These images demonstrate good training because images in training and testing sets should resemble each other with regard to angle, lighting, distance, size of subject, and so on. See the case study OmniEarth: Combating drought with IBM Watson cognitive capabilities for more details.

Training data Test image
good training data good test data

Bad images — apples

The following images demonstrate bad training because the training image shows a close-up shot of a single apple while the testing image shows a group of apples taken from a distance with other visual items introduced (such as the conveyor belt). It is possible that Watson might fail to classify the test image as apples, especially if another class in the classifier contains training images of a large group of round objects (such as peaches or oranges).

Training data (close up, single) Test image (obscured, grouped)
Image of one apple apples on a conveyor belt

Bad images — sofas

The following images demonstrate bad training because the training image shows a close-up shot of a single sofa in a well-lit, studio-like setting while the testing image shows a sofa that is in a visually busy setting, farther away, and situated among many other objects. Watson might not be able to properly classify the test image due to the number of other objects that clutter the scene.

Training data (close up, single) Test image (obscured, multiple)
Image of one sofa sofa in a cluttered room among other furniture

Training data recommendations

To summarize my tips to maximize the accuracy of your custom classifier:

  • Include at least 150 positive and negative examples during training.
  • Maximize the variation in image composition within each example set.
  • Minimize the variation between positive and negative examples.
  • Crop images if the condition to match represents only a small part of the larger image.
  • Consider generating larger training datasets by automatically applying transformations to create variants.

We’re excited to see what you build with Watson Visual Recognition. It has never been easier or cheaper to train an image classifier with IBM Watson. If you need any help along the way, submit your questions to our developerWorks forums.


This article provided best practices for using Watson Visual Recognition and explained how to get the most out of the Watson Visual Recognition API. The article is part of the Watson Visual Recognition learning path. To continue with the learning path, look at the next step, Build an iOS game powered by Core ML and Watson Visual Recognition.