A beginner’s guide to setting up a visual recognition service

This article is part of the Watson Visual Recognition learning path.


The IBM Watson Visual Recognition service is a powerful AI tool that identifies image content. The service comes with the following pretrained models, and you can also customize it to recognize your own classes.

  • General: A pretrained model, built on a large and actively maintained data set, that you can use to gain insights from your own pictures.
  • Food: Similar to the general model, but this model uses a specific food data set. You can use the model to find any type of food, and one of the main applications is in the catering and restaurant industry, especially if you want to create a specific menu.
  • Explicit: Allows you to analyze whether an image contains inappropriate content.
  • Text: Allows you to extract text from a picture and use it as textual metadata. This option is currently in private beta, but you can sign up for it.
  • Custom: Lets you create a custom model with your own pictures and train it to get better results.

This article is part of a learning path that helps you gain a better understanding about how Visual Recognition works, and how you can use it to build your own artificial intelligence (AI) solutions.

Terms and concepts

This section covers the terms and concepts that are helpful to know when discussing Watson Visual Recognition.

Term | Definition
Custom Model | Create custom, unique visual classifiers. Use the service to recognize custom visual concepts that are not available with the general model.
Core ML | The Watson Swift SDK supports offline image classification using Apple Core ML. Once a custom model is ready to use, a Core ML model is available to download. The Core ML model is stored on the device’s file system, so images can be classified offline, directly on the device, without making a REST call.


A common way to use Visual Recognition is by accessing the Visual Recognition APIs from your application. The Watson team releases SDKs that support many programming languages so that you can use Visual Recognition easily in a web or mobile application.

Given the data being worked on, one of the most common ways of using Watson Visual Recognition is with a mobile app. Below is the architecture for an iOS app that uses Core ML. If you are creating a non-iOS app, the premise is the same: simply remove the Core ML component.


Use cases

The Visual Recognition service can be used for diverse applications and industries, such as:

  • Manufacturing: Use images from a manufacturing setting to make sure products are being positioned correctly on an assembly line
  • Visual auditing: Look for visual compliance or deterioration in a fleet of trucks, planes, or windmills out in the field, and train custom models to understand what defects look like
  • Insurance: Rapidly process claims by using images to classify claims into different categories

Accessing Watson Visual Recognition

Tooling via Watson Studio


Each instance of the Visual Recognition service comes with an API key that lets you call the API through REST requests. The APIs and SDKs are documented later. When you use the Visual Recognition tool, you start on the Overview tab, where you can select a pretrained model or create a custom model.
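
As a rough illustration of what calling the API with an API key can look like, the following Python sketch builds (but does not send) a classify request using only the standard library. The service URL, the placeholder API key, and the helper name build_classify_request are assumptions made for this example; check your own service credentials for the real URL and key.

```python
import base64
import urllib.parse
import urllib.request

# Assumed values for illustration only: the service URL depends on your
# instance, and the API key below is a placeholder.
SERVICE_URL = "https://gateway.watsonplatform.net/visual-recognition/api"
API_KEY = "your-api-key"


def build_classify_request(image_url, api_key=API_KEY,
                           service_url=SERVICE_URL, version="2018-03-19"):
    """Build (but do not send) a GET request to the v3 classify endpoint.

    build_classify_request is a hypothetical helper, not part of any SDK.
    """
    query = urllib.parse.urlencode({"url": image_url, "version": version})
    request = urllib.request.Request(f"{service_url}/v3/classify?{query}")
    # The API key is sent as HTTP basic auth with the user name "apikey".
    token = base64.b64encode(f"apikey:{api_key}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_classify_request("https://example.com/car.jpg")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would send the request. In practice the SDKs wrap these details for you, so a hand-built request like this is mostly useful for understanding what the tooling does.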



When you click Test on one of the models, the front page of that model opens. There you can find all of the information about the model such as how many data sets of pictures you have and how many pictures you uploaded. Below we try out the General model with a few random images.



When you click Implementation, you get code to help you call this model from your code. There are snippets for Java, Python, Node, and Swift.


Video: Watson Visual Recognition tooling

Take an in-depth tour of the Watson Visual Recognition tooling available in Watson Studio including the steps to create a custom classifier.


For programmatic access, Visual Recognition comes with support for a large number of languages. The following list shows the current developer SDKs.


For more information on the APIs, see the Visual Recognition API documentation.

Code sample

The following Python code sample shows how to classify an image with the Watson Python SDK, which is available on PyPI.

>>> import json
>>> from ibm_watson import VisualRecognitionV3
>>> # Create the client with the API version date and your IAM API key.
>>> visual_recognition = VisualRecognitionV3(version="2018-03-19", iam_apikey="pFW...uhO")
>>> # Open the image in binary mode and classify it with the default model.
>>> with open('./car.jpg', 'rb') as images_file:
...     classes = visual_recognition.classify(images_file=images_file).get_result()
...     print(json.dumps(classes, indent=2))
{
  "images": [
    {
      "classifiers": [
        {
          "classifier_id": "default",
          "name": "default",
          "classes": [
            {
              "class": "vehicle",
              "score": 0.776
            },
            {
              "class": "motor vehicle",
              "score": 0.512,
              "type_hierarchy": "/vehicle/wheeled vehicle/motor vehicle"
            },
            {
              "class": "ash grey color",
              "score": 0.96
            }
          ]
        }
      ],
      "image": "car.jpg"
    }
  ],
  "images_processed": 1,
  "custom_classes": 0
}

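The response nests classes inside classifiers inside images, so an application usually flattens it before use. The following is a minimal sketch of that step, assuming the response shape shown in the sample output; the function name top_classes and the min_score threshold are hypothetical and not part of the Watson SDK.

```python
# Sample response in the shape of the output above (values copied from it).
response = {
    "images": [
        {
            "classifiers": [
                {
                    "classifier_id": "default",
                    "name": "default",
                    "classes": [
                        {"class": "vehicle", "score": 0.776},
                        {
                            "class": "motor vehicle",
                            "score": 0.512,
                            "type_hierarchy": "/vehicle/wheeled vehicle/motor vehicle",
                        },
                        {"class": "ash grey color", "score": 0.96},
                    ],
                }
            ],
            "image": "car.jpg",
        }
    ],
    "images_processed": 1,
    "custom_classes": 0,
}


def top_classes(result, min_score=0.5):
    """Return (image, class, score) tuples at or above min_score, best first.

    top_classes and min_score are hypothetical names for this sketch.
    """
    rows = []
    for image in result.get("images", []):
        for classifier in image.get("classifiers", []):
            for entry in classifier.get("classes", []):
                if entry["score"] >= min_score:
                    rows.append((image.get("image"), entry["class"], entry["score"]))
    # Sort by score, highest first.
    return sorted(rows, key=lambda row: row[2], reverse=True)


for image_name, label, score in top_classes(response, min_score=0.6):
    print(f"{image_name}: {label} ({score:.2f})")
# prints:
# car.jpg: ash grey color (0.96)
# car.jpg: vehicle (0.78)
```

A threshold like this is one simple way to ignore low-confidence classes; the right cutoff depends on the application and is worth tuning against your own images.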

This article is part of a learning path that guides you through building fully featured mobile apps that are built on the Watson Visual Recognition service. Within this learning path, you’ll get a chance to work with advanced Visual Recognition features, as well as learn how to integrate Visual Recognition with other Watson services.

So let’s get started. The next step will be to compile a pre-made iOS app that accesses built-in and custom Watson Visual Recognition models.