Classify art using TensorFlow  

Pull data and labels from The Metropolitan Museum of Art to train an image-classification system

By Ton Ngo, Winnie Tsang


Learn how to build your own data set and train a model for image classification. The model is implemented in TensorFlow and trained on a Kubernetes cluster. The demo code pulls data and labels from The Metropolitan Museum of Art website and Google BigQuery, and the IBM Cloud container service provides the Kubernetes cluster. You can modify the code to build different image data sets and select from a collection of public models, such as Inception, VGG, ResNet, AlexNet, and MobileNet.


Humans have an uncanny ability to process visual information. With one quick glance at a photo, we can understand what is in the scene and act accordingly. Giving computers this ability, however, has been challenging, and the potential uses are wide-ranging, from tagging your Facebook friends in a photo to recognizing cancer from a photo of a skin condition. Image recognition has been an active area of research and development for many years. Recent advances in deep learning have brought significant improvements to image recognition and classification, to the degree that many neural network models are now available and offer state-of-the-art performance. If your application requires recognizing characteristics of an image, you can leverage one of these models to train and deploy a neural network to serve your application.

To show this process in practice, in this IBM Code developer pattern we will use deep learning to train an image classification model. Specifically, we will use data from the art collection at The Metropolitan Museum of Art and metadata from Google BigQuery. We will use the Inception model implemented in TensorFlow, and we will run the training on a Kubernetes cluster. We will save the trained model and load it later to perform inference: given a picture of a painting as input, the model returns the likely culture, for instance Italian Florentine art. You can adapt the pattern by choosing other attributes to classify the art collection, such as the artist or time period; by choosing an entirely different source of data or category for classification, along with different ways to create the labels; or by choosing other models, such as VGG, ResNet, AlexNet, or MobileNet.
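As a hypothetical sketch of the inference step, the classifier's final layer produces one raw score per class, which can be converted into ranked culture predictions with a softmax. The label names and scores below are illustrative, not taken from the actual Met data:

```python
import math

def top_cultures(logits, labels, k=3):
    """Convert raw model logits into the k most likely culture labels.

    logits: list of raw scores, one per class, as produced by the
            classifier's final layer.
    labels: class names in the same order the model was trained with.
    """
    # Softmax: exponentiate (shifted by the max for numerical stability)
    # and normalize so the scores sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pair each label with its probability and keep the top k.
    ranked = sorted(zip(labels, probs), key=lambda p: p[1], reverse=True)
    return ranked[:k]

# Illustrative labels and logits -- the real label set comes from the
# cultures selected when the training data was built.
labels = ["Italian, Florentine", "French", "Japanese", "Egyptian"]
logits = [4.1, 2.0, 1.2, 0.3]
for culture, prob in top_cultures(logits, labels):
    print(f"{culture}: {prob:.2%}")
```

The same post-processing applies regardless of which public model produced the logits.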

Depending on the compute resources available, you can choose the number of images to train on, the number of classes to use, and more. To show the full working process, we will select a small set of images and a small number of classes so that training completes within a reasonable amount of time; with a large data set, training may take days or weeks.
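One way to keep such an experiment small is to cap the number of images per class and hold out a validation slice before training. A minimal sketch, where the per-class cap and validation fraction are assumptions rather than the pattern's exact settings:

```python
import random

def sample_dataset(images_by_class, per_class=50, val_fraction=0.2, seed=42):
    """Cap each class at `per_class` images and split into train/validation.

    images_by_class: dict mapping a label (e.g. a culture) to a list of
                     image file names.
    Returns (train, val) dicts keyed by the same labels.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = {}, {}
    for label, images in images_by_class.items():
        picked = rng.sample(images, min(per_class, len(images)))
        n_val = max(1, int(len(picked) * val_fraction))
        val[label] = picked[:n_val]
        train[label] = picked[n_val:]
    return train, val

# Illustrative file names only -- real data would be downloaded images.
data = {"French": [f"fr_{i}.jpg" for i in range(100)],
        "Japanese": [f"jp_{i}.jpg" for i in range(30)]}
train, val = sample_dataset(data, per_class=50)
print({k: (len(train[k]), len(val[k])) for k in data})
```

Growing `per_class` later is then a one-line change when more compute becomes available.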

When you have completed this pattern, you will better understand how to:

  • Collect and process data for deep learning in TensorFlow
  • Configure and deploy TensorFlow to run on a Kubernetes cluster
  • Train an advanced image classification neural network
  • Use TensorBoard to visualize and understand the training process
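On the TensorBoard point: the tool is typically launched against the directory where the training job writes its summary logs. A minimal invocation (the log path here is an assumption, not the pattern's exact location):

```shell
# Point TensorBoard at the training job's summary directory, then open
# the reported URL in a browser (by default http://localhost:6006).
tensorboard --logdir=/tmp/retrain_logs
```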


The pattern follows these steps:
  1. Inspect the available attributes in the Google BigQuery database for the Met art collection.
  2. Create the labeled data set using the attribute selected.
  3. Select a model for image classification from the set of available public models and deploy to IBM Cloud.
  4. Run the training on Kubernetes, optionally using GPU if available.
  5. Save the trained model and logs.
  6. Visualize the training with TensorBoard.
  7. Load the trained model in Kubernetes and run an inference on a new art drawing to see the classification.
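Steps 1 and 2 above amount to turning metadata rows into a labeled data set. A minimal sketch, assuming each BigQuery result row pairs a culture attribute with an image URL (the grouping and filtering logic is illustrative; the pattern's actual scripts may differ):

```python
from collections import defaultdict

def group_by_label(rows, min_per_label=5):
    """Group (label, image_url) rows into a labeled data set.

    Labels with fewer than `min_per_label` images are dropped, since a
    class with only a handful of examples cannot be trained reliably.
    """
    groups = defaultdict(list)
    for label, url in rows:
        if label:  # skip rows with a missing culture attribute
            groups[label].append(url)
    return {label: urls for label, urls in groups.items()
            if len(urls) >= min_per_label}

# Illustrative rows standing in for real query results.
rows = ([("Italian, Florentine", f"img{i}.jpg") for i in range(8)]
        + [("French", "only_one.jpg")]
        + [(None, "unlabeled.jpg")])
dataset = group_by_label(rows)
print(sorted(dataset))   # only classes with enough images survive
```

Each surviving label then becomes one class directory of downloaded images for the training step.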


Related Links

Architecture center

Learn how this code pattern fits into the Cognitive discovery Reference Architecture


TensorFlow

Check out the open source library for machine intelligence.

Kubernetes cluster

Explore the open source system for orchestrating containers on a cluster of servers.

Google BigQuery

Learn about the web service that provides interactive analysis of massive data sets.