Create a web app to interact with machine learning generated image captions



The introduction of the IBM Model Asset eXchange (MAX) has given application developers without data science experience easy access to prebuilt machine learning models. This code pattern shows how simple it can be to create a web app that uses a MAX model. The web app uses the Image Caption Generator from MAX to build a simple web UI that lets you filter images based on the descriptions the model generates.


According to an IBM study, 2.5 quintillion bytes of data are created every day. Much of that data is unstructured, such as large texts, audio recordings, and images. To do something useful with it, you must first convert it into structured data.

This code pattern uses one of the models from the Model Asset Exchange (MAX), an exchange where developers can find and experiment with open source deep learning models. Specifically, it uses the Image Caption Generator to create a web application that captions images and lets you filter images based on their content.

The web application provides an interactive user interface backed by a lightweight Python server built with Tornado. The server takes in images through the UI, sends them to the model's REST endpoint, and displays the generated captions in the UI. The model's REST endpoint is set up using the Docker image provided on MAX. The web UI displays the generated captions for each image, along with an interactive word cloud that filters images based on their captions.
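The request the server makes to the model can be sketched with Python's standard library alone. This is a minimal sketch, not the app's Tornado code: it assumes the model container is reachable at `http://localhost:5000/model/predict`, that the endpoint accepts a multipart form field named `image`, and that the JSON response carries a `predictions` list whose items have a `caption` key. The helper names are hypothetical.

```python
import json
import urllib.request
import uuid

# Assumption: where a locally running MAX Docker container exposes the model.
MODEL_URL = "http://localhost:5000/model/predict"


def build_multipart(field_name, filename, data, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_captions(response_json):
    """Pull caption strings out of the assumed {"predictions": [{"caption": ...}]} shape."""
    return [p["caption"] for p in response_json.get("predictions", [])]


def caption_image(path, url=MODEL_URL):
    """POST one image file to the model endpoint and return its captions."""
    with open(path, "rb") as f:
        data = f.read()
    # Assumption: the endpoint reads the upload from a form field named "image".
    body, content_type = build_multipart("image", path.rsplit("/", 1)[-1], data)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return extract_captions(json.load(response))
```

Against a running container, `caption_image("photo.jpg")` would return a list of caption strings (`photo.jpg` is a placeholder path). Encoding the multipart body by hand keeps the sketch dependency-free; a library such as `requests` can build the same body for you through its `files` argument.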

When you have completed this code pattern, you’ll understand how to:

  • Deploy a deep learning model with a REST endpoint
  • Generate captions for an image using the MAX Model’s REST API
  • Run a web application that uses the model’s REST API



  1. The server sends default images to the Model API and receives caption data.
  2. The user interacts with the web UI, which contains the default content, and uploads images.
  3. The web UI requests caption data for the images from the server and updates the content when the data is returned.
  4. The server sends the images to the Model API and receives caption data to return to the web UI.
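Filtering by the word cloud depends on mapping caption words back to images once the caption data returns. The following is a hypothetical Python sketch of that bookkeeping, not the app's actual implementation: it assumes captions are kept in a dict keyed by image name, and the stop-word list and function names are illustrative.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real UI would likely use a larger one.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "to", "at"}


def tokenize(caption):
    """Lowercase a caption and split it into alphabetic words, dropping stop words."""
    return [w for w in re.findall(r"[a-z]+", caption.lower()) if w not in STOP_WORDS]


def word_counts(captions_by_image):
    """Count each word once per image: the data a word cloud could be sized from."""
    counts = Counter()
    for captions in captions_by_image.values():
        words = set()
        for caption in captions:
            words.update(tokenize(caption))
        counts.update(words)
    return counts


def filter_images(captions_by_image, word):
    """Return the names of images whose captions contain the selected word, sorted."""
    word = word.lower()
    return sorted(
        image
        for image, captions in captions_by_image.items()
        if any(word in tokenize(c) for c in captions)
    )
```

For example, with captions `{"park.jpg": ["a dog running in the grass ."], "yard.jpg": ["a dog laying on the grass next to a frisbee ."]}`, `filter_images(captions, "dog")` returns `["park.jpg", "yard.jpg"]`, and `word_counts` gives both `dog` and `grass` a count of 2. Counting each word once per image means a word's size in the cloud reflects how many images mention it, not how often it repeats within one image's captions.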


Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.