IBM Z Day on Nov. 21: Discover the ideal environment for modern, mission-critical workloads. Learn more

Image Caption Generator

Overview

This model generates captions from a fixed vocabulary that describe the contents of images in the COCO Dataset. The model consists of an encoder model – a deep convolutional net using the Inception-v3 architecture trained on ImageNet-2012 data – and a decoder model – an LSTM network that is trained conditioned on the encoding from the image encoder model. The input to the model is an image, and the output is a sentence describing the image content.

The model is based on the Show and Tell Image Caption Generator Model.

Model Metadata

Domain Application Industry Framework Training Data Input Data Format
Vision Image Caption Generator General TensorFlow COCO Images

References

Licenses

Component License Link
This repository Apache 2.0 LICENSE
Model Weights MIT Pretrained Show and Tell Model
Model Code (3rd party) Apache 2.0 im2txt
Test assets Various Asset README

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Dockerhub:

    docker run -it -p 5000:5000 codait/max-image-caption-generator
    
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-image-caption-generator as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Image-Caption-Generator/master/max-image-caption-generator.yaml
    

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Locally: follow the instructions in the model README on GitHub

Example Usage

You can test or use this model

Test the model using cURL

Once deployed, you can test the model from the command line. For example if running locally:

curl -F "image=@assets/surfing.jpg" -X POST http://127.0.0.1:5000/model/predict
{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a man riding a wave on top of a surfboard .",
      "probability": 0.038827644239537
    },
    {
      "index": "1",
      "caption": "a person riding a surf board on a wave",
      "probability": 0.017933410519265
    },
    {
      "index": "2",
      "caption": "a man riding a wave on a surfboard in the ocean .",
      "probability": 0.0056628732021868
    }
  ]
}

Test the model in a Node-RED flow

Complete the node-red-contrib-model-asset-exchange module setup instructions and import the image-caption-generator getting started flow.

Test the model in CodePen

Learn how to send an image to the model and how to render the results in CodePen.

Test the model in a serverless app

You can utilize this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.