Image Caption Generator


This model generates captions from a fixed vocabulary that describe the contents of images in the COCO Dataset. The model consists of an encoder (a deep convolutional network using the Inception-v3 architecture, trained on ImageNet-2012 data) and a decoder (an LSTM network trained conditioned on the encoding produced by the image encoder). The input to the model is an image, and the output is a sentence describing the image content.

The model is based on the Show and Tell Image Caption Generator Model.

Model Metadata

Domain | Application             | Industry | Framework  | Training Data | Input Data Format
Vision | Image Caption Generator | General  | TensorFlow | COCO          | Images



Component              | License    | Link
This repository        | Apache 2.0 | LICENSE
Model Weights          | MIT        | Pretrained Show and Tell Model
Model Code (3rd party) | Apache 2.0 | im2txt
Test assets            | Various    | Asset README

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Docker Hub:
      docker run -it -p 5000:5000 codait/max-image-caption-generator
  • Deploy on Kubernetes:
      kubectl apply -f

Example Usage

You can test or use this model in the following ways.

Test the model using cURL

Once deployed, you can test the model from the command line. For example, if running locally:

curl -F "image=@assets/surfing.jpg" -X POST
{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a man riding a wave on top of a surfboard .",
      "probability": 0.038827644239537
    },
    {
      "index": "1",
      "caption": "a person riding a surf board on a wave",
      "probability": 0.017933410519265
    },
    {
      "index": "2",
      "caption": "a man riding a wave on a surfboard in the ocean .",
      "probability": 0.0056628732021868
    }
  ]
}
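
The same request can also be made from Python. The following is a minimal sketch using only the standard library, not part of the model's own tooling. It assumes the service accepts a multipart form upload under the field name "image" (as in the cURL command above) and returns JSON shaped like the sample response. The helper names build_multipart, request_captions and best_caption are hypothetical, and predict_url is a placeholder that the caller must supply, since the endpoint is not shown in the command above.

```python
import json
import os
import urllib.request
import uuid

# Sample response mirroring the example above, written out as complete JSON.
SAMPLE_RESPONSE = """
{
  "status": "ok",
  "predictions": [
    {"index": "0",
     "caption": "a man riding a wave on top of a surfboard .",
     "probability": 0.038827644239537},
    {"index": "1",
     "caption": "a person riding a surf board on a wave",
     "probability": 0.017933410519265},
    {"index": "2",
     "caption": "a man riding a wave on a surfboard in the ocean .",
     "probability": 0.0056628732021868}
  ]
}
"""


def build_multipart(field_name, filename, file_bytes, content_type="image/jpeg"):
    """Encode one file as a multipart/form-data body, similar to `curl -F`.

    Returns the request body (bytes) and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + file_bytes + tail, f"multipart/form-data; boundary={boundary}"


def request_captions(predict_url, image_path):
    """POST an image to predict_url (caller-supplied) and return the parsed JSON."""
    with open(image_path, "rb") as image_file:
        body, header_value = build_multipart(
            "image", os.path.basename(image_path), image_file.read()
        )
    request = urllib.request.Request(
        predict_url,
        data=body,
        headers={"Content-Type": header_value},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def best_caption(payload):
    """Return (caption, probability) for the most probable prediction."""
    if payload.get("status") != "ok":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    predictions = payload.get("predictions", [])
    if not predictions:
        raise ValueError("response contains no predictions")
    top = max(predictions, key=lambda prediction: prediction["probability"])
    return top["caption"], top["probability"]


if __name__ == "__main__":
    # Works offline on the sample; replace with request_captions(...) for a live call.
    caption, probability = best_caption(json.loads(SAMPLE_RESPONSE))
    print(f"{caption} (p={probability:.4f})")
```

Against a running deployment, best_caption(request_captions(predict_url, "assets/surfing.jpg")) would return the top caption, with predict_url set to the service's prediction endpoint. Selecting by the "probability" field rather than by position avoids relying on the predictions being sorted.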

Test the model in a Node-RED flow

Complete the node-red-contrib-model-asset-exchange module setup instructions and import the image-caption-generator getting started flow.