In the rapidly expanding field of AI, the idea of fast, easy-to-use machine learning is closer to becoming a reality. Machine learning engines like TensorFlow, Keras, PyTorch, and Caffe2 are making it easier for data scientists and developers to build scalable machine learning applications. This has brought about the rise of various open source and proprietary “model zoos” for accessing the ever-growing number of machine learning models.

We believe this growing community deserves an access point for finding models that are open and free. The Model Asset eXchange (MAX) is a one-stop exchange for data scientists and developers to find and use machine learning models. MAX offers both pretrained and trainable models that are created with open source ML engines like TensorFlow, Keras, PyTorch, and Caffe2. MAX models include detailed metadata on how they were trained and on what data.

Creating a web application

To see how easy it is to use a MAX model in an application, we decided to create a web application around the Image Caption Generator model from MAX. The Image Caption Generator Web App lets you filter a set of images based on their content, as determined by the captions the model generates.

Our first design decision was to have the web app depend on an already running API endpoint for the Image Caption Generator, since this is how most developers would use the model. Drawing on recent experience from other projects, we wrote our server in Python using the Tornado web framework. The server serves the initial web UI, which the user then interacts with through client-side JavaScript. The repository includes a default set of images; on server startup, these images are sent to the model endpoint, and the server saves the caption data returned by the model in a Python dictionary, to be served with the images to the UI.
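
A minimal sketch of this startup flow in Tornado is shown below. The endpoint URL, the "image" form field, the response's "predictions"/"caption" fields, the template name, and the file paths are assumptions for illustration, not the app's exact code:

```python
import glob

import requests
import tornado.ioloop
import tornado.web

# Assumed address of an already running Image Caption Generator endpoint.
MODEL_ENDPOINT = "http://localhost:5000/model/predict"

captions = {}  # image path -> list of generated captions


def caption_image(path):
    """Send one image to the model endpoint and return its captions."""
    with open(path, "rb") as f:
        response = requests.post(MODEL_ENDPOINT, files={"image": f})
    response.raise_for_status()
    # Assumes the model returns JSON with a "predictions" list whose
    # entries carry a "caption" field.
    return [p["caption"] for p in response.json()["predictions"]]


class MainHandler(tornado.web.RequestHandler):
    def get(self):
        # Render the initial UI with the cached image and caption data.
        self.render("index.html", captions=captions)


if __name__ == "__main__":
    # On startup, caption the default image set and cache the results.
    for path in glob.glob("static/img/*.jpg"):
        captions[path] = caption_image(path)
    tornado.web.Application([(r"/", MainHandler)]).listen(8088)
    tornado.ioloop.IOLoop.current().start()
```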

When you first hit the web UI, the server populates it with the initial images and their caption data. For interacting with the images, we settled on two interfaces: a selectable grid of the images and an interactive word cloud generated from the selected images’ captions. When you click a word in the word cloud, the subset of the currently selected images whose captions contain that word is selected. This functionality, along with the direct selectability of the image grid, lets you filter the images based on their content.
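
The filtering itself boils down to a membership test on the caption text. The app performs this in client-side JavaScript; here is a minimal Python sketch of the same logic, with illustrative names and data:

```python
def filter_by_word(selected, captions, word):
    """Return the subset of selected images whose caption contains word.

    selected: image filenames currently selected in the grid
    captions: dict mapping filename -> caption string
    word:     the word clicked in the word cloud
    """
    word = word.lower()
    return [img for img in selected
            if word in captions[img].lower().split()]


# Example: clicking "dog" keeps only images whose caption mentions it.
captions = {"a.jpg": "a dog playing in the grass",
            "b.jpg": "a man riding a bicycle"}
print(filter_by_word(["a.jpg", "b.jpg"], captions, "dog"))  # ['a.jpg']
```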

The web UI also includes an upload feature that lets you add your own images to the default image set. Client-side JavaScript uploads the images to the server and requests their caption data. As on server startup, the images are sent to the model API and the returned caption data is saved to the dictionary. The server then returns the new caption data to the web UI, and client-side JavaScript updates the UI with the new images and captions. Handling the upload in client-side JavaScript makes for a smoother experience, and because the data is also updated on the server, the new images persist across page reloads.
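
On the server side, the upload path reuses the same captioning flow. Here is a minimal Tornado handler sketch extending the earlier one; the "file" field name and the static/img directory are assumptions for illustration:

```python
class UploadHandler(tornado.web.RequestHandler):
    def post(self):
        # Tornado exposes multipart uploads via self.request.files;
        # the "file" field name is an assumption for illustration.
        new_captions = {}
        for upload in self.request.files.get("file", []):
            path = "static/img/" + upload["filename"]
            with open(path, "wb") as f:
                f.write(upload["body"])
            # Same flow as on startup: caption the image, cache the result.
            captions[path] = caption_image(path)
            new_captions[path] = captions[path]
        # Tornado serializes a dict to JSON; the client-side JavaScript
        # merges the new captions into the page without a full reload.
        self.write(new_captions)
```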

Together, the filter and upload features give users a way to explore any set of images based on their content. Having met our original goal of showing how our chosen model could be used in a simple web application, we added one final feature that lets you interact with the caption data directly: the web UI lets you drill into an image in the grid and see a larger version alongside the top three captions returned by the Image Caption Generator.
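
Assuming the response shape used in the sketches above, with each prediction carrying a caption and a probability (an assumption about the model's JSON, not code from the app), selecting the top three captions is a simple sort:

```python
def top_captions(response_json, n=3):
    """Return the n highest-probability captions from a model response.

    Assumes each entry in "predictions" has "caption" and
    "probability" fields, as sketched earlier.
    """
    ranked = sorted(response_json["predictions"],
                    key=lambda p: p["probability"], reverse=True)
    return [(p["caption"], p["probability"]) for p in ranked[:n]]
```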

Try it out

Our web app is available on GitHub, and we would love it if you took it for a test run. We are also open to improving it, so if you have suggestions, feel free to open an issue on the repo.
