Archived | Accelerate AI data preprocessing with PyWren and IBM Cloud Functions

Archived content

Archive date: 2019-11-25

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.


This code pattern uses a Jupyter Notebook running in Watson Studio to demonstrate how serverless computing can benefit AI data preprocessing. The pattern demonstrates deep learning for face recognition using the Watson Machine Learning service, while PyWren with IBM Cloud Functions handles the data preparation phase. This makes the entire process up to 50 times faster than running the same code without serverless computing.


Let’s say you write a function in Python to process and analyze some data. You successfully test the function using a small amount of data, and now you want to run the function as a serverless action at massive scale, with parallelism, against terabytes of data.

What options do you have? Obviously, you don’t want to learn cloud IT tricks and set up virtual machines. Nor do you necessarily want to become a serverless computing expert in scaling data inputs, processing outputs, and monitoring concurrent executions.

PyWren provides a solution. It lets you run your code against a large data set, get the results, and evaluate the insights gained. It greatly reduces processing time by parallelizing the jobs for you, so you don't have to manage the scaling yourself.
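The idea of applying one function to many inputs in parallel and then collecting the results can be sketched locally. The example below is only an illustration under stated assumptions: it uses a thread pool from the Python standard library as a stand-in for serverless invocations, it does not call PyWren or IBM Cloud Functions, and the names `preprocess` and `run_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def preprocess(record):
    """Hypothetical per-item step: scale 8-bit pixel values to [0, 1]."""
    name, pixels = record
    return name, [p / 255 for p in pixels]


def run_parallel(func, items, max_workers=8):
    """Apply func to every item concurrently and return results in input order.

    Local stand-in only: a thread pool plays the role that parallel
    serverless invocations would play in the pattern.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


if __name__ == "__main__":
    # Made-up sample data: four tiny "images" of three pixels each.
    images = [("face_%d" % i, [0, 51, 255]) for i in range(4)]
    for name, pixels in run_parallel(preprocess, images):
        print(name, pixels)
```

The function being parallelized stays an ordinary Python function that handles a single item. In this sketch, only the component that fans the work out would change when moving from a local pool to a serverless backend.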

In this code pattern, you’ll walk through an end-to-end workflow that covers data preprocessing with PyWren, then uses the preprocessed data to train AI models.


Speed up preprocessing

  1. Log in to IBM Watson Studio.
  2. Run the Jupyter Notebook in Watson Studio.
  3. Load the image data to a Cloud Object Storage bucket.
  4. Preprocess the images using PyWren and IBM Cloud Functions.
  5. Use Watson Machine Learning with TensorFlow and scikit-learn to create and train the model.


Find the detailed steps for this pattern in the README file. The steps show you how to:

  1. Set up a Cloud Object Storage instance.
  2. Create a Watson Machine Learning service instance.
  3. Create an IBM Cloud Functions service.
  4. Create a Watson Studio project.
  5. Create a custom runtime environment.
  6. Create the notebook.
  7. Run the notebook.
  8. See the results.