Reduce AI data preprocessing time with PyWren and IBM Cloud Functions

Summary

This code pattern uses a Jupyter Notebook running in Watson Studio to demonstrate how serverless computing can speed up AI data preprocessing. The pattern demonstrates deep learning for face recognition using the Watson Machine Learning service, while PyWren with IBM Cloud Functions handles the data preparation phase. This makes the entire process up to 50 times faster than running the same code without serverless computing.

Description

Let’s say you write a function in Python to process and analyze some data. You successfully test the function using a small amount of data, and now you want to run the function as a serverless action at massive scale, with parallelism, against terabytes of data.

What options do you have? Obviously, you don’t want to learn cloud IT tricks and set up virtual machines. Nor do you necessarily want to become a serverless computing expert in scaling data inputs, processing outputs, and monitoring concurrent executions.

PyWren provides a solution. It lets you run your code against a large data set, get the results, and consider the value of the insights gained. It greatly reduces processing time by running the jobs in parallel, with little extra effort on your part.
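The idea of testing a function on a small sample and then mapping it over a large input can be sketched as follows. This is a minimal illustration, not PyWren itself: it uses Python's standard-library ThreadPoolExecutor as a local stand-in for a serverless executor, and the names word_count and parallel_map are hypothetical helpers made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    """Hypothetical per-item function: count the words in one document."""
    return len(text.split())


def parallel_map(func, items, max_workers=8):
    """Apply func to every item concurrently, preserving input order.

    Local stand-in for a serverless map call; threads replace cloud functions.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


# Step 1: test the function on a small sample.
sample = ["serverless computing scales", "PyWren maps functions"]
assert word_count(sample[0]) == 3

# Step 2: run the same, unchanged function over a much larger input.
documents = sample * 500
counts = parallel_map(word_count, documents)
total_words = sum(counts)
print(total_words)
```

The point of the design is that the per-item function stays the same between the small test and the large run; only the executor that fans the work out changes.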

In this code pattern, you’ll walk through an end-to-end workflow that covers data preprocessing with PyWren and then uses the preprocessed data to train AI models.

Flow

Figure: Speed up pre-processing (flow diagram)

  1. Log in to IBM Watson Studio.
  2. Run the Jupyter Notebook in Watson Studio.
  3. Load the image data to a Cloud Object Storage bucket.
  4. Preprocess the images using PyWren and IBM Cloud Functions.
  5. Use Watson Machine Learning with TensorFlow and scikit-learn to create and train the model.
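Steps 3 and 4 of the flow can be pictured with the following sketch, under several assumptions: an in-memory dictionary stands in for the Cloud Object Storage bucket, a thread pool stands in for IBM Cloud Functions, and scaling 8-bit pixel values to the range 0 to 1 is a placeholder for whatever preprocessing the notebook actually performs. The helper names chunk and preprocess_batch are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(keys, size):
    """Split a list of object keys into batches of at most `size` keys."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def preprocess_batch(bucket, batch):
    """Placeholder preprocessing: scale 8-bit pixel values to [0, 1]."""
    return {key: [pixel / 255.0 for pixel in bucket[key]] for key in batch}


# Assumption: a dict stands in for the bucket; each value is a tiny fake image.
bucket = {f"images/img_{i:03d}.raw": [0, 51, 255] for i in range(10)}

# One batch per worker; in a serverless setting each batch would be one invocation.
batches = chunk(sorted(bucket), 4)

with ThreadPoolExecutor(max_workers=len(batches)) as pool:
    results = list(pool.map(lambda b: preprocess_batch(bucket, b), batches))

# Merge the per-batch outputs into a single preprocessed data set.
processed = {}
for partial in results:
    processed.update(partial)

print(len(batches), len(processed))
```

Batching the keys keeps the number of concurrent invocations bounded while still letting the batches run independently of one another.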

Instructions

Find the detailed steps for this pattern in the README file. The steps show you how to:

  1. Set up a Cloud Object Storage instance.
  2. Create a Watson Machine Learning service instance.
  3. Create an IBM Cloud Functions service.
  4. Create a Watson Studio project.
  5. Create a custom runtime environment.
  6. Create the notebook.
  7. Run the notebook.
  8. See the results.