IBM Watson Machine Learning Accelerator is a software solution that bundles IBM PowerAI, IBM Spectrum Conductor, and IBM Spectrum Conductor Deep Learning Impact, along with IBM support for the whole stack, including the open source deep learning frameworks. Watson Machine Learning Accelerator provides an end-to-end deep learning platform for data scientists. It covers the complete lifecycle: installation and configuration; data ingest and preparation; building, optimizing, and distributing the training model; and moving the model into production. Its benefits are most apparent when you expand your deep learning environment to include multiple compute nodes. A free evaluation is also available; see the Prerequisites for more information.

In this tutorial, you perform a basic computer vision image classification example using the Deep Learning Impact function within Watson Machine Learning Accelerator. The example works with images of clothes, dresses, clothes on a person, and dresses on a person, and classifies them by whether a person is wearing the garment. Of course, you can use whatever data you’d like in your example.

Learning objectives

By completing this tutorial, you will:

  • Get a feel for the deep learning workflow
  • Classify images with Watson Machine Learning Accelerator
  • Build a model using Watson Machine Learning Accelerator
  • Become more familiar with the IBM Power Systems server ecosystem

Estimated time

  • The end-to-end tutorial takes approximately 3 hours, which includes about 50 minutes of model training, plus installation, configuration, and driving the model through the GUI.

Prerequisites

The tutorial requires access to a GPU-accelerated IBM Power Systems server, model AC922 or S822LC. Besides acquiring a server yourself, you can find several options for accessing Power Systems servers listed on the PowerAI Developer Portal.

Steps

Step 1. Download, install, and configure the IBM Watson Machine Learning Accelerator Evaluation

Step 2. Configure OS user

  1. At the OS level, as root, on all nodes, create an OS group and user for the OS execution user:
     a. groupadd egoadmin
     b. useradd -g egoadmin -m egoadmin

  2. The GID and UID of the created group and user MUST be the same on all nodes.
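
One way to check the matching-ID requirement is to collect the numeric IDs from every node and compare them. The sketch below is only an illustration, not part of the product: the function names user_ids and ids_consistent are made up for this example, and the usage comment assumes passwordless SSH and hypothetical host names.

```shell
#!/usr/bin/env bash

# Print "uid:gid" for the given user on the local node.
user_ids() {
  printf '%s:%s\n' "$(id -u "$1")" "$(id -g "$1")"
}

# Read one "uid:gid" line per node on stdin.
# Succeeds only if every node reported the same pair.
ids_consistent() {
  local distinct
  distinct=$(sort -u | wc -l)
  [ "$distinct" -eq 1 ]
}

# Example (hypothetical host names, assumes passwordless SSH):
#   for h in node1 node2; do
#     ssh "$h" 'echo "$(id -u egoadmin):$(id -g egoadmin)"'
#   done | ids_consistent && echo "IDs match" || echo "IDs differ"
```

If the IDs differ, one option is to recreate the group and user with explicit values (groupadd -g and useradd -u accept a numeric ID) so that every node agrees.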

Step 3. Configure the resource groups

  1. Log on as the cluster Admin user.

  2. Open the Resource Group configuration.

  3. Create a new resource group.

  4. Name it “ImageRg”.

  5. Set the number of slots with the advanced formula, using the keyword ngpus so that the slot count equals the number of GPUs on the systems. Optionally, but recommended, change the resource selection method to static and select the nodes that are GPU-capable.

  6. Under the “Members Host” column, click “preferences” and select the attribute “ngpus” to be displayed.

  7. Click “Apply” and validate that the “Members Host” column now displays ngpus.

  8. Finish creating the resource group by clicking “Create”.
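
Because the slot count comes from ngpus, it can help to confirm how many GPUs a host actually has and compare that with the value shown in the “Members Host” column. The following is a minimal sketch, assuming the NVIDIA driver is installed so that nvidia-smi -L prints one line per GPU; the function name count_gpus is made up for this example.

```shell
#!/usr/bin/env bash

# Count GPUs from `nvidia-smi -L` output read on stdin.
# Each GPU is listed on its own line starting with "GPU <index>:".
# `|| true` keeps the exit status at 0 when grep finds no GPU lines.
count_gpus() {
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Example, run on each GPU-capable host:
#   nvidia-smi -L | count_gpus
```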

Step 4. Create Spark Instance Group

  1. Select Workload -> Spark -> Spark Instance Groups.

  2. Click New.

  3. Select Template.

  4. Select dli-sig-template-2-2-0.

  5. Enter the following three values:

     [Screenshot: the three values to enter]

  6. Click Configuration and modify the Spark parameters, including:
     a. Set “JAVA_HOME” to the Java path on your host.
     b. Set “SPARK_EGO_SLOTS_REQUIRED_TIMEOUT” to “86400”.
     c. Set “SPARK_EGO_RECLAIM_GRACE_PERIOD” to “120”.

  7. Scroll down and, for “Spark executors (GPU slots)”, select the “ImageRg” resource group that you created in Step 3. Do not change any other configuration there.

  8. Click Create and Deploy Instance group.

  9. Click Continue to Instance Group.

  10. Watch as your instance group is deployed.
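
If you are unsure which path to enter for JAVA_HOME, the sketch below shows one way to derive a candidate from the java binary on a host and to check it. It is an illustration under assumptions: the helper names java_home_of and is_java_home are made up for this example, readlink -f is the GNU coreutils form, and the layout <JAVA_HOME>/bin/java is the conventional one rather than something this tutorial guarantees.

```shell
#!/usr/bin/env bash

# Derive a JAVA_HOME candidate from the path of a java binary:
# resolve symlinks, then strip the trailing /bin/java.
java_home_of() {
  local java_bin
  java_bin=$(readlink -f "$1") || return 1
  dirname "$(dirname "$java_bin")"
}

# Succeed only if the directory contains an executable bin/java.
is_java_home() {
  [ -x "$1/bin/java" ]
}

# Example:
#   candidate=$(java_home_of "$(command -v java)")
#   is_java_home "$candidate" && echo "JAVA_HOME candidate: $candidate"
```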

Step 5. Download the instrumented VGG-19 model for TensorFlow

Download all of the files in the https://git.ng.bluemix.net/ibmconductor-deep-learning-impact/dli-1.2.0-tensorflow-samples/tree/master/tensorflow-1.10/vgg19 directory.

Step 6. Download the pre-trained weights

Use the following code to download the pre-trained weights from TensorFlow. More information can be found in the GitHub repo.

mkdir <pretrained weight directory>
cd <pretrained weight directory>
wget http://download.tensorflow.org/models/vgg_19_2016_08_28.tar.gz
tar -zxvf vgg_19_2016_08_28.tar.gz
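
Training later expects weight files with a .ckpt extension, so it can be worth confirming that the downloaded archive contains one. This is a minimal sketch; the function name list_ckpt_files is made up for this example.

```shell
#!/usr/bin/env bash

# List checkpoint files (*.ckpt) inside a gzip-compressed tar archive
# without extracting it. Returns non-zero if the archive holds none.
list_ckpt_files() {
  tar -tzf "$1" | grep '\.ckpt$'
}

# Example:
#   list_ckpt_files vgg_19_2016_08_28.tar.gz
```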

Step 7. Download the data sets

For this tutorial, we’re going to use a tool called googliser, which searches Google Images. It is a simple shell script with no prerequisites.

Use the following commands to run googliser and create four data sets in their own directories.

  • dresses_with_model
  • dresses_without_model
  • clothes_with_model
  • clothes_without_model

$ git clone https://github.com/teracow/googliser

$ cd googliser

$ ./googliser.sh --phrase "dresses with model" --title "dresses_with_model" --upper-size 200000 --lower-size 2000 --failures 0 -n 400 -N
 googliser.sh - 2018-07-26 PID:[43878]

 -> processing query: "dresses with model"
 -> searching Google:       10/10 result groups downloaded.      522 results!
 -> acquiring images:      400/400 downloaded and      115/     522 failed. (22%)

 -> All done!

$ ./googliser.sh --phrase "dresses only" --title "dresses_without_model" --upper-size 200000 --lower-size 2000 --failures 0 -n 400 -N
 googliser.sh - 2018-07-26 PID:[86968]

 -> processing query: "dresses only"
 -> searching Google:       10/10 result groups downloaded.      536 results!
 -> acquiring images:      400/400 downloaded and      122/     536 failed. (23%)

 -> All done!

$ ./googliser.sh --phrase "clothes with model" --title "clothes_with_model" --upper-size 200000 --lower-size 2000 --failures 0 -n 400 -N
 googliser.sh - 2018-07-26 PID:[14331]

 -> processing query: "clothes with model"
 -> searching Google:       10/10 result groups downloaded.      615 results!
 -> acquiring images:      400/400 downloaded and      194/     615 failed. (33%)

 -> All done!

$ ./googliser.sh --phrase "clothes only" --title "clothes_without_model" --upper-size 200000 --lower-size 2000 --failures 0 -n 400 -N
 googliser.sh - 2018-07-26 PID:[40210]

 -> processing query: "clothes only"
 -> searching Google:       10/10 result groups downloaded.      630 results!
 -> acquiring images:      400/400 downloaded and      112/     630 failed.  (34%)

 -> All done!
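
Before moving on, you may want to confirm that each data set directory holds roughly the number of images you asked for. The sketch below assumes that googliser wrote each data set into a directory named after its --title value, as the directory list above implies; the function name count_files is made up for this example.

```shell
#!/usr/bin/env bash

# Count regular files directly inside a directory (subdirectories are ignored).
count_files() {
  find "$1" -maxdepth 1 -type f | wc -l
}

# Example: report the image count of each data set directory.
#   for d in dresses_with_model dresses_without_model \
#            clothes_with_model clothes_without_model; do
#     printf '%s: %s\n' "$d" "$(count_files "$d")"
#   done
```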

First, create a parent directory named “train”. Under it, create two subdirectories, images_with_model and images_without_model, and move the downloaded images into them.

mkdir -p train/images_with_model
mv dresses_with_model/* train/images_with_model
mv clothes_with_model/* train/images_with_model

mkdir -p train/images_without_model
mv dresses_without_model/* train/images_without_model
mv clothes_without_model/* train/images_without_model
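
The mv commands above merge files from two source directories into one target. If both sources happen to contain files with the same name, the second mv replaces the files moved by the first. The following sketch is one way to avoid that by prefixing each file with the name of its source directory; the function name merge_into is made up for this example, and the target paths follow the “train” layout described above.

```shell
#!/usr/bin/env bash

# Move every regular file from each source directory into dest,
# prefixing the file name with the source directory's name so that
# identically named files from different sources do not collide.
merge_into() {
  local dest="$1" src f
  shift
  mkdir -p "$dest"
  for src in "$@"; do
    for f in "$src"/*; do
      [ -f "$f" ] || continue   # skip subdirectories and unmatched globs
      mv "$f" "$dest/$(basename "$src")_$(basename "$f")"
    done
  done
}

# Example:
#   merge_into train/images_with_model dresses_with_model clothes_with_model
#   merge_into train/images_without_model dresses_without_model clothes_without_model
```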

Step 8. Load data into Watson Machine Learning Accelerator

Associate the images with Watson Machine Learning Accelerator by creating a new data set.

  1. In the Datasets tab, select New.

  2. Click Images for Object Classification. In the dialog box, provide a unique name (for example, ‘CodePatternDS’), select TFRecords for ‘Dataset stores images in’, and select the folder that contains the images obtained in the previous step. Set the other fields as shown in the screenshot. When you’re ready, click Create.

    [Screenshot: data set creation dialog]

With your data in Watson Machine Learning Accelerator, you can begin the next step, building a model.

Step 9. Build the model

  1. Select the Models tab and click New.

  2. Select Add Location.

  3. Select TensorFlow as the Framework.

  4. Select TensorFlow-VGG19 for your new model, and click Next.

  5. Ensure that the Training engine is set to singlenode and that the data set points to the one you just created.

    Note: Set the Base learning rate to 0.001 because larger values might lead to exploding gradients.
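
To see why a large learning rate can blow up, consider a toy case (an illustration only, not the VGG-19 loss): plain gradient descent with learning rate $\eta$ on the one-dimensional quadratic $f(w) = \tfrac{a}{2} w^2$ with curvature $a > 0$.

```latex
\begin{align*}
w_{t+1} &= w_t - \eta \, f'(w_t) = (1 - \eta a)\, w_t, \\
w_t     &= (1 - \eta a)^t \, w_0 .
\end{align*}
```

The iterates shrink toward the minimum only when $|1 - \eta a| < 1$, that is, when $0 < \eta < 2/a$; for $\eta > 2/a$ they grow geometrically. A deep network is far from quadratic, but the same mechanism is why a small base learning rate such as 0.001 is the safer starting point.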

The model is now ready to be trained.

Step 10. Run Training

  1. Back at the Models tab, select Train to view the models you can train, then select the model you created in the previous step.

  2. Use the pre-trained weight file that you downloaded in Step 6 by specifying its directory. Make sure that the files have a .ckpt extension. Click Start Training.

Step 11. Inspect the training run

  1. From the Train submenu of the Models tab, select the model that is training by clicking the link.

  2. Navigate from the Overview panel to the Training panel, and click the most recent link. You can watch as the results roll in.

Step 12. Create an inference model

From the Training view, click Create Inference Model.

This creates a new model in the Models tab. You can view it by going to the Inference submenu.

Step 13. Test it out

  1. Go back to the Models tab, select the new inference model, and click Test. At the new Testing overview screen, select New Test.

  2. Download the inference test images to your local disk.

  3. Unzip Inference_images.zip and use the Browse option to load 6 images. Click Start Test.

  4. Wait for the test state to change from RUNNING to FINISHED.

  5. Click the link to view the results of the test.

The results show each image as a thumbnail preview along with its classified label and probability.

Summary

We hope that you have enjoyed this tutorial. Happy hacking, and good luck creating your next model with Watson Machine Learning Accelerator.