Kelvin Lui, Jim Van Oosten, Rodrigo Ceron | Published May 8, 2019
Tags: Artificial intelligence, Data science, Deep learning, Machine learning
IBM Watson Machine Learning Accelerator is a software solution that bundles IBM PowerAI, IBM Spectrum Conductor, IBM Spectrum Conductor™ Deep Learning Impact, and support from IBM for the whole stack, including the open source deep learning frameworks. Watson Machine Learning Accelerator provides an end-to-end deep learning platform for data scientists. This includes complete lifecycle management: from installation and configuration, to data ingest and preparation, to building, optimizing, and distributing the training model, to moving the model into production. Watson Machine Learning Accelerator excels when you expand your deep learning environment to include multiple compute nodes. There’s even a free evaluation available. See the prerequisites from our first introduction tutorial, Classify images with Watson Machine Learning Accelerator.
This article has been updated for Watson Machine Learning Accelerator v1.2.x. It leverages the Anaconda and notebook environment creation functions provided by IBM Spectrum Conductor.
This is the second tutorial in the IBM Watson Machine Learning Accelerator education series.
This tutorial should take about two hours to complete, including roughly 30 minutes of model training; the rest covers installation, configuration, and working with the model through the GUI.
The tutorial requires access to a GPU-accelerated IBM Power® Systems server model AC922 or S822LC. Besides acquiring a server, there are multiple options for accessing Power Systems servers listed on the IBM PowerAI developer portal.
Download the Anaconda installer to your workstation. You can use wget or a browser download option for the URL.
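For example, with wget (the installer file name here is an assumption based on the Anaconda version and platform specified in the next task; adjust it if your release differs):

```bash
wget https://repo.anaconda.com/archive/Anaconda3-2018.12-Linux-ppc64le.sh
```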
Open the Spark Anaconda Management panel by using the Spectrum Conductor management console.
Add a new Anaconda.
Fill in the details for the Anaconda and click Add.
Distribution name: Anaconda3
Use Browse to find and select the Anaconda installer downloaded in Task 1
Anaconda version: 2018.12
Python version: 3
Operating system: Linux on Power 64-bit little endian (LE)
Click Add to begin the Anaconda upload.
The upload time varies based on your network speed.
After the Anaconda distribution is added, you can deploy it and create an environment for it.
Deploy preparation: On all nodes, create a directory on the local disk for the Anaconda deployment. In this example, the local disk location is /cwslocal, and the execution user we are going to use in the Spark Instance Group (SIG) is demoexec. Your local disk location and execution user might differ.
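For example, a minimal sketch using the /cwslocal location and demoexec user from this example, run on every node:

```bash
mkdir -p /cwslocal/demoexec/
chown -R demoexec:demoexec /cwslocal/demoexec/
```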
Now, select the distribution you just created, and click Deploy.
Fill in the required information.
In this example, the instance name follows a pattern of [Anaconda Name]-[Consumer]-[PowerAI]. The deployment directory matches the one that we created in the previous step. The consumer follows a pattern of [Anaconda Name]-[Consumer].
Execution user: demoexec
Click on the Environment Variables tab.
Add the variables for PATH and IBM_POWERAI_LICENSE_ACCEPT using the Add a Variable button.
| Name | Value |
| --- | --- |
| PATH | $PATH:/usr/bin |
| IBM_POWERAI_LICENSE_ACCEPT | yes |
Click Configure to complete the Anaconda deployment.
Click Deploy, and watch as your Anaconda environment gets deployed.
Create a powerai16.yml file on your workstation with the following content (note the indentation in the file). This is a YAML file that is used to create an Anaconda environment. If you do not have a YAML-enabled editor, consider verifying that the file format is valid by pasting the contents into an online YAML validation tool.
You might have additional conda and pip packages that you want installed. Those packages can be added to the dependencies and pip list in the file.
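For reference, here is a minimal sketch of what a powerai16.yml file of this kind might look like. The channel URL, package names, and version pins are assumptions based on the PowerAI 1.6 conda channel, not the tutorial's exact file; adjust them to match your installation:

```yaml
# Hypothetical sketch of powerai16.yml; channel and version pins are assumptions.
name: powerai16
channels:
  - https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/  # IBM PowerAI channel (assumed)
  - defaults
dependencies:
  - python=3.6
  - powerai=1.6.0   # meta-package that pulls in the deep learning frameworks
  - jupyter
  - pip
  - pip:
      - sparkmagic  # used later for the Hadoop/Livy integration
```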
Select the Anaconda3 distribution that you created. Click Add to add a conda environment.
Create a new environment from the powerai16.yml file: use the Browse button to select the file that you created, then click Add.
Watch the environment get created. It creates an environment with over 200 packages. If Add fails, check the logs and verify that the YAML file is formatted correctly. Retry the Add after the issue is resolved.
We use the IBM Spectrum Conductor-provided notebook. You can see it in Workload -> Spark -> Notebook Management.
Notice that there is a notebook called Jupyter, version 5.4.0. If you select it and click Configure, you can view the settings for this notebook.
In the next step, we show how to create a new Spark Instance Group that uses the notebook.
Before creating the SIG, create a shared directory for the execution user on all nodes (the path and user are from this example; yours might differ):

```bash
mkdir -p /cwsshare/demoexec/
chown -R demoexec:demoexec /cwsshare/demoexec/
```
Create a new SIG that includes the added notebook: go to “Workload -> Spark -> Spark Instance Groups” and click “New”.
Scroll down and select the “GPUHosts” resource group for “Spark Executors (GPU slots)”. Do not change anything else.
Click Create and Deploy Instance Group at the bottom of the page.
Watch as your instance group gets deployed.
After the deployment completes, start the SIG by clicking Start.
After the SIG is started, go to the Notebook tab and click Create Notebooks for Users.
Select the users for the notebook server.
After the notebook has been created, refresh the screen to see My Notebooks. Clicking this shows the list of notebook servers created for this SIG.
Select the Jupyter 5.4.0 notebook to bring up the notebook server URL.
Sign on to the notebook server.
Download the tf_keras_fashion_mnist.ipynb notebook and upload it to the notebook server by clicking Upload. You have to click Upload again after specifying the notebook file.
Select the notebook and begin executing the cells. The Keras model is defined in one cell and trained in a subsequent cell.
The test of the model shows an accuracy of more than 86 percent after being trained for five epochs.
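The notebook itself is the reference; as an illustration of what it does, here is a representative tf.keras sketch for Fashion-MNIST (the layer sizes and optimizer are illustrative assumptions, not necessarily the tutorial's exact code):

```python
import tensorflow as tf

# Fashion-MNIST: 60,000 training and 10,000 test images, 28x28 grayscale, 10 classes.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# A small fully connected classifier; the architecture here is illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=5)  # five epochs, as in the tutorial
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)      # typically around 0.86-0.88
```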
This next section explains how to use the notebook to connect to a Hadoop data lake that has an Apache Livy service deployed. The following image shows the Hadoop integration.
Apache Livy is a service that enables easy interaction with a Spark cluster over a REST interface. It supports long-running Spark sessions and multi-tenancy. To install it on your Hadoop cluster, see your Hadoop vendor's documentation (for example, the Hortonworks HDP documentation). To get the Spark MLlib notebook to connect and run, make the following two changes on the Hortonworks HDP cluster.
```bash
yum -y install python-pip
pip install numpy
```
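Optionally, you can sanity-check the Livy endpoint over its REST interface before wiring up the notebook. A hedged example, assuming Livy listens on its default port 8998 (replace livy-host with your server):

```bash
# Create a PySpark session through Livy's REST API, then list sessions.
curl -s -X POST -H "Content-Type: application/json" \
     -d '{"kind": "pyspark"}' http://livy-host:8998/sessions
curl -s http://livy-host:8998/sessions
```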
Sparkmagic runs in a Jupyter Notebook. It includes a set of tools for interactively working with remote Spark clusters through Livy. It is installed through pip and enabled in the notebook by running a Jupyter command.
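For example, following the standard Sparkmagic installation steps (verify against the Sparkmagic documentation for your version):

```bash
pip install sparkmagic
# Enable the Jupyter widgets extension that Sparkmagic depends on.
jupyter nbextension enable --py --sys-prefix widgetsnbextension
```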
Sign on to the notebook server and import the hadoop_livy2_spark_mllib_test.ipynb notebook provided by this tutorial and execute it.
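The notebook is the reference; to illustrate the pattern it follows, here is a hedged sketch of two notebook cells. The Livy URL, session name, and the KMeans example are illustrative assumptions:

```python
# Cell 1: load Sparkmagic and register a session against the Livy endpoint.
%load_ext sparkmagic.magics
%spark add -s mllib-test -l python -u http://livy-host:8998
```

```python
%%spark
# Cell 2: this code runs on the remote Hadoop cluster's Spark, not locally.
from pyspark.ml.clustering import KMeans
from pyspark.ml.linalg import Vectors

df = spark.createDataFrame(
    [(Vectors.dense([0.0, 0.0]),), (Vectors.dense([1.0, 1.0]),),
     (Vectors.dense([9.0, 8.0]),), (Vectors.dense([8.0, 9.0]),)],
    ["features"])
model = KMeans(k=2, seed=1).fit(df)
print(model.clusterCenters())
```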
You have now learned how to customize and install Anaconda and notebook environments in Watson Machine Learning Accelerator. You also learned how to use the notebook server to run a notebook with a Keras model, and how to run a notebook that connects to a Hadoop data lake and executes a Spark MLlib model.