Kelvin Lui, Haris Pozidis, Thomas Parnell, Michael Feiman | Published April 3, 2019
IBM Watson™ Machine Learning Accelerator is a software solution that bundles IBM PowerAI, IBM Spectrum Conductor®, IBM Spectrum Conductor Deep Learning Impact, and support from IBM for the whole stack, including the open source deep learning frameworks. Watson Machine Learning Accelerator provides an end-to-end deep learning platform for data scientists. This includes complete lifecycle management: installation and configuration; data ingest and preparation; building, optimizing, and distributing the training model; and moving the model into production. Watson Machine Learning Accelerator truly shines when you expand your deep learning environment to include multiple compute nodes. There’s even a free evaluation available. See the prerequisites from our introductory tutorial: Classify images with Watson Machine Learning Accelerator.
IBM has developed an efficient, scalable machine learning library that enables fast training of various machine learning models. Using this library, clients can remove training time as the bottleneck for machine learning workloads, paving the way to a range of new applications. The Snap Machine Learning (Snap ML) library combines recent advances in machine learning systems and algorithms and uses GPUs to accelerate generalized linear models. This was made possible by innovations at the algorithmic level, and also by NVLink 2.0, the high-speed interconnect between GPUs and POWER9™ CPUs.
The importance of this library is amplified by the fact that logistic regression, decision trees, and random forests are the three machine learning models that data scientists use most at work (2017 Kaggle Data Science Survey), and Snap ML supports all three today.
Snap ML (PowerAI 1.6.0) currently supports the following models.
Generalized linear models:
Three main features distinguish Snap ML:
Distributed training — IBM has built the system as a data-parallel framework, enabling clients to scale out and train on massive data sets that exceed the memory capacity of a single machine, which is crucial for large-scale applications.
GPU acceleration — IBM has implemented specialized solvers designed to leverage the massively parallel architecture of GPUs while respecting the data locality in GPU memory to avoid large data transfer overhead. To make this approach scalable, IBM takes advantage of recent developments in heterogeneous learning to achieve GPU acceleration even if only a small fraction of the data can be stored in the accelerator memory.
Sparse data structures — Many machine learning data sets are sparse. Snap ML employs new optimizations for the algorithms when applied to sparse data structures.
All of this results in significantly reduced training times and the ability to handle terabyte-scale data sets.
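The data-parallel and sparse-data ideas above can be illustrated with a minimal sketch. It is not Snap ML's solver: it uses plain gradient descent for logistic regression, and the helper names (`partial_gradient`, `data_parallel_step`) are hypothetical. Each partition stands in for one worker's share of the data, and each example stores only its non-zero features.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def partial_gradient(partition, weights):
    """Sum of logistic-loss gradients over one partition (one worker's share).

    Each example is (label, {feature_index: value}) with label 0 or 1;
    only non-zero features are stored, so the work scales with the non-zeros.
    """
    grad = {}
    for label, features in partition:
        z = sum(weights.get(j, 0.0) * v for j, v in features.items())
        error = sigmoid(z) - label
        for j, v in features.items():
            grad[j] = grad.get(j, 0.0) + error * v
    return grad

def data_parallel_step(partitions, weights, learning_rate):
    """One gradient-descent step: local gradients are computed per partition, then summed."""
    total = {}
    n = 0
    for part in partitions:  # in a real system, each partition runs on its own worker
        for j, g in partial_gradient(part, weights).items():
            total[j] = total.get(j, 0.0) + g
        n += len(part)
    new_weights = dict(weights)
    for j, g in total.items():
        new_weights[j] = new_weights.get(j, 0.0) - learning_rate * g / n
    return new_weights

# Two tiny made-up partitions, one example each.
partitions = [[(1, {0: 1.0})], [(0, {3: 2.0})]]
weights = data_parallel_step(partitions, {}, learning_rate=1.0)
print(weights)
```

Because the per-partition gradients are simply summed, splitting the same examples across more partitions gives the same update, which is what lets the data set exceed the memory of any single machine.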
This is the third tutorial of the IBM Watson Machine Learning Accelerator education series. In this series, we train a logistic regression classifier to predict clicks on advertisements using a 20-GB data set that consists of online advertising click-through data, containing 45 million training examples and 1 million features. We will show you how to accelerate logistic regression model training with the Snap ML library, and compare the performance with open source Spark ML. This series consists of three parts:
Part 1 — Prepare the Criteo Kaggle data set
Part 2 — Installation and configuration
Part 3 — Running logistic regression model
The end-to-end tutorial takes about two hours and includes about 30 minutes of model training, plus installation and configuration as well as driving the model through the GUI.
The tutorial requires access to a GPU-accelerated IBM Power Systems server model AC922 or S822LC. Besides acquiring a server, you can access Power Systems servers through the options listed on the PowerAI Developer Portal.
Download the Criteo Kaggle competition data.
Extract the contents.
tar xzf criteo.kaggle2014.svm.tar.gz
Execute the following Python script to create the training/test files.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split
from sklearn.datasets import dump_svmlight_file
X,y = load_svmlight_file("criteo.kaggle2014.train.svm")
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
dump_svmlight_file(X_train, y_train, 'criteo.kaggle2014-train.libsvm', zero_based=False)
dump_svmlight_file(X_test, y_test, 'criteo.kaggle2014-test.libsvm', zero_based=False)
As a result, two data sets are generated.
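The two files are plain text in svmlight/libsvm format: each line holds a label followed by `index:value` pairs for the non-zero features, and `zero_based=False` means feature indices start at 1. The sketch below parses one such line to make the layout concrete. The helper `parse_libsvm_line` and the sample line are made up for illustration, and the sketch ignores optional svmlight extras such as comments and `qid` fields.

```python
def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features

# Made-up example line, not taken from the Criteo data.
sample = "1 3:0.5 17:1 1000000:0.25"
label, features = parse_libsvm_line(sample)
print(label, features)
```

Only three of the possible 1 million feature positions appear in the sample line, which is why this format suits sparse data sets.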
We will create two separate Spark instance groups:
Livy-integration-notebook for GPU workload
Livy-integration-notebook-CPU for CPU workload
Creation of Spark Instance Group #1 – Livy-integration-notebook
Enter the required fields and click Configuration.
Modify the following Spark properties.
Set spark.default.parallelism to the total number of available GPUs.
Select Additional Parameters and add the following parameters.
Creation of Spark Instance Group #2 – Livy-integration-notebook-cpu
Modify the following Spark properties. Set spark.default.parallelism to the total number of available CPU cores.
Select Additional Parameters and add the following parameters.
Install and configure two Livy services: SnapML-Livy & SnapML-Livy-CPU on Watson Machine Learning Accelerator.
As a result, you should have two Livy instances up and running. Their output values (available on the Overview tab in the cluster management console) show the endpoint location as livy_URL.
Download the sample notebook and load it into your favorite notebook environment. To access the Spark instance group by using the Apache Livy endpoint, you must load the client library, create a Livy session, and use it for the Spark job submission. The sparkmagic command helps to automate the process.
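For orientation, the session lifecycle that sparkmagic automates maps onto Livy's REST API: create a session, submit statements to it, and delete it when done. The sketch below only builds the requests and does not send them. The host name, the session id `0`, and the helper `livy_request` are placeholders for illustration, and any authentication that your cluster requires is omitted.

```python
import json
import urllib.request

def livy_request(livy_url, path, payload=None, method="POST"):
    """Build (but do not send) a JSON request against a Livy endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=livy_url.rstrip("/") + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )

LIVY_URL = "http://livy.example.com:8998"  # placeholder; substitute your livy_URL value

# 1. Create a PySpark session.
create = livy_request(LIVY_URL, "/sessions", {"kind": "pyspark"})
# 2. Submit code to the session (id 0 is assumed; read the real id from the create response).
submit = livy_request(LIVY_URL, "/sessions/0/statements", {"code": "print(sc.version)"})
# 3. Delete the session to release its resources.
delete = livy_request(LIVY_URL, "/sessions/0", method="DELETE")

for req in (create, submit, delete):
    print(req.get_method(), req.full_url)

# To send a request for real: urllib.request.urlopen(create)
```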
Train the logistic regression model to predict the customer click-through rate and distribute training across multiple CPU cores with Spark ML:
a. Load the Sparkmagic extension: %load_ext sparkmagic.magics
b. Create a Livy CPU session by using the livy_URL value from the application instance: %spark add -s cpu_session -l python -u -a u -k config
c. After the Livy CPU session is created, you can launch the logistic regression model training to predict the customer click-through rate with Spark ML, running distributed across 33 CPU cores. This model has 1 million features, trains on the 20-GB Criteo Kaggle 2014 training data set, and runs inference on the 6-GB Criteo Kaggle 2014 test data set. The execution completes in 202.96 seconds.
d. Finally, clean up the Livy session to release the associated resources: %spark delete -s cpu_session
Train a logistic regression model to predict the customer click-through rate and distribute training across GPUs with IBM Watson Machine Learning Accelerator Snap ML:
a. Create a Livy GPU session by using the livy_URL value from the application instance.
b. After the Livy GPU session is created, you can launch the logistic regression model training with Snap ML, running distributed across eight GPUs. Execution completes in 18 seconds.
The Snap ML library offers GPU acceleration and distributed computing capabilities that accelerate machine learning model training and enable handling large data sets. In our tutorial, we trained a logistic regression classifier to predict clicks on online advertisements using a 20-GB data set that consists of online advertising click-through data, containing 45 million training examples and 1 million features. This is a highly relevant application for companies serving ads on their websites and online bidding companies, responsible for billion-dollar revenues in today’s connected society.
Snap ML speeds up this training workload more than tenfold, cutting the execution time from 202.96 seconds (using Spark ML running on CPUs) to 18 seconds. This substantially improves productivity and might even enable such use cases as online retraining of machine learning models to adapt to rapidly changing situations or business requirements.
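The speedup can be checked directly from the two timings reported in this tutorial. The variable names below are illustrative only.

```python
spark_ml_seconds = 202.96  # Spark ML on CPU cores, as reported in this tutorial
snap_ml_seconds = 18.0     # Snap ML on eight GPUs, as reported in this tutorial

speedup = spark_ml_seconds / snap_ml_seconds
time_saved = 1.0 - snap_ml_seconds / spark_ml_seconds

print(f"speedup: {speedup:.1f}x")                    # speedup: 11.3x
print(f"training time reduced by {time_saved:.0%}")  # training time reduced by 91%
```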
Want to know more? Take a look at this video to learn what Snap ML technology is and how you can benefit from it.