IBM Watson™ Machine Learning Accelerator is a software solution that bundles IBM PowerAI, IBM Spectrum Conductor®, IBM Spectrum Conductor Deep Learning Impact, and support from IBM for the whole stack, including the open source deep learning frameworks. Watson Machine Learning Accelerator provides an end-to-end deep learning platform for data scientists, with complete lifecycle management: from installation and configuration, through data ingest and preparation and building, optimizing, and distributing the training model, to moving the model into production. Watson Machine Learning Accelerator truly shines when you expand your deep learning environment to include multiple compute nodes. There’s even a free evaluation available. See the prerequisites in our introductory tutorial: Classify images with Watson Machine Learning Accelerator.

Snap ML

IBM has developed an efficient, scalable machine learning library that enables fast training of various machine learning models. Using this library, clients can remove training time as the bottleneck for machine learning workloads, paving the way to a range of new applications. The Snap Machine Learning (Snap ML) library combines recent advances in machine learning systems and algorithms and uses GPUs to accelerate generalized linear models. This was made possible by innovations at the algorithmic level, as well as by the high-speed NVLink 2.0 interconnect between GPUs and POWER9™ CPUs.

The importance of this state-of-the-art library is amplified by the fact that logistic regression, decision trees, and random forests are the three machine learning models data scientists use most at work (2017 Kaggle Data Science Survey), and all three are supported by Snap ML today.

Snap ML (PowerAI 1.6.0) currently supports the following models.

Generalized linear models:

  • Logistic regression
  • Linear regression
  • Ridge regression
  • Lasso regression
  • Support vector machines (SVMs)

Tree-based models:

  • Decision trees
  • Random forest
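Snap ML’s estimators follow the familiar scikit-learn fit/predict interface. As a rough illustration of the workflow it accelerates (sketched here with scikit-learn itself, because the Snap ML import path differs between PowerAI releases), training and scoring a logistic regression model looks like this:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny synthetic click-through-style data set: 200 examples, 20 features.
# Labels are driven mostly by the first feature (purely illustrative).
rng = np.random.RandomState(0)
X = rng.randn(200, 20)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0).astype(int)

# The same fit/predict pattern applies to Snap ML's LogisticRegression,
# which adds options for GPU use and distributed training.
clf = LogisticRegression(solver="lbfgs")
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
```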

Unique value proposition

There are three main features that distinguish the unique value proposition that Snap ML offers:

Distributed training — IBM has built the system as a data-parallel framework, enabling clients to scale out and train on massive data sets that exceed the memory capacity of a single machine, which is crucial for large-scale applications.

GPU acceleration — IBM has implemented specialized solvers designed to leverage the massively parallel architecture of GPUs while respecting the data locality in GPU memory to avoid large data transfer overhead. To make this approach scalable, IBM takes advantage of recent developments in heterogeneous learning to achieve GPU acceleration even if only a small fraction of the data can be stored in the accelerator memory.

Sparse data structures — Many machine learning data sets are sparse. Snap ML employs new optimizations for the algorithms when applied to sparse data structures.

All of this results in significantly reduced training times and the ability to handle terabyte-scale data sets.
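To see why this matters at Criteo scale, where almost every entry of the feature matrix is zero, compare dense storage with a compressed sparse row (CSR) layout. A minimal sketch using SciPy (the shape and density below are illustrative assumptions, far smaller than the real data set):

```python
from scipy.sparse import random as sparse_random

# 10,000 x 1,000 matrix with only 0.1% non-zero entries, loosely
# mimicking one-hot-encoded click-through features.
X = sparse_random(10_000, 1_000, density=0.001, format="csr", random_state=42)

# CSR stores only the non-zero values plus their column indices and row offsets.
dense_bytes = X.shape[0] * X.shape[1] * 8  # float64 dense storage
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

print(X.nnz)                       # stored non-zeros
print(dense_bytes / sparse_bytes)  # rough compression factor
```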

Learning objectives

This is the third tutorial in the IBM Watson Machine Learning Accelerator education series. In this series, we train a logistic regression classifier to predict clicks on advertisements, using a 20-GB data set of online advertising click-through data that contains 45 million training examples and 1 million features. We show you how to accelerate logistic regression model training with the Snap ML library, and compare its performance with open source Spark ML. This series consists of three parts:

  • Part 1 — Prepare the Criteo Kaggle data set

    • Downloading and extracting the data set
    • Creation of a train/test split using scikit-learn
  • Part 2 — Installation and configuration

    • Creation of two Spark instance groups
    • Installation and configuration of two Livy services on Watson Machine Learning Accelerator
  • Part 3 — Running the logistic regression model

    • Customization of a notebook package to include sparkmagic
    • Connecting to a Watson Machine Learning Accelerator cluster from a notebook
    • Training a logistic regression model to predict customer click-through rate with Spark ML and with IBM Watson Machine Learning Accelerator Snap ML

Estimated time

The end-to-end tutorial takes about two hours: roughly 30 minutes of model training, plus installation and configuration, plus driving the model through the GUI.

Prerequisites

The tutorial requires access to a GPU-accelerated IBM Power Systems server, model AC922 or S822LC. Besides acquiring a server, there are multiple options for accessing Power Systems servers, listed on the PowerAI Developer Portal.

Part 1: Prepare the Criteo Kaggle data set

  1. Download the Criteo Kaggle competition data.

     wget https://s3-us-west-2.amazonaws.com/criteo-public-svm-data/criteo.kaggle2014.svm.tar.gz
    
  2. Extract the contents.

     tar xzf criteo.kaggle2014.svm.tar.gz
    
  3. Execute the following Python script to create the training/test files.

     # Load the full Criteo training set in LIBSVM (svmlight) format.
     from sklearn.datasets import load_svmlight_file, dump_svmlight_file
     from sklearn.model_selection import train_test_split

     X, y = load_svmlight_file("criteo.kaggle2014.train.svm")

     # Hold out 25% of the examples as a test set; fix the seed for reproducibility.
     X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

     # Write both splits back out in LIBSVM format with 1-based feature indices.
     dump_svmlight_file(X_train, y_train, 'criteo.kaggle2014-train.libsvm', zero_based=False)
     dump_svmlight_file(X_test, y_test, 'criteo.kaggle2014-test.libsvm', zero_based=False)
    

As a result, two data sets are generated.

  • criteo.kaggle2014-train.libsvm
  • criteo.kaggle2014-test.libsvm
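Both files use the LIBSVM (svmlight) text format: each line holds a label followed by `index:value` pairs for the non-zero features, with 1-based indices because of `zero_based=False`. A small self-contained round trip (using a hypothetical file name) illustrates the format:

```python
import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Toy matrix: 3 examples, 4 features, mostly zeros.
X = np.array([[0.0, 2.0, 0.0, 1.0],
              [3.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 4.5, 0.0]])
y = np.array([1.0, 0.0, 1.0])

# Write 1-based index:value pairs, as in the tutorial's script.
dump_svmlight_file(X, y, "toy.libsvm", zero_based=False)

with open("toy.libsvm") as f:
    print(f.readline().strip())  # label followed by index:value pairs

# Loading returns a SciPy sparse matrix and the label vector.
X2, y2 = load_svmlight_file("toy.libsvm")
```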

Part 2: Installation and configuration

We will create two separate Spark instance groups:

  1. Livy-integration-notebook for GPU workload
  2. Livy-integration-notebook-CPU for CPU workload

    Creation of Spark Instance Group #1 – Livy-integration-notebook

  3. Enter the required fields and click Configuration.

    New Spark instance group

  4. Modify the following Spark properties.

  5. Set spark.default.parallelism to the total number of available GPUs.

    Setting Spark properties

  6. Select Additional Parameters and add the following parameters.

    Resource groups and plans

Configure resource groups and plans
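The exact property values depend on your cluster. As an illustrative sketch only (the tutorial's GPU run in Part 3 uses eight GPUs), the spark.default.parallelism setting for the GPU instance group would look something like:

```
# Illustrative spark-defaults-style fragment, not the tutorial's full property list.
# Match parallelism to the hardware the instance group will use (here, 8 GPUs).
spark.default.parallelism    8
```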

Creation of Spark Instance Group #2 – Livy-integration-notebook-CPU

  1. Enter the required fields and click Configuration.

    New Spark instance group configuration

  2. Modify the following Spark properties. Set spark.default.parallelism to the total number of available CPU cores.

    Add a parameter

  3. Select Additional Parameters and add the following parameters.

    Resource groups and plans

Next, install and configure two Livy services, SnapML-Livy and SnapML-Livy-CPU, on Watson Machine Learning Accelerator.

As a result, you should have two Livy instances up and running, and their output values (available on the Overview tab in the cluster management console) show the endpoint location as livy_URL.

Application instances in all consumers

Part 3: Training the logistic regression model

  1. Download the sample notebook and load it into your favorite notebook environment. To access the Spark instance group through the Apache Livy endpoint, you must load the client library, create a Livy session, and use it for Spark job submission. The sparkmagic commands help automate this process.

  2. Train the logistic regression model to predict the customer click-through rate and distribute training across multiple cores in CPU with Spark ML:

    a. Load the Sparkmagic extension: %load_ext sparkmagic.magics

    b. Create a Livy CPU session by using the livy_URL value from the application instance: %spark add -s cpu_session -l python -u <livy_URL> -a u -k


    c. After the Livy CPU session is created, you can launch the logistic regression model training to predict the customer click-through rate with Spark ML, running distributed across 33 CPU cores. The model has 1 million features and trains on the 20-GB Criteo Kaggle 2014 training data set, then runs inference on the 6-GB Criteo Kaggle 2014 test data set. Execution completes in 202.96 seconds.


    d. Finally, delete the Livy session to release the associated resources: %spark delete -s cpu_session

  3. Train a logistic regression model to predict customer click-through rate and distribute training across GPUs with IBM Watson Machine Learning Accelerator Snap ML:

    a. Create a Livy GPU session by using the livy_URL value from the application instance.

    b. After the Livy GPU session is created, you can launch the logistic regression model training with Snap ML, running distributed across eight GPUs. Execution completes in 18 seconds.

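Putting the notebook steps together, the sparkmagic session lifecycle for the CPU run looks roughly like the sketch below, with <livy_URL> standing in for your endpoint and the remote training code elided:

```
%load_ext sparkmagic.magics                                # load the magics once
%spark add -s cpu_session -l python -u <livy_URL> -a u -k  # attach a Livy session
%%spark                                                    # cells now run remotely in Spark
...                                                        # training code (see step 2c)
%spark delete -s cpu_session                               # release cluster resources
```

The GPU run follows the same pattern against the SnapML-Livy service.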

Conclusion

The Snap ML library offers GPU acceleration and distributed computing capabilities that accelerate machine learning model training and enable handling large data sets. In our tutorial, we trained a logistic regression classifier to predict clicks on online advertisements using a 20-GB data set that consists of online advertising click-through data, containing 45 million training examples and 1 million features. This is a highly relevant application for companies serving ads on their websites and online bidding companies, responsible for billion-dollar revenues in today’s connected society.

Snap ML speeds up this training workload more than tenfold, accelerating execution time from 202 seconds (using Spark ML running on CPUs) to 18 seconds. This greatly improves productivity and might even enable use cases such as online retraining of machine learning models to adapt to rapidly changing situations or business requirements.

Want to know more? Take a look at this video to learn what Snap ML technology is and how you can benefit from it.