IBM PowerAI developer portal

Learn about deep learning and PowerAI. Create something amazing.

IBM PowerAI Releases

PowerAI Release 5.2
Announced 06/19/2018. Available now.

What’s new in R5.2

  • Python 3 support for the framework packages (in addition to the existing Python 2 support; not including Caffe).
  • A Technology Preview of IBM PowerAI Snap Machine Learning (Snap ML). Snap ML provides classical machine learning functionalities exposed via a sklearn-like interface.
  • A Technology Preview of PyTorch – a Python library that enables GPU-accelerated tensor computation and provides a rich API for neural network applications.
  • A Technology Preview of Large Model Support (LMS) is introduced for TensorFlow and enhanced for IBM Caffe. Large Model Support provides an approach to training large models and batch sizes that cannot fit in GPU memory.

Note that an NCCL v1.3.5 package is still included in the PowerAI distribution but is not installed by default. The other PowerAI components are now built against NCCL v2.2.12, which must be downloaded from NVIDIA. The NCCL 1 package is provided for compatibility with existing applications but may be removed in future releases of PowerAI.

Ordering information

PowerAI R5.2 is available as a no-charge orderable part number from IBM. To place an order for PowerAI 1.5.2, please contact your IBM representative or authorized Business Partner.

To be contacted by an IBM sales representative please either complete this form: https://www.ibm.com/connect/ibm/us-en/

Or you may call a telesales associate within your region through the telephone numbers listed on https://www.ibm.com/contact.

Beginning with PowerAI 1.5.2, software is also available as a Docker container at https://hub.docker.com/r/ibmcom/powerai/
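
For example, the container can be pulled and run with GPU access using nvidia-docker. This is a minimal sketch; the image tag and runtime flag here are assumptions, so check the Docker Hub page above for the current tags and run instructions:

    $ docker pull ibmcom/powerai
    $ docker run --runtime=nvidia -it ibmcom/powerai bash   # --runtime=nvidia assumes nvidia-docker2 is installed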

Software packages and pre-requisites

PowerAI 1.5.2 provides software packages for several Deep Learning frameworks, supporting libraries, and tools including:

    | Component    | Version |
    | ------------ | ------- |
    | DDL          | 1.0.0   |
    | TensorFlow   | 1.8.0   |
    | TensorBoard  | 1.8.0   |
    | IBM Caffe    | 1.0.0   |
    | BVLC Caffe   | 1.0.0   |
    | PyTorch      | 0.4.0   |
    | Snap ML      | 1.0.0   |
    | Spectrum MPI | 10.2    |
    | Bazel        | 0.10.0  |
    | OpenBLAS     | 0.2.20  |
    | HDF5         | 1.10.1  |
    | Protobuf     | 3.4.0   |

PowerAI is optimized to leverage the unique capabilities of IBM Power Systems accelerated servers, and is not available on any other platforms. It is supported on:

  • IBM Power System AC922 with NVIDIA Tesla V100 GPUs
  • IBM Power System S822LC with NVIDIA Tesla P100 GPUs

PowerAI requires some additional third-party software components. See the table below for more information:

    | Component                       | Version | Recommended |
    | ------------------------------- | ------- | ----------- |
    | Red Hat Enterprise Linux (RHEL) | 7.5     | 7.5         |
    | NVIDIA CUDA                     | 9.2     | 9.2.88      |
    | NVIDIA GPU driver               | 396     | 396.26      |
    | NVIDIA cuDNN                    | 7.1     | 7.1.4       |
    | NVIDIA NCCL                     | 2.2     | 2.2.12      |
    | Anaconda                        | 5.1     | 5.1.0       |

Additional information

System Setup

Operating System

The Deep Learning packages require RHEL 7.5 little endian for IBM POWER8 and IBM POWER9. The RHEL install image and license must be acquired from Red Hat: https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux

Operating System and Repository Setup

  1. Enable ‘optional’ and ‘extra’ repo channels
      IBM POWER8:
          $ sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
          $ sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
    
      IBM POWER9:
          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
  2. Install packages needed for the installation
      $ sudo yum -y install wget nano bzip2
  3. Enable EPEL repo
       $ wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
       $ sudo rpm -ihv epel-release-latest-7.noarch.rpm
  4. Load the latest kernel
      $ sudo yum update kernel kernel-tools kernel-tools-libs kernel-bootwrapper
      $ reboot

    Or do a full update

      $ sudo yum update
      $ sudo reboot
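
After rebooting, you can sanity-check the operating system level and running kernel before proceeding:

      $ cat /etc/redhat-release   # should report Red Hat Enterprise Linux 7.5
      $ uname -r                  # confirm the updated kernel is running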

NVIDIA Components

IBM POWER9 specific udev rules

Before installing the NVIDIA components, the udev Memory Auto-Onlining Rule must be disabled for the CUDA driver to function properly. To disable it:

  1. Copy the /lib/udev/rules.d/40-redhat.rules file to the directory for user overridden rules.
      $ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
  2. Edit the /etc/udev/rules.d/40-redhat.rules file.
      $ sudo nano /etc/udev/rules.d/40-redhat.rules
  3. Comment out the following line and save the change (a scripted alternative is shown after these steps):
      SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
  4. Optionally delete the first line of the file, since the file was copied to a directory where it won’t be overwritten.
      # do not edit this file, it will be overwritten on update
  5. Reboot the system for the changes to take effect.
      $ sudo reboot
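
If you prefer to script the edit from step 3, a sed one-liner along these lines should work (a sketch; verify the resulting file before rebooting):

      $ sudo sed -i '/^SUBSYSTEM=="memory", ACTION=="add"/s/^/#/' /etc/udev/rules.d/40-redhat.rules
      $ grep '^#SUBSYSTEM=="memory"' /etc/udev/rules.d/40-redhat.rules   # confirm the rule is commented out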

CUDA, GPU driver, cuDNN and NCCL

The Deep Learning packages require CUDA, cuDNN, and GPU driver packages from NVIDIA. See the table above for the required and recommended versions of these components.

Install the components by following these steps:

  1. Download and install NVIDIA CUDA 9.2 from https://developer.nvidia.com/cuda-downloads
    • Select Operating System: Linux
    • Select Architecture: ppc64le
    • Select Distribution RHEL
    • Select Version 7
    • Select Installer Type rpm (local)
    • Follow the Linux POWER installation instructions in the CUDA Quick Start Guide, including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.

    Note: The local rpm is preferred over the network rpm as it will ensure the version installed is the version downloaded. With the network rpm, "yum install cuda" will always install the latest version of the CUDA Toolkit.

  2. Download NVIDIA cuDNN v7.1.4 for CUDA 9.2 from https://developer.nvidia.com/cudnn (Registration in NVIDIA’s Accelerated Computing Developer Program is required)
    • cuDNN v7.1.4 Library for Linux (Power8/Power9)
  3. Download NVIDIA NCCL v2.2.12 for CUDA 9.2 from https://developer.nvidia.com/nccl (Registration in NVIDIA’s Accelerated Computing Developer Program is required)
    • NCCL 2.2.12 O/S agnostic and CUDA 9.2 and IBM Power
  4. Install the cuDNN v7.1.4 and NCCL v2.2.12 packages. Refresh shared library cache.
     $ sudo tar -C /usr/local --no-same-owner -xzvf cudnn-9.2-linux-ppc64le-v7.1.tgz
     $ sudo tar -C /usr/local --no-same-owner -xzvf nccl_2.2.12-1+cuda9.2_ppc64le.tgz
     $ sudo ldconfig
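
Once the components are installed and the system has rebooted with the new driver, a quick sanity check (assuming the default CUDA install location):

     $ nvidia-smi                          # GPU driver version and visible GPUs
     $ /usr/local/cuda/bin/nvcc --version  # CUDA toolkit version
     $ ldconfig -p | grep -E 'cudnn|nccl'  # cuDNN and NCCL present in the shared library cache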
    

Anaconda

A number of the Deep Learning frameworks require Anaconda. Anaconda is a platform-agnostic data science distribution that includes more than 1,000 open source packages with free community support.

Anaconda2 with Python 2 should be used to run the Python 2 versions of the Deep Learning frameworks. Anaconda3 with Python 3 is required to run the Python 3 versions of the Deep Learning frameworks.

    | Anaconda  | Version | Download Location                                                  | Size | md5sum                           |
    | --------- | ------- | ------------------------------------------------------------------ | ---- | -------------------------------- |
    | Anaconda2 | 5.1.0   | https://repo.continuum.io/archive/Anaconda2-5.1.0-Linux-ppc64le.sh | 267M | e894dcc547a1c7d67deb04f6bba7223a |
    | Anaconda3 | 5.1.0   | https://repo.continuum.io/archive/Anaconda3-5.1.0-Linux-ppc64le.sh | 286M | 47b5b2b17b7dbac0d4d0f0a4653f5b1c |

Download and Install Anaconda. Installation requires input for license agreement, install location (default is $HOME/anaconda2) and permission to modify the PATH environment variable (via .bashrc).

   $ wget https://repo.continuum.io/archive/Anaconda2-5.1.0-Linux-ppc64le.sh
   $ bash Anaconda2-5.1.0-Linux-ppc64le.sh
   $ source ~/.bashrc

If multiple users are using the same system, each user should install Anaconda individually.
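
To verify a download against the md5sum column in the table above before running the installer:

   $ md5sum Anaconda2-5.1.0-Linux-ppc64le.sh
   e894dcc547a1c7d67deb04f6bba7223a  Anaconda2-5.1.0-Linux-ppc64le.sh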

Installing the Deep Learning Frameworks

Software Repository Setup

The PowerAI Deep Learning packages are distributed in a tar.gz file containing an rpm and a README file. The tar.gz file must be extracted on the local machine. Installing the rpm creates an installation repository on the local machine.

Install the repository package:


       $ sudo rpm -ihv mldl-repo-*.rpm

Installing all frameworks at once

All the Deep Learning frameworks can be installed at once using the power-mldl meta-package:

    $ sudo yum install power-mldl

Note: The above step does not install the PowerAI Distributed Deep Learning (DDL) packages. See below for details on installing DDL.

Installing the Python 3 versions of the frameworks

The Python 3 versions of the frameworks can be installed at once using the power-mldl-py3 meta-package:

    $ sudo yum install power-mldl-py3

Installing frameworks individually

The Deep Learning frameworks can be installed individually if preferred. The framework packages are:

  • caffe-bvlc – Berkeley Vision and Learning Center (BVLC) upstream Caffe, v1.0.0
  • caffe-ibm – IBM Optimized version of BVLC Caffe, v1.0.0
  • pytorch – PyTorch, v0.4.0
  • tensorflow – TensorFlow, v1.8.0
  • tensorboard – Web Applications for inspecting TensorFlow runs and graphs, v1.8.0

The Python 3 version of each framework appends ‘-py3’ to the package name:

  • pytorch-py3 – PyTorch, v0.4.0
  • tensorflow-py3 – TensorFlow, v1.8.0
  • tensorboard-py3 – Web Applications for inspecting TensorFlow runs and graphs, v1.8.0

Each can be installed with:

    $ sudo yum install <framework>

Install IBM PowerAI Distributed Deep Learning (DDL) packages

We recommend PowerAI Distributed Deep Learning for distributing model training across a cluster of Power machines. DDL includes IBM Spectrum MPI for communication among machines.

Install the PowerAI Distributed Deep Learning packages using:

    $ sudo yum install power-ddl

Note: DDL is an optional component. Other PowerAI components can be installed and used without installing DDL.

To use InfiniBand for DDL communications, install the latest Mellanox OFED driver. See the Download tab at: http://www.mellanox.com/page/products_dyn?product_family=26

Install IBM PowerAI Snap ML packages

Install the PowerAI Snap ML packages using:

    $ sudo yum install power-snapml

Note: Snap ML is an optional component. Other PowerAI components can be installed and used without installing Snap ML.

Accept the PowerAI License Agreement

Read the license agreement and accept the terms and conditions before using any of the frameworks.

    $ sudo /opt/DL/license/bin/accept-powerai-license.sh

After reading the license agreement, future installs may be automated to silently accept the license agreement.

    $ sudo IBM_POWERAI_LICENSE_ACCEPT=yes /opt/DL/license/bin/accept-powerai-license.sh

Upgrading from PowerAI R5.1

PowerAI 1.5.1 should be uninstalled prior to installing PowerAI 1.5.2.
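
A minimal uninstall sequence, following the same pattern as the “Uninstalling MLDL Frameworks” section below:

    $ sudo yum remove powerai-license
    $ sudo yum remove mldl-repo-local
    $ sudo yum autoremove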

Upgrading from PowerAI R5.0

PowerAI 1.5.2 requires newer versions of NVIDIA CUDA, NVIDIA cuDNN, the GPU driver, and IBM Spectrum MPI than 1.5.0. To upgrade, the older versions should be uninstalled and the newer versions installed. Likewise, the PowerAI 1.5.0 software packages should be uninstalled and the PowerAI 1.5.2 packages installed.

Upgrading from PowerAI R5.0 Caffe

The Caffe packages in PowerAI 1.5.0 used the HDF5 library from Anaconda. That library is now packaged with PowerAI so the Anaconda copy is no longer needed. After upgrading to 1.5.2, it is safe to remove the library symlinks from the cache directory:

$ ls -l ~/.powerai/caffe-bvlc/
$ rm -r ~/.powerai/caffe-bvlc

$ ls -l ~/.powerai/caffe-ibm/
$ rm -r ~/.powerai/caffe-ibm

Tuning Recommendations

Recommended settings for optimal Deep Learning performance on the S822LC and AC922 for High Performance Computing are:

  • Enable Performance Governor
      $ sudo yum install kernel-tools
      $ sudo cpupower -c all frequency-set -g performance
    
  • Enable GPU persistence mode

      $ sudo systemctl enable nvidia-persistenced
      $ sudo systemctl start nvidia-persistenced
    
  • Set GPU memory and graphics clocks
    • S822LC with NVIDIA Tesla P100, set clocks to maximum
      $ sudo nvidia-smi -ac 715,1480
      
    • AC922 with NVIDIA Tesla V100, set clocks to NVIDIA defaults
      $ sudo nvidia-smi -rac
      
  • For TensorFlow, set the SMT mode
    • S822LC with NVIDIA Tesla P100, set SMT=2
      $ sudo ppc64_cpu --smt=2
      
    • AC922 with NVIDIA Tesla V100, set SMT based on DDL usage:
      $ sudo ppc64_cpu --smt=4    # for TensorFlow WITHOUT DDL
      
      $ sudo ppc64_cpu --smt=2    # for TensorFlow WITH DDL
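
The resulting settings can be verified with:

      $ cpupower frequency-info --policy   # governor should be "performance"
      $ systemctl is-active nvidia-persistenced
      $ nvidia-smi -q -d CLOCK             # current application clocks
      $ ppc64_cpu --smt                    # current SMT mode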
      

Getting Started with MLDL Frameworks

General Setup

Most of the PowerAI packages install outside the normal system search paths (to /opt/DL/...), so each framework package provides a shell script to simplify environment setup (e.g. PATH, LD_LIBRARY_PATH, PYTHONPATH).

We recommend users update their shell rc file (e.g. .bashrc) to source the desired setup scripts. For example:

$ source /opt/DL/<framework>/bin/<framework>-activate
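
To make the setup persistent, append the line to your rc file, for example (using tensorflow):

$ echo 'source /opt/DL/tensorflow/bin/tensorflow-activate' >> ~/.bashrc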

Each framework also provides a test script to verify some of its functions. These test scripts include tests and examples sourced from the various communities. Note that some of the included tests rely on datasets (ex. MNIST) that are available in the community and are downloaded at runtime. Access and availability to this data is subject to the community and may change at any time.

To run the test script for a particular framework, run:

$ <framework>-test

Note about dependencies

A number of the PowerAI frameworks (for example, TensorFlow, TensorBoard, and PyTorch) have their dependencies satisfied via Anaconda packages. The <framework>-activate script validates that these dependencies are installed and fails if any are missing.

For these frameworks, the /opt/DL/<framework>/bin/install_dependencies script must be run prior to activation to install the required packages.

For example:

$ source /opt/DL/tensorflow/bin/tensorflow-activate
Missing dependencies ['backports.weakref', 'mock', 'protobuf']
Run "/opt/DL/tensorflow/bin/install_dependencies" to resolve this problem.

$ /opt/DL/tensorflow/bin/install_dependencies
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /home/rhel/anaconda2:

The following NEW packages will be INSTALLED:

    backports.weakref: 1.0rc1-py27_0
    libprotobuf:       3.4.0-hd26fab5_0
    mock:              2.0.0-py27_0
    pbr:               1.10.0-py27_0
    protobuf:          3.4.0-py27h7448ec6_0

Proceed ([y]/n)? y

libprotobuf-3. 100% |###############################| Time: 0:00:02   2.04 MB/s
backports.weak 100% |###############################| Time: 0:00:00  12.83 MB/s
protobuf-3.4.0 100% |###############################| Time: 0:00:00   2.20 MB/s
pbr-1.10.0-py2 100% |###############################| Time: 0:00:00   3.35 MB/s
mock-2.0.0-py2 100% |###############################| Time: 0:00:00   3.26 MB/s

$ source /opt/DL/tensorflow/bin/tensorflow-activate
$

Note: PyTorch and TensorFlow have conflicting Anaconda package dependencies. Create separate Anaconda environments for those frameworks.
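
One way to keep them separate, sketched here with a hypothetical environment name (and assuming install_dependencies installs into the currently active Anaconda environment):

    $ conda create -y -n pytorch-env python=2.7
    $ source activate pytorch-env
    $ /opt/DL/pytorch/bin/install_dependencies
    $ source /opt/DL/pytorch/bin/pytorch-activate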

Getting Started with DDL

The Caffe and TensorFlow sections below describe how to use the DDL support for each of those frameworks.

Some configuration steps are common to all use of DDL:

  • PowerAI frameworks must be installed at the same version on all nodes in the DDL cluster.
  • The DDL master node must be able to log into all the nodes in the cluster using ssh keys. Keys can be created and added by:
    1. Generate ssh private/public key pair on the master node using:
       $ ssh-keygen
      
    2. Copy the generated public key in ~/.ssh/id_rsa.pub to each node’s ~/.ssh/authorized_keys file:
       $ ssh-copy-id -i ~/.ssh/id_rsa.pub $USER@$HOST
      
  • Linux system firewalls may need to be adjusted to pass MPI traffic. This can be done broadly as shown below, although opening only the required ports (which vary with configuration) would be more secure.
     $ sudo iptables -A INPUT -p tcp --dport 1024:65535 -j ACCEPT
    

Getting Started with Caffe

Caffe Alternatives

Packages are provided for upstream BVLC Caffe (/opt/DL/caffe-bvlc) and IBM optimized Caffe (/opt/DL/caffe-ibm). The system default Caffe (/opt/DL/caffe) can be selected using the operating system’s alternatives system:

    $ sudo update-alternatives --config caffe
    There are 2 programs which provide 'caffe'.

      Selection    Command
    -----------------------------------------------
       1           /opt/DL/caffe-bvlc
    *+ 2           /opt/DL/caffe-ibm

    Enter to keep the current selection[+], or type selection number:

Users can activate the system default caffe:

    source /opt/DL/caffe/bin/caffe-activate

Or they can activate a specific variant. For example:

    source /opt/DL/caffe-bvlc/bin/caffe-activate

Attempting to activate multiple Caffe packages in a single login session will cause unpredictable behavior.

Caffe Samples and Examples

Each Caffe package includes example scripts, sample models, and other content. A script is provided to copy the sample content into a specified directory:

    $ caffe-install-samples <somedir>
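
For example, to copy the samples into your home directory and inspect them (the target directory is arbitrary):

    $ caffe-install-samples ~/caffe-samples
    $ ls ~/caffe-samples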

More Info

Visit Caffe’s website (http://caffe.berkeleyvision.org/) for tutorials and example programs that you can run to get started.

Optimizations in IBM Caffe

The IBM Caffe package (caffe-ibm) in PowerAI is based on BVLC Caffe and includes optimizations and enhancements from IBM:

  • CPU/GPU layer-wise reduction
  • Large Model Support (LMS)
  • IBM PowerAI Distributed Deep Learning (DDL)

Note: DDL is to be installed separately as mentioned above.

Command Line Options

IBM Caffe supports all of BVLC Caffe’s options and adds a few new ones to control the enhancements. IBM Caffe options related to Distributed Deep Learning (options that start with the word “ddl”) will work only if you have DDL installed.

  • -bvlc: Disable CPU/GPU layer-wise reduction

  • -threshold: Tune CPU/GPU layer-wise reduction. If the number of parameters for one layer is greater than or equal to threshold, their accumulation on CPU will be done in parallel. Otherwise, the accumulation will be done using one thread. It is set to 2,000,000 by default.
  • -ddl ["-option1 param -option2 param"]: Enable Distributed Deep Learning, with optional space-delimited parameter string. Supported parameters are:
    • mode <mode>
    • dump_iter <N>
    • dev_sync <0, 1, or 2>
    • rebind_iter <N>
    • dbg_level <0, 1, or 2>
  • -ddl_update: This option instructs Caffe to use a new custom version of the ApplyUpdate function that is optimized for DDL. It is faster, but does not support gradient clipping, so it is off by default. It can be used with networks that do not require gradient clipping (which is common).
  • -ddl_align: This option ensures that the gradient buffers have a length that is a multiple of 256 bytes and start addresses that are multiples of 256. This ensures cache line alignment on multiple platforms as well as alignment with NCCL slices. Off by default.
  • -ddl_database_restart: This option ensures every learner always looks at the same data set during an epoch. This allows a system to cache only the pages that are touched by the learners contained within it. It can help size the number of learners needed for a given data set size by establishing a known database footprint per system. This flag should not be used while running caffe on several hosts. Off by default.
  • -lms: Enable Large Model Support. See below.
  • -lms_size_threshold <size in KB>: Set LMS size threshold. See below.
  • -lms_exclude <size in MB>: Tune LMS memory utilization. See below.
  • -affinity: Enable CPU/GPU affinity (default). Specify -noaffinity to disable.

Use the command line options as follows:

    | Feature                         | -bvlc | -ddl | -lms  | -gpu          | -affinity |
    | ------------------------------- | ----- | ---- | ----- | ------------- | --------- |
    | CPU/GPU layer-wise reduction    |   N   |   X  |   X   | multiple GPUs | X         |
    | Distributed Deep Learning (DDL) |   X   |   Y  |   X   | N             | X         |
    | Large model support             |   X   |   X  |   Y   | X             | X         |
    | CPU/GPU affinity                |   X   |   X  |   X   | X             | Y         |

    Y: do specify
    N: don't specify
    X: don't care/matter

LMS gets enabled regardless of other options as long as -lms is specified. For example, you can use DDL and LMS together.

CPU/GPU layer-wise reduction is enabled only if multiple GPUs are specified and the solver sets layer_wise_reduce: false.

Use of multiple GPUs with DDL is specified via the MPI rank file, so the -gpu flag may not be used to specify multiple GPUs for DDL.

When running Caffe on several hosts, using shared storage for data can cause Caffe to hang.
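
For example, a hypothetical multi-host run combining DDL options (the host names and the -ddl mode string are illustrative; see /opt/DL/ddl/doc/README.md for valid mode settings):

    $ ddlrun -H host1,host2 caffe train -solver solver.prototxt -ddl "-mode n:4x2" -ddl_align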

About CPU/GPU Layer-wise Reduction

This optimization aims to reduce the running time of multi-GPU training by utilizing CPUs. In particular, gradient accumulation is offloaded to CPUs and done in parallel with the training. For the best performance with IBM Caffe, close unnecessary applications that consume a high percentage of CPU.

If using a single GPU, IBM Caffe and BVLC Caffe will have similar performance.

The optimizations in IBM Caffe do not change the convergence of a neural network during training. IBM Caffe and BVLC Caffe should produce the same convergence results.

CPU/GPU layer-wise reduction is enabled unless the -bvlc commandline flag is used.

About IBM PowerAI Distributed Deep Learning (DDL)

See /opt/DL/ddl/doc/README.md for more information about using IBM PowerAI Distributed Deep Learning.

About Large Model Support (LMS)

IBM Caffe with Large Model Support loads the neural model and data set in system memory and caches activity to GPU memory only when needed for computation. This allows models and training batch size to scale significantly beyond what was previously possible. You can enable Large Model Support by adding -lms. Large Model Support is available as a technology preview.

The -lms_size_threshold <size in KB> option modifies the minimum memory chunk size considered for the LMS cache (default: 1000). Any chunk smaller than this value will be exempt from LMS reuse and will persist in GPU memory. The value can be used to control the performance trade-off.

The -lms_exclude <size in MB> option defines a soft limit on GPU memory allocated for the LMS cache (where limit = GPU capacity - <size>). If zero, aggressive GPU memory reuse is favored over allocation (default). If a value greater than 0 is specified, aggressive allocation of GPU memory is enabled up to the limit. Minimizing this value, while still allowing enough memory for non-LMS allocations, may improve performance by increasing GPU memory utilization and reducing data transfers between system and GPU memory.

For example, the following command line options yield the best training performance for the GoogLeNet model with high-resolution image data (crop size 2240×2240, batch size 5) using Tesla P100 GPUs:

    $ caffe train -solver=solver.prototxt -gpu all -lms -lms_size_threshold 1000 -lms_exclude 1400

Note that ideal tunings for any given scenario may differ depending on the model’s network architecture, data size, batch size and GPU memory capacity.

Combining LMS and DDL

Large Model Support and Distributed Deep Learning can be combined. For example, to run on two hosts named host1 and host2:

    $ ddlrun -H host1,host2 caffe train -solver solver-resnet-152.prototxt -lms

Getting Started with TensorFlow

The TensorFlow homepage (https://www.tensorflow.org/) has a variety of information, including Tutorials, How Tos, and a Getting Started guide.

Additional tutorials and examples are available from the community.

High-Performance Models

A version of TensorFlow High-Performance Models which includes options to use Distributed Deep Learning is included in the tensorflow-performance-models package. For more information, see:

  • /opt/DL/tensorflow-performance-models/scripts/tf_cnn_benchmarks/README.md
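
A hypothetical multi-host invocation is sketched below; the script path and --variable_update=ddl flag are assumptions based on the package layout above, so consult that README for the supported options:

    $ ddlrun -H host1,host2 python /opt/DL/tensorflow-performance-models/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model=resnet50 --variable_update=ddl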

Large Model Support (LMS)

This release of PowerAI includes a Technology Preview of Large Model Support for TensorFlow. Large Model Support provides an approach to training large models and batch sizes that cannot fit in GPU memory. It does this by using a graph editing library that takes the user model’s computational graph and automatically adds swap-in and swap-out nodes to transfer tensors between GPU memory and system memory during training.

For more information about TensorFlow LMS, see:

  • /opt/DL/tensorflow/doc/README-LMS.md

Distributed Deep Learning (DDL) Custom Operator for TensorFlow

The DDL custom operator uses IBM Spectrum MPI and NCCL to provide high-speed communications for distributed TensorFlow.

The DDL custom operator can be found in the ddl-tensorflow package. For more information about DDL and about the TensorFlow operator, see:

  • /opt/DL/ddl/doc/README.md
  • /opt/DL/ddl-tensorflow/doc/README.md
  • /opt/DL/ddl-tensorflow/doc/README-API.md

Additional TensorFlow Features

The PowerAI TensorFlow packages include TensorBoard. See: https://www.tensorflow.org/get_started/summaries_and_tensorboard

The TensorFlow 1.8.0 package includes support for additional features.

TensorBoard Usage Notes

Additional usage notes are available from the community. See the notes at: https://github.com/tensorflow/tensorboard

Getting Started with Snap Machine Learning (Snap ML)

This release of PowerAI includes a Technology Preview of Snap Machine Learning (Snap ML). Snap ML is a library for training generalized linear models, developed at IBM with the vision of removing training time as a bottleneck for machine learning applications. Snap ML supports a large number of classical machine learning models and scales gracefully to data sets with billions of examples and/or features. It offers distributed training and GPU acceleration, and supports sparse data structures.

“With Snap ML you can train your machine learning model faster than you can snap your fingers!”

The Snap ML library offers two different packages:

snap-ml-local

snap-ml-local is used for machine learning on a single machine.

For information on snap-ml-local, see /opt/DL/snap-ml-local/doc/README.md

snap-ml-mpi

snap-ml-mpi is used for distributed training of machine learning models across a cluster of machines.

For information on snap-ml-mpi, see /opt/DL/snap-ml-mpi/doc/README.md

Getting Started with PyTorch

This release of PowerAI includes a Technology Preview of PyTorch, a deep learning framework for fast, flexible experimentation.

PyTorch Examples

The PyTorch package includes a set of examples. A script is provided to copy the sample content into a specified directory:

    $ pytorch-install-samples <somedir>

More Info

The PyTorch homepage (https://pytorch.org) has a variety of information, including Tutorials and a Getting Started guide.

Additional tutorials and examples are available from the community.

Uninstalling MLDL Frameworks

The MLDL framework packages can be uninstalled individually if desired. To uninstall all MLDL packages and the repository package at once:

    $ sudo yum remove powerai-license
    $ sudo yum remove mldl-repo-local
    $ sudo yum autoremove

© Copyright IBM Corporation 2017, 2018

IBM, the IBM logo, ibm.com, POWER, Power, POWER8, POWER9, and Power systems are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

The TensorFlow package includes code from the BoringSSL project. The following notices may apply:

    This product includes software developed by the OpenSSL Project for
    use in the OpenSSL Toolkit. (http://www.openssl.org/)

    This product includes cryptographic software written by Eric Young
    (eay@cryptsoft.com)

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.

Previous PowerAI releases

We recommend that you install the most current release of PowerAI; however, if you have an earlier version installed, here are the release notes for your reference:

PowerAI Release 5.1

Available: 04/20/2018

Software packages and pre-requisites

PowerAI version 1 release 5.1 provides software packages for several Deep Learning frameworks, supporting libraries, and tools:

    | Library or framework | Version |
    | -------------------- | ------- |
    | DDL                  | 0.9.0   |
    | TensorFlow           | 1.5.0   |
    | TensorBoard          | 1.5.0   |
    | IBM Caffe            | 1.0.0   |
    | BVLC Caffe           | 1.0.0   |
    | Spectrum MPI         | 10.2    |
    | Bazel                | 0.8.0   |
    | NCCL                 | 1.3.5   |
    | OpenBLAS             | 0.2.20  |
    | HDF5                 | 1.10.1  |

PowerAI is optimized to leverage the unique capabilities of IBM Power Systems accelerated servers, and is not available on any other platforms. It is supported on:

  • IBM Power System AC922 with NVIDIA Tesla V100 GPUs
  • IBM Power System S822LC with NVIDIA Tesla P100 GPUs

PowerAI requires some additional third-party software components. See the table below for more information:

    | Library or framework            | Version | Recommended |
    | ------------------------------- | ------- | ----------- |
    | Red Hat Enterprise Linux (RHEL) | 7.5     | 7.5         |
    | NVIDIA CUDA                     | 9.2     | 9.2.88      |
    | NVIDIA GPU driver               | 396     | 396.26      |
    | NVIDIA cuDNN                    | 7.1     | 7.1.4       |
    | Anaconda                        | 5.1     | 5.1.0       |

Release 5.1 includes a Technology Preview of IBM PowerAI Distributed Deep Learning (DDL). Distributed Deep Learning provides support for distributed (multi-host) model training. DDL is integrated into IBM Caffe. TensorFlow support is provided by a separate package included in the PowerAI distribution.

Additional information

System Setup

Operating System

The Deep Learning packages require RHEL 7.5 little endian for IBM POWER8 and IBM POWER9. The RHEL install image and license must be acquired from Red Hat: https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux

Operating System and Repository Setup

  1. Enable ‘optional’ and ‘extra’ repo channels
      IBM POWER8:
          $ sudo subscription-manager repos --enable=rhel-7-for-power-le-optional-rpms
          $ sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
    
      IBM POWER9:
          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-optional-rpms
          $ sudo subscription-manager repos --enable=rhel-7-for-power-9-extras-rpms
  2. Install packages needed for the installation
      $ sudo yum -y install wget nano bzip2
  3. Enable EPEL repo
       $ wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
       $ sudo rpm -ihv epel-release-latest-7.noarch.rpm
  4. Load the latest kernel
      $ sudo yum update kernel kernel-tools kernel-tools-libs kernel-bootwrapper
      $ reboot

    Or do a full update

      $ sudo yum update
      $ sudo reboot

NVIDIA Components

IBM POWER9 specific udev rules

Before installing the NVIDIA components, the udev Memory Auto-Onlining Rule must be disabled for the CUDA driver to function properly. To disable it:

  1. Copy the /lib/udev/rules.d/40-redhat.rules file to the directory for user overridden rules.
      $ sudo cp /lib/udev/rules.d/40-redhat.rules /etc/udev/rules.d/
  2. Edit the /etc/udev/rules.d/40-redhat.rules file.
      $ sudo nano /etc/udev/rules.d/40-redhat.rules
  3. Comment out the following line and save the change:
      SUBSYSTEM=="memory", ACTION=="add", PROGRAM="/bin/uname -p", RESULT!="s390*", ATTR{state}=="offline", ATTR{state}="online"
  4. Optionally delete the first line of the file, since the file was copied to a directory where it won’t be overwritten.
      # do not edit this file, it will be overwritten on update
  5. Reboot the system for the changes to take effect.
      $ sudo reboot

The Deep Learning packages require CUDA, cuDNN, and GPU driver packages from NVIDIA. See the table above for the required and recommended versions of these components.

Install the components by following these steps:

  1. Download and install NVIDIA CUDA 9.2 from https://developer.nvidia.com/cuda-downloads
    • Select Operating System: Linux
    • Select Architecture: ppc64le
    • Select Distribution RHEL
    • Select Version 7
    • Select Installer Type rpm (local)
    • Follow the Linux POWER installation instructions in the CUDA Quick Start Guide, including the steps describing how to set up the CUDA development environment by updating PATH and LD_LIBRARY_PATH.

    Note: The local rpm is preferred over the network rpm as it will ensure the version installed is the version downloaded. With the network rpm, "yum install cuda" will always install the latest version of the CUDA Toolkit.

  2. Download NVIDIA cuDNN v7.1.4 for CUDA 9.2 from https://developer.nvidia.com/cudnn (Registration in NVIDIA’s Accelerated Computing Developer Program is required)
    • cuDNN v7.1.4 Library for Linux (Power8/Power9)
  3. Install the cuDNN v7.1.4 packages
       $ sudo tar -C /usr/local --no-same-owner -xzvf cudnn-9.2-linux-ppc64le-v7.1.tgz

Anaconda

A number of the Deep Learning frameworks require Anaconda. Anaconda is a platform-agnostic data science distribution that includes more than 1,000 open source packages with free community support.

Download and Install Anaconda. Installation requires input for license agreement, install location (default is $HOME/anaconda2) and permission to modify the PATH environment variable (via .bashrc).

   $ wget https://repo.continuum.io/archive/Anaconda2-5.1.0-Linux-ppc64le.sh
   $ bash Anaconda2-5.1.0-Linux-ppc64le.sh
   $ source ~/.bashrc

Note: Anaconda2-5.1.0-Linux-ppc64le.sh is a large file, 268 MB, and has an md5sum of e894dcc547a1c7d67deb04f6bba7223a.

If multiple users are using the same system, each user should install Anaconda individually.

Installing the Deep Learning Frameworks

Software Repository Setup

The PowerAI Deep Learning packages are distributed in a tar.gz file containing an rpm and a README file. The tar.gz file must be extracted on the local machine. Installing the rpm creates an installation repository on the local machine.

Install the repository package:


       $ sudo rpm -ihv mldl-repo-*.rpm

Installing all frameworks at once

All the Deep Learning frameworks can be installed at once using the power-mldl meta-package:

    $ sudo yum install power-mldl

Note: The above step does not install the PowerAI Distributed Deep Learning (DDL) packages. See below for details on installing DDL.

Installing frameworks individually

The Deep Learning frameworks can be installed individually if preferred. The framework packages are:

  • caffe-bvlc – Berkeley Vision and Learning Center (BVLC) upstream Caffe, v1.0.0
  • caffe-ibm – IBM Optimized version of BVLC Caffe, v1.0.0
  • tensorflow – Google TensorFlow, v1.5.0
  • tensorboard – Web Applications for inspecting TensorFlow runs and graphs, v1.5.0

Each can be installed with:

    $ sudo yum install <framework>

Install IBM PowerAI Distributed Deep Learning (DDL) packages

We recommend PowerAI Distributed Deep Learning for distributing model training across a cluster of Power machines. DDL includes IBM Spectrum MPI for communication among machines.

Install the PowerAI Distributed Deep Learning packages using:

    $ sudo yum install power-ddl

Note: DDL is an optional component. Other PowerAI components can be installed and used without installing DDL.

To use InfiniBand for DDL communications, install the latest Mellanox OFED driver. See the *Download* tab at: http://www.mellanox.com/page/products_dyn?product_family=26

Accept the PowerAI License Agreement

Read the license agreement and accept the terms and conditions before using any of the frameworks.

    $ sudo /opt/DL/license/bin/accept-powerai-license.sh

After reading the license agreement, future installs may be automated to silently accept the license agreement.

    $ sudo IBM_POWERAI_LICENSE_ACCEPT=yes /opt/DL/license/bin/accept-powerai-license.sh

Upgrading from PowerAI Release 5

PowerAI version 1 release 5.1 requires newer versions of NVIDIA CUDA, NVIDIA cuDNN, the GPU driver, and IBM Spectrum MPI than release 5. To upgrade, the older versions should be uninstalled and the newer versions installed. Likewise, the PowerAI release 5 software packages should be uninstalled and the PowerAI version 1 release 5.1 packages installed.

Upgrading from PowerAI 1.5.0 Caffe

The Caffe packages in PowerAI 1.5.0 used the HDF5 library from Anaconda. That library is now packaged with PowerAI so the Anaconda copy is no longer needed. After upgrading to 1.5.1, it is safe to remove the library symlinks from the cache directory:

$ ls -l ~/.powerai/caffe-bvlc/
$ rm -r ~/.powerai/caffe-bvlc

$ ls -l ~/.powerai/caffe-ibm/
$ rm -r ~/.powerai/caffe-ibm

Tuning Recommendations

Recommended settings for optimal Deep Learning performance on the S822LC and AC922 for High Performance Computing are:

  • Enable Performance Governor
       $ sudo yum install kernel-tools
       $ sudo cpupower -c all frequency-set -g performance
  • Enable GPU persistence mode
       $ sudo systemctl enable nvidia-persistenced
       $ sudo systemctl start nvidia-persistenced
  • Set GPU memory and graphics clocks
    • S822LC with NVIDIA Tesla P100, set clocks to maximum
      $ sudo nvidia-smi -ac 715,1480
    • AC922 with NVIDIA Tesla V100, set clocks to NVIDIA defaults
      $ sudo nvidia-smi -rac
  • For TensorFlow, set the SMT mode
       $ sudo ppc64_cpu --smt=2

Getting Started with MLDL Frameworks

General Setup

Most of the PowerAI packages install outside the normal system search paths (to /opt/DL/...), so each framework package provides a shell script to simplify environment setup (e.g. PATH, LD_LIBRARY_PATH, PYTHONPATH).

We recommend users update their shell rc file (e.g. .bashrc) to source the desired setup scripts. For example:

$ source /opt/DL/<framework>/bin/<framework>-activate

Each framework also provides a test script to verify basic function:

$ <framework>-test

Note about dependencies

A number of the PowerAI frameworks (for example, TensorFlow and TensorBoard) have their dependencies satisfied via Anaconda packages. The <framework>-activate script validates that these dependencies are installed and fails if any are missing.

For these frameworks, the /opt/DL/<framework>/bin/install_dependencies script must be run prior to activation to install the required packages.

For example:

$ source /opt/DL/tensorflow/bin/tensorflow-activate
Missing dependencies ['backports.weakref', 'mock', 'protobuf']
Run "/opt/DL/tensorflow/bin/install_dependencies" to resolve this problem.

$ /opt/DL/tensorflow/bin/install_dependencies
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /home/rhel/anaconda2:

The following NEW packages will be INSTALLED:

    backports.weakref: 1.0rc1-py27_0
    libprotobuf:       3.4.0-hd26fab5_0
    mock:              2.0.0-py27_0
    pbr:               1.10.0-py27_0
    protobuf:          3.4.0-py27h7448ec6_0

Proceed ([y]/n)? y

libprotobuf-3. 100% |###############################| Time: 0:00:02   2.04 MB/s
backports.weak 100% |###############################| Time: 0:00:00  12.83 MB/s
protobuf-3.4.0 100% |###############################| Time: 0:00:00   2.20 MB/s
pbr-1.10.0-py2 100% |###############################| Time: 0:00:00   3.35 MB/s
mock-2.0.0-py2 100% |###############################| Time: 0:00:00   3.26 MB/s

$ source /opt/DL/tensorflow/bin/tensorflow-activate
$

Getting Started with Caffe

Caffe Alternatives

Packages are provided for upstream BVLC Caffe (/opt/DL/caffe-bvlc) and IBM optimized Caffe (/opt/DL/caffe-ibm). The system default Caffe (/opt/DL/caffe) can be selected using the operating system’s alternatives system:

    $ sudo update-alternatives --config caffe
    There are 2 programs which provide 'caffe'.

      Selection    Command
    -----------------------------------------------
       1           /opt/DL/caffe-bvlc
    *+ 2           /opt/DL/caffe-ibm

    Enter to keep the current selection[+], or type selection number:

Users can activate the system default caffe:

    source /opt/DL/caffe/bin/caffe-activate

Or they can activate a specific variant. For example:

    source /opt/DL/caffe-bvlc/bin/caffe-activate

Attempting to activate multiple Caffe packages in a single login session will cause unpredictable behavior.

Caffe Samples and Examples

Each Caffe package includes example scripts, sample models, and other content. A script is provided to copy the sample content into a specified directory:

    $ caffe-install-samples <somedir>

More Info

Visit Caffe’s website (http://caffe.berkeleyvision.org/) for tutorials and example programs that you can run to get started.


Optimizations in IBM Caffe

The IBM Caffe package (caffe-ibm) in PowerAI is based on BVLC Caffe and includes optimizations and enhancements from IBM:

  • CPU/GPU layer-wise reduction
  • Large Model Support (LMS)
  • IBM PowerAI Distributed Deep Learning (DDL)

Note: DDL is to be installed separately as mentioned above.

Command Line Options

IBM Caffe supports all of BVLC Caffe’s options and adds a few new ones to control the enhancements. IBM Caffe options related to Distributed Deep Learning (options that start with the word "ddl") will work only if you have DDL installed.

  • -bvlc: Disable CPU/GPU layer-wise reduction
  • -threshold: Tune CPU/GPU layer-wise reduction. If the number of parameters for one layer is greater than or equal to threshold, their accumulation on CPU will be done in parallel. Otherwise, the accumulation will be done using one thread. It is set to 2,000,000 by default.
  • -ddl ["-option1 param -option2 param"]: Enable Distributed Deep Learning, with optional space-delimited parameter string. Supported parameters are:
    • mode <mode>
    • dump_iter <N>
    • dev_sync <0, 1, or 2>
    • rebind_iter <N>
    • dbg_level <0, 1, or 2>
  • -ddl_update: This option instructs Caffe to use a new custom version of the ApplyUpdate function that is optimized for DDL. It is faster, but does not support gradient clipping, so it is off by default. It can be used with networks that do not require gradient clipping (which is common).
  • -ddl_align: This option ensures that the gradient buffers have a length that is a multiple of 256 bytes and start addresses that are multiples of 256. This ensures cache line alignment on multiple platforms as well as alignment with NCCL slices. Off by default.
  • -ddl_database_restart: This option ensures every learner always looks at the same data set during an epoch. This allows a system to cache only the pages that are touched by the learners contained within it. It can help size the number of learners needed for a given data set size by establishing a known database footprint per system. Off by default.
  • -lms <size>: Enable Large Model Support with threshold of <size>. See below.
  • -lms_frac <fraction>: Tune Large Model Support memory usage between CPU and GPU. See below.

Use the command line options as follows:

    | Feature                         | -bvlc | -ddl | -lms  | -gpu          |
    |---------------------------------|-------|------|-------|---------------|
    | CPU/GPU layer-wise reduction    |   N   |   X  |   X   | multiple GPUs |
    | Distributed Deep Learning (DDL) |   X   |   Y  |   X   | N             |
    | Large model support             |   X   |   X  |   Y   | X             |

    Y: do specify
    N: don't specify
    X: don't care/matter

LMS takes effect regardless of other options as long as -lms is specified. For example, you can use DDL and LMS together.

CPU/GPU layer-wise reduction is enabled only if multiple GPUs are specified and the solver sets layer_wise_reduce: false.

Use of multiple GPUs with DDL is specified via the MPI rank file, so the -gpu flag may not be used to specify multiple GPUs for DDL.

About CPU/GPU Layer-wise Reduction

This optimization aims to reduce the running time of multi-GPU training by utilizing CPUs. In particular, gradient accumulation is offloaded to CPUs and done in parallel with the training. For the best performance with IBM Caffe, close unnecessary applications that consume a high percentage of CPU.

If using a single GPU, IBM Caffe and BVLC Caffe will have similar performance.

The optimizations in IBM Caffe do not change the convergence of a neural network during training. IBM Caffe and BVLC Caffe should produce the same convergence results.

CPU/GPU layer-wise reduction is enabled unless the -bvlc commandline flag is used.

About IBM PowerAI Distributed Deep Learning (DDL)

See /opt/DL/ddl/doc/README.md for more information about using IBM PowerAI Distributed Deep Learning.

About Large Model Support (LMS)

IBM Caffe with Large Model Support loads the neural model and data set in system memory and caches activity to GPU memory, allowing models and training batch size to scale significantly beyond what was previously possible. Large Model Support is available as a technology preview.

You can enable Large Model Support by adding -lms <size in KB>, for example -lms 1000. Any memory chunk larger than 1000 KB will then be kept in CPU memory and fetched to GPU memory only when needed for computation. A very large value such as -lms 10000000000 effectively disables the feature, while smaller values make LMS more aggressive; the value controls the performance trade-off.

As a secondary option, there is -lms_frac <0~1.0>. For example, with -lms_frac 0.4, LMS does not kick in until at least 40% of GPU memory is expected to be consumed. This is useful for disabling LMS on small networks.

Combining LMS and DDL

Large Model Support and Distributed Deep Learning can be combined. For example, to run on two hosts named host1 and host2:

    $ ddlrun -H host1,host2 -n 8 caffe train -solver alexnet_solver.prototxt -ddl "-mode n:4x2" -lms 1000

Getting Started with TensorFlow

The TensorFlow homepage (https://www.tensorflow.org/) has a variety of information, including Tutorials, How Tos, and a Getting Started guide.

Additional tutorials and examples are available from the community.

High-Performance Models

A version of TensorFlow High-Performance Models which includes options to use Distributed Deep Learning is included in the tensorflow-performance-models package. For more information, see: /opt/DL/tensorflow-performance-models/scripts/tf_cnn_benchmarks/README.md

Distributed Deep Learning (DDL) Custom Operator for TensorFlow

This release of PowerAI includes a Technology Preview of the IBM PowerAI Distributed Deep Learning (DDL) custom operator for TensorFlow. The DDL custom operator uses IBM Spectrum MPI and NCCL to provide high-speed communications for distributed TensorFlow.

The DDL custom operator can be found in the ddl-tensorflow package. For more information about DDL and about the TensorFlow operator, see:

  • /opt/DL/ddl/doc/README.md
  • /opt/DL/ddl-tensorflow/doc/README.md
  • /opt/DL/ddl-tensorflow/doc/README-API.md

Additional TensorFlow Features

The PowerAI TensorFlow packages include TensorBoard. See: https://www.tensorflow.org/get_started/summaries_and_tensorboard

The TensorFlow 1.5.0 package includes support for additional features.

TensorBoard Usage Notes

Additional usage notes are available from the community. See the notes at: https://github.com/tensorflow/tensorboard

Uninstalling MLDL Frameworks

The MLDL framework packages can be uninstalled individually in the same way they were installed. To uninstall all MLDL packages and the repository used to install them, run:

    $ sudo yum remove powerai-license
    $ sudo yum remove mldl-repo-local

PowerAI Release 5.0

Available: 12/22/2017

Featuring

  • Support for Red Hat Enterprise Linux
    PowerAI Release 5.0 now supports Red Hat Enterprise Linux 7.4.
  • Available Level 1 – Level 3 Software support
    With PowerAI Release 5.0, customers may now choose to order optional full stack software support through Support Line for PowerAI.
  • Large Model Support with IBM Caffe
    Incorporates a revised build of IBM Caffe (v1.0) with Large Model Support as a technology preview. Large Model Support, when paired with the CPU:GPU NVLink interface on Power Systems S822LC for HPC, allows data scientists to address larger, more complex models (exceeding GPU memory) for new levels of analysis for the first time.

    IBM Caffe with Large Model Support loads the neural model and data set in system memory and caches activity to GPU memory, allowing models and training batch size to scale significantly beyond what was previously possible.
  • IBM Distributed Deep Learning Library + TensorFlow 1.4 with improved cluster performance
    Enhances cluster performance with PowerAI by augmenting TensorFlow 1.4 with an exclusive IBM cluster communication library as a technology preview, delivering dramatic increases in Deep Learning training performance.

    The library enables data scientists to distribute training workloads across multiple InfiniBand-connected servers with efficient scaling performance, dramatically reducing overall training time.

Software revision and updates

IBM PowerAI Release 5 updates core frameworks and underlying software components for stability and performance.

Release packages

    | Package                                                   | Version |
    | --------------------------------------------------------- | ------- |
    | TensorFlow                                                | 1.4.0   |
    | TensorFlow Operator for IBM Distributed DL Library (DDL)  | 1.0rc1  |
    | IBM Distributed Deep Learning Library Docs (DDL)          | 1.0rc1  |
    | BVLC Caffe                                                | 1.0.0   |
    | IBM Caffe                                                 | 1.0.0   |

Prerequisites

  • Red Hat Enterprise Linux (RHEL) 7.4 (Architecture: ppc64le)
  • NVIDIA driver version 384.81 or higher is required
  • NVIDIA CUDA version 9 (9.0.176)
  • NVIDIA cuDNN 7 (7.0.4)

Legal Notices

© Copyright IBM Corporation 2017, 2018

IBM, the IBM logo, ibm.com, POWER, Power, POWER8, POWER9, and Power systems are trademarks of International Business Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.

Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.

The TensorFlow package includes code from the BoringSSL project. The following notices may apply:

    This product includes software developed by the OpenSSL Project for
    use in the OpenSSL Toolkit. (http://www.openssl.org/)

    This product includes cryptographic software written by Eric Young
    (eay@cryptsoft.com)

This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available in every country in which IBM operates.

THE INFORMATION IN THIS DOCUMENT IS PROVIDED "AS IS" WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the agreements under which they are provided.