“Chainer” (http://chainer.org) is one of the leading deep learning platforms. Chainer is a flexible framework provided as a Python library, and it supports CUDA and multiple GPUs.

IBM Power Systems, such as the IBM POWER SYSTEM AC922 (https://www.ibm.com/marketplace/power-systems-ac922), are an excellent choice for running Chainer and other deep learning platforms with multi-GPU capability because they are designed for cognitive workloads.

Because of this capability, IBM Research – Tokyo optimized the latest Chainer v4.0 to take full advantage of IBM Power Systems and prepared a pip installation method as well as source code access from its GitHub repository. We call this optimized version “IBM-optimized Chainer”; it can be installed with the pip command, without downloading and building the source code manually.

This blog provides instructions for installing the IBM-optimized Chainer on OpenPOWER Linux distributions, such as Ubuntu 16.04, Red Hat Enterprise Linux 7.1, and subsequent releases.

About IBM-optimized Chainer

IBM-optimized Chainer is a version of Chainer V4.0 optimized for OpenPOWER Linux distributions.
IBM-optimized Chainer provides the two features described below. You can try them by running the shell scripts (bench-AUTOWS.sh and bench-OOC.sh) in the GitHub repository (https://github.com/negiyas/chainer/tree/v4.0.0a3-ibm/examples/imagenet).

  1. Auto Workspace Tuning
    Optimizes GPU memory usage to choose the best algorithm for convolution.
  2. Out-of-Core Memory Support (Large Model Support)
    Enlarges the effective GPU memory by swapping data between GPU and CPU memory.

Auto Workspace tuning

This feature improves the training performance of convolutional networks by optimizing GPU memory usage: it maximizes the working buffer available for convolution and then chooses the best algorithm for that buffer size. The feature is effective for some networks, although not for all convolutional networks.
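Conceptually, the tuning trades GPU memory for speed: cuDNN offers several convolution algorithms, and the faster ones typically need a larger workspace buffer. The sketch below is illustrative only — it is not Chainer's actual code, and the algorithm names, timings, and workspace sizes are hypothetical — but it shows the core idea of picking the fastest algorithm whose workspace fits the available memory:

```python
# Illustrative sketch of workspace-constrained algorithm selection.
# The candidate names, times, and workspace sizes are hypothetical.
def pick_algorithm(candidates, free_bytes):
    """candidates: list of (name, est_time_ms, workspace_bytes) tuples."""
    feasible = [c for c in candidates if c[2] <= free_bytes]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c[1])[0]  # fastest feasible algorithm

algos = [
    ("implicit_gemm", 5.0, 0),            # slow, needs no workspace
    ("fft_tiling",    2.1, 512 * 2**20),  # faster, needs 512 MiB
    ("winograd",      1.4, 2 * 2**30),    # fastest, needs 2 GiB
]
print(pick_algorithm(algos, 1 * 2**30))  # fft_tiling (winograd does not fit)
print(pick_algorithm(algos, 4 * 2**30))  # winograd
```

Maximizing the workspace, as Auto Workspace Tuning does, effectively raises the memory budget so that faster algorithms become feasible.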

Chainer performance on IBM POWER SYSTEM AC922

The above graph compares the VGG16 training performance of the “original Chainer,” the “original Chainer with auto algorithm selection,” and the “IBM-optimized Chainer with Auto Workspace Tuning” on IBM POWER SYSTEM AC922 using one GPU. The Y axis shows iterations/sec; IBM-optimized Chainer is 1.73X faster than the “original Chainer with auto algorithm selection” and 1.86X faster than the “original Chainer.”

Out-of-Core Memory Support (Large Model Support)

This feature enlarges the effective GPU memory by swapping data between GPU and CPU memory, so users can train larger neural network models. For example, users can run ResNet50 with a 3.5× larger minibatch, and an enlarged GoogLeNet (2240 × 2240 input) with a 2.0× larger minibatch.
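To see why swapping matters, consider the input tensor alone: enlarging the input from a standard 224 × 224 ImageNet crop to 2240 × 2240 grows each image by a factor of 100. A quick back-of-the-envelope calculation (assuming 3 channels of float32 data):

```python
# Per-image input-tensor size in megabytes, assuming 3 channels of float32.
def input_megabytes(side, channels=3, bytes_per_elem=4):
    return side * side * channels * bytes_per_elem / 1e6

print(input_megabytes(224))   # ~0.6 MB for a standard ImageNet crop
print(input_megabytes(2240))  # ~60 MB for the enlarged 2240x2240 input
```

Intermediate activations scale similarly, which is why out-of-core support is what makes such enlarged models and minibatches fit in practice.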

Source codes and sample programs of IBM-optimized Chainer

You can find the source code and sample programs that use these features of IBM-optimized Chainer at the following GitHub sites.

Install IBM-Optimized Chainer

This section provides instructions to install and test IBM-optimized Chainer.

  1. Install CUDA, cuDNN and NCCL.
    Download and install CUDA, cuDNN, and NCCL2 for the “ppc64le” architecture from the following NVIDIA sites.

  2. Install Python
    Chainer requires the following versions of Python (See https://github.com/pfnet/chainer/blob/master/README.md).

    • Python 2.7.6+, 3.4.3+, 3.5.1+ or 3.6.0+

Please check the Python version on your system with the python -V command. If the version is not supported, please install or update Python as follows:

    On Ubuntu
    $ sudo apt-get install python
    $ sudo apt-get upgrade python
    On RHEL
    $ sudo yum install python
    $ sudo yum update python
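As a quick programmatic alternative to python -V, the short script below (an informal sketch of the version table quoted above) reports whether the running interpreter meets Chainer's minimum versions:

```python
# Check the running Python against Chainer's documented minimums
# (2.7.6+, 3.4.3+, 3.5.1+, 3.6.0+).
import sys

def supported(version_info):
    if tuple(version_info[:3]) >= (3, 6, 0):
        return True
    minimums = {(2, 7): (2, 7, 6), (3, 4): (3, 4, 3), (3, 5): (3, 5, 1)}
    minimum = minimums.get(tuple(version_info[:2]))
    return minimum is not None and tuple(version_info[:3]) >= minimum

print(sys.version.split()[0])
print("supported" if supported(sys.version_info) else "please upgrade Python")
```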

  3. Install pip

Because Chainer is provided as a Python library, you need the pip command, a package management system for Python. If pip is not installed on your system, please follow these steps to install it:

    On Ubuntu
    $ sudo apt-get install python-pip
    $ sudo pip install --upgrade pip

On RHEL, you first need to enable the EPEL (Extra Packages for Enterprise Linux) repository for yum, and then install pip as follows:
    $ sudo yum install epel-release
    $ sudo yum install python-pip
    $ sudo pip install --upgrade pip

  4. Install numpy
    Install numpy, a Python module for numerical computation, first, because other libraries can conflict with its installation in some cases:
    $ sudo pip install numpy

  5. Install prerequisite software for Chainer
    Install the prerequisite packages as follows:
    On Ubuntu,
    $ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
    On RHEL,
    $ sudo yum install git gcc make openssl-devel bzip2-devel readline-devel sqlite-devel
  6. Install HDF5

    Install HDF5 library as follows:
    On Ubuntu,
    $ sudo apt-get install libhdf5-serial-dev libhdf5-mpich-dev libhdf5-openmpi-dev
    On RHEL, please download and build the HDF5 source code, as follows:
    $ wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.17.tar.gz
    $ tar xvfz hdf5-1.8.17.tar.gz
    $ cd hdf5-1.8.17
    $ ./configure --enable-fortran --enable-cxx \
      --build=powerpc64le-linux-gnu    # specify the --prefix option if necessary
    $ make
    $ make install

  7. Install IBM-Optimized Cupy and Chainer
    The Chainer installer inspects your environment, such as the CUDA path, during installation, so please finish the previous steps before installing Chainer.
    You can install Chainer with the following commands. This step takes time because pip downloads and compiles the source code.
    $ sudo pip install cupy-ibmopt --no-cache-dir
    $ sudo pip install chainer-ibmopt --no-cache-dir

    If it fails, please check https://github.com/pfnet/chainer#installation.
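Once the pip commands finish, a short check like the one below confirms that the modules can be imported and reports whether Chainer detected CUDA and cuDNN (this is a generic sketch — run it on the machine where you installed Chainer; chainer.cuda.available and chainer.cuda.cudnn_enabled are Chainer's own flags):

```python
# Post-install sanity check: verify that cupy and chainer can be imported,
# then report whether Chainer sees CUDA and cuDNN.
import importlib.util

def installed(name):
    """True if the named module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None

for module in ("cupy", "chainer"):
    print(module, "found" if installed(module) else "MISSING")

if installed("chainer"):
    import chainer
    print("chainer version:", chainer.__version__)
    print("CUDA available: ", chainer.cuda.available)
    print("cuDNN enabled:  ", chainer.cuda.cudnn_enabled)
```

Both modules should be reported as found, and both flags should be True on a correctly configured GPU system.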

Test IBM-Optimized Chainer

After the installation is complete, follow these steps to run example code and verify that Chainer is correctly installed with CUDA and cuDNN.

  • Download the imagenet example in Chainer.

    $ mkdir -p ~/src
    $ cd ~/src
    $ git clone https://github.com/negiyas/chainer.git
    $ cd chainer
    $ git checkout -b v4.0.0a3-ibm refs/tags/v4.0.0a3-ibm
    Switched to a new branch 'v4.0.0a3-ibm'

  • Run an example code with the Auto Workspace Tuning feature.

    $ cd examples/imagenet
    $ SYNTH=1 ARCH=vgg16 INSIZE=224 BATCHSIZE=32 AUTOWS=1 ./go.sh
    python train_imagenet.py --arch vgg16 --batchsize 32 --gpu 0 --autows --iteration 1000
    --loaderjob 4 --val_batchsize 10 -o LOG-20180725/20180725-215838-vgg16-0224-0032.result --synthetic data val
    total [#############.....................................] 27.00%
    this epoch [#####################.............................] 43.75%
    270 iter, 8 epoch / 1000 iterations
    4.5545 iters/sec. Estimated time to finish: 0:02:40.279501.

If you do not see warning or error messages, Chainer is correctly installed with CUDA and cuDNN. Otherwise, please check https://github.com/pfnet/chainer#installation.

Run benchmarks for the Automatic Workspace Tuning and Out-of-Core Memory Support features

You can run benchmarks for the Automatic Workspace Tuning and Out-of-Core Memory Support features mentioned above as follows:

For the Automatic Workspace Tuning feature:
$ ./bench-AUTOWS.sh
For the Out-of-Core Memory Support feature:
$ ./bench-OOC.sh

What can you do with Chainer on OpenPOWER

A variety of GPU numerical accelerator configurations can be used to accelerate Chainer on OpenPOWER systems, such as IBM POWER SYSTEM AC922. You can learn more about and order these systems by contacting your IBM Business Partner.

IBM invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to be among the first to try these systems and see the benefits of the Tesla V100 GPU accelerator and the high-speed NVLink2 connection to the IBM POWER9 CPU.

I look forward to hearing about the performance you get from these systems. Share how you want to use Chainer on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.
