“Chainer” (http://chainer.org) is one of the most important Deep Learning platforms. Chainer is a flexible framework provided as a Python library, and it supports CUDA and multiple GPUs.
IBM Power Systems, such as the IBM POWER SYSTEM AC922 (https://www.ibm.com/marketplace/power-systems-ac922), are designed for cognitive computing, which makes them a strong choice for running Chainer and other Deep Learning platforms on multiple GPUs.
To take advantage of these systems, IBM Research – Tokyo optimized the latest Chainer V4.0 for IBM Power Systems and made it available both through pip and as source code in its GitHub repository. We call this optimized version “IBM-optimized Chainer.” It can be installed with the pip command, without downloading and building the source code manually.
This blog provides instructions for installing the IBM-optimized Chainer on OpenPOWER Linux distributions, such as Ubuntu 16.04, Red Hat Enterprise Linux 7.1, and subsequent releases.
About IBM-optimized Chainer
IBM-optimized Chainer is a version of Chainer V4.0 optimized for OpenPOWER Linux distributions.
The IBM-optimized Chainer provides the two features described below. You can try these features with the benchmark scripts (such as bench-OOC.sh) in the GitHub repository (https://github.com/negiyas/chainer/tree/v4.0.0a3-ibm/examples/imagenet).
- Auto Workspace Tuning
Optimizes GPU memory usage so that the best algorithm for convolution can be chosen.
- Out-of-Core Memory Support (Large Model Support)
Enlarges the effective GPU memory by swapping data out of GPU memory into CPU memory and back in.
Auto Workspace Tuning
This feature speeds up the training of convolutional networks. It optimizes GPU memory usage to maximize the working buffer (workspace) available for convolution, and then chooses the best convolution algorithm. It is effective for some networks, but it does not improve all convolutional networks.
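To make the idea concrete, here is a minimal Python sketch of workspace-constrained algorithm selection. It is not the IBM implementation: the algorithm labels (loosely modeled on cuDNN's algorithm families), timings, and workspace sizes are hypothetical. It assumes that each candidate convolution algorithm has a measured run time and a workspace requirement, and that the fastest algorithm whose workspace fits in the free GPU memory is chosen. A larger workspace budget lets faster algorithms qualify, which is why freeing GPU memory can speed up training.

```python
from typing import List, NamedTuple, Optional


class ConvAlgo(NamedTuple):
    """A candidate convolution algorithm (hypothetical benchmark numbers)."""
    name: str
    time_ms: float          # measured run time for one convolution call
    workspace_bytes: int    # scratch buffer the algorithm needs


def choose_algorithm(candidates: List[ConvAlgo],
                     free_bytes: int) -> Optional[ConvAlgo]:
    """Return the fastest algorithm whose workspace fits in free_bytes."""
    fitting = [a for a in candidates if a.workspace_bytes <= free_bytes]
    if not fitting:
        return None
    return min(fitting, key=lambda a: a.time_ms)


MiB = 1024 * 1024
candidates = [
    ConvAlgo("implicit_gemm", 9.0, 0),
    ConvAlgo("gemm", 6.5, 64 * MiB),
    ConvAlgo("fft", 4.0, 512 * MiB),
]

# A small workspace budget restricts the choice to slower algorithms ...
print(choose_algorithm(candidates, 100 * MiB).name)   # gemm
# ... while a larger budget (after freeing GPU memory) admits a faster one.
print(choose_algorithm(candidates, 1024 * MiB).name)  # fft
```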
The above graph compares the VGG16 training performance of the “original Chainer,” the “original Chainer with auto algorithm selection,” and the “IBM-optimized Chainer with Auto Workspace Tuning” on an IBM POWER SYSTEM AC922 using one GPU. The Y axis shows iterations/sec. The IBM-optimized Chainer is 1.73× faster than the original Chainer with the auto algorithm selection option, and 1.86× faster than the original Chainer.
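The two speedup figures also indicate how much the auto algorithm selection option contributes on its own. Dividing one by the other gives the ratio between the two original-Chainer configurations. The sketch below does only this arithmetic with the 1.73× and 1.86× values quoted above; because those values are rounded, the result is approximate.

```python
def implied_ratio(speedup_vs_original: float, speedup_vs_auto: float) -> float:
    """Throughput of 'original + auto algorithm selection' relative to 'original'.

    If opt = 1.86 * orig and opt = 1.73 * auto, then auto / orig = 1.86 / 1.73.
    """
    return speedup_vs_original / speedup_vs_auto


ratio = implied_ratio(1.86, 1.73)
print(f"{ratio:.3f}")  # 1.075
```

By this arithmetic, auto algorithm selection alone gives roughly a 7.5% gain over the original Chainer in this comparison, and the remaining 1.73× comes from Auto Workspace Tuning.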
Out-of-Core Memory Support (Large Model Support)
This feature enlarges the effective GPU memory by swapping data out from GPU memory to CPU memory and back in when it is needed, so that users can train larger neural network models. With this feature, users can run ResNet50 with a 3.5× larger minibatch, and an enlarged GoogLeNet (2240 × 2240) with a 2.0× larger minibatch.
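The following Python sketch illustrates the general swap-out/swap-in idea with a toy simulator that only tracks sizes and never touches a GPU. It is not the IBM implementation; the class and function names are hypothetical, and it assumes a simple chain of layers in which each activation saved in the forward pass is needed again in the backward pass. With swapping, an activation is moved to CPU memory as soon as the forward pass no longer needs it and is brought back just before the backward pass uses it, so peak GPU memory drops from the sum of all activations to the largest adjacent pair.

```python
from typing import Dict, List


class SwapSimulator:
    """Toy model of GPU memory that tracks activation sizes, not real data."""

    def __init__(self) -> None:
        self.device: Dict[int, int] = {}  # activation id -> size on the GPU
        self.host: Dict[int, int] = {}    # activation id -> size in CPU memory
        self.peak = 0

    def _record_peak(self) -> None:
        self.peak = max(self.peak, sum(self.device.values()))

    def allocate(self, idx: int, size: int) -> None:
        self.device[idx] = size
        self._record_peak()

    def swap_out(self, idx: int) -> None:
        self.host[idx] = self.device.pop(idx)

    def swap_in(self, idx: int) -> None:
        self.device[idx] = self.host.pop(idx)
        self._record_peak()

    def free(self, idx: int) -> None:
        self.device.pop(idx, None)


def peak_device_memory(sizes: List[int], swap: bool) -> int:
    """Peak simulated GPU memory for a forward+backward pass over a layer chain."""
    sim = SwapSimulator()
    # Forward: layer i reads activation i-1 and writes activation i.
    for i, size in enumerate(sizes):
        sim.allocate(i, size)
        if swap and i > 0:
            sim.swap_out(i - 1)  # not needed again until the backward pass
    # Backward: layer i needs its input activation i-1 back on the GPU.
    for i in reversed(range(len(sizes))):
        if swap and i > 0:
            sim.swap_in(i - 1)
        sim.free(i)  # activation i is no longer needed
    return sim.peak


print(peak_device_memory([4, 4, 4, 4], swap=False))  # 16
print(peak_device_memory([4, 4, 4, 4], swap=True))   # 8
```

The price of the lower peak is extra data transfer between CPU and GPU memory, so this approach benefits from a fast CPU-GPU interconnect.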
Source code and sample programs of IBM-optimized Chainer
You can find the source code of the IBM-optimized Chainer, and sample programs that use its features, at the following GitHub repositories.
- CuPy: https://github.com/negiyas/cupy/tree/v4.0.0a3-ibm
- Chainer: https://github.com/negiyas/chainer/tree/v4.0.0a3-ibm
Install IBM-Optimized Chainer
The following steps show how to install and test IBM-optimized Chainer.
- Install CUDA, cuDNN and NCCL.
Download and install CUDA, cuDNN, and NCCL2 for the “ppc64le” architecture from the NVIDIA developer website.
- Install Python
Chainer requires the following versions of Python (See https://github.com/pfnet/chainer/blob/master/README.md).
- Python 2.7.6+, 3.4.3+, 3.5.1+ or 3.6.0+
Please check the Python version on your system with the python -V command. If the version is not supported, please install or update Python as follows.
On Ubuntu:
$ sudo apt-get install python
$ sudo apt-get upgrade python
On RHEL:
$ sudo yum install python
$ sudo yum update python
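If you prefer to check from Python itself, the small script below compares the running interpreter against the versions listed above. It assumes that each entry means a minimum patch level within that minor series, so a series the list does not mention (for example, 3.7) is reported as "not listed" rather than unsupported; the Chainer README remains the authoritative source. The script avoids type annotations and f-strings so that it runs on both Python 2 and Python 3.

```python
import sys

# Minimum patch level for each Python series listed in the Chainer README above.
MINIMUM_PATCH = {(2, 7): 6, (3, 4): 3, (3, 5): 1, (3, 6): 0}


def is_listed_version(version):
    """True if (major, minor, patch) is in a listed series at or above its minimum."""
    major, minor, patch = version
    minimum = MINIMUM_PATCH.get((major, minor))
    return minimum is not None and patch >= minimum


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    status = "listed" if is_listed_version(current) else "not listed"
    print("Python %d.%d.%d is %s" % (current + (status,)))
```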
- Install pip
Because Chainer is provided as a Python library, you need the pip command, the package management system for Python. If pip is not installed on your system, please install it as follows on Ubuntu:
$ sudo apt-get install python-pip
$ sudo pip install --upgrade pip
On RHEL, first enable the EPEL (Extra Packages for Enterprise Linux) repository for yum, and then install pip as follows:
$ sudo yum install epel-release
$ sudo yum install python-pip
$ sudo pip install --upgrade pip
- Install numpy
NumPy is a Python module for numerical calculation. Because installing it together with other libraries can cause conflicts in some cases, install NumPy first, as follows:
$ sudo pip install numpy
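Because sudo pip can install into a different interpreter than the one you will later run, it helps to confirm that pip and NumPy are importable from the same Python that will run Chainer. A minimal check using only the standard library (it reads each package's __version__ attribute when present):

```python
import importlib


def module_version(name):
    """Return the module's __version__, 'unknown' if absent, or None if not importable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


for name in ("pip", "numpy"):
    print("%s: %s" % (name, module_version(name) or "NOT INSTALLED"))
```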
- Install prerequisite software for Chainer
Install the prerequisite software as follows.
On Ubuntu:
$ sudo apt-get install git gcc make openssl libssl-dev libbz2-dev libreadline-dev libsqlite3-dev
On RHEL:
$ sudo yum install git gcc make openssl-devel bzip2-devel readline-devel sqlite-devel
- Install HDF5
On Ubuntu, install the HDF5 library as follows:
$ sudo apt-get install libhdf5-serial-dev libhdf5-mpich-dev libhdf5-openmpi-dev
On RHEL, please download and build the HDF5 source code, as follows:
$ wget http://www.hdfgroup.org/ftp/HDF5/current/src/hdf5-1.8.17.tar.gz
$ tar xvfz hdf5-1.8.17.tar.gz
$ cd hdf5-1.8.17
$ ./configure [--prefix=DIR] --enable-fortran --enable-cxx \
    --build=powerpc64le-linux-gnu   # specify the --prefix option (DIR = install directory) if necessary
$ make install
- Install IBM-Optimized CuPy and Chainer
The Chainer installer inspects your environment, such as the CUDA path, during installation, so please finish the previous steps before installing Chainer.
Install CuPy and Chainer with the following commands. This step takes time, because pip downloads and compiles the source code.
$ sudo pip install cupy-ibmopt --no-cache-dir
$ sudo pip install chainer-ibmopt --no-cache-dir
If it fails, please check https://github.com/pfnet/chainer#installation.
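A quick way to confirm the installation without running a full example is to import the packages and read Chainer's CUDA flags. The sketch below assumes the flag names available and cudnn_enabled on Chainer's CUDA module (chainer.cuda; newer releases also expose it as chainer.backends.cuda), and it degrades gracefully if a package is missing.

```python
import importlib


def chainer_gpu_status():
    """Summarize whether Chainer can see CuPy, CUDA, and cuDNN."""
    status = {"chainer": None, "cupy": None, "cuda": False, "cudnn": False}
    try:
        chainer = importlib.import_module("chainer")
    except ImportError:
        return status
    status["chainer"] = chainer.__version__
    try:
        status["cupy"] = importlib.import_module("cupy").__version__
    except ImportError:
        pass
    # Try the newer module path first, then the older chainer.cuda path.
    cuda_module = None
    for candidate in ("chainer.backends.cuda", "chainer.cuda"):
        try:
            cuda_module = importlib.import_module(candidate)
            break
        except ImportError:
            continue
    status["cuda"] = bool(getattr(cuda_module, "available", False))
    status["cudnn"] = bool(getattr(cuda_module, "cudnn_enabled", False))
    return status


print(chainer_gpu_status())
```

On a correctly configured system, both "cuda" and "cudnn" should be True.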
Test IBM-Optimized Chainer
After the installation is complete, follow these steps to run the example code and check whether Chainer is correctly installed with CUDA and cuDNN.
- Download the imagenet example in Chainer.
$ mkdir -p ~/src
$ cd ~/src
$ git clone https://github.com/negiyas/chainer.git
$ cd chainer
$ git checkout -b v4.0.0a3-ibm refs/tags/v4.0.0a3-ibm
Switched to a new branch 'v4.0.0a3-ibm'
- Run the example code with the Auto Workspace Tuning feature enabled.
$ cd examples/imagenet
$ SYNTH=1 ARCH=vgg16 INSIZE=224 BATCHSIZE=32 AUTOWS=1 ./go.sh
python train_imagenet.py --arch vgg16 --batchsize 32 --gpu 0 --autows --iteration 1000
--loaderjob 4 --val_batchsize 10 -o LOG-20180725/20180725-215838-vgg16-0224-0032.result --synthetic data val
total [#############.....................................] 27.00%
this epoch [#####################.............................] 43.75%
270 iter, 8 epoch / 1000 iterations
4.5545 iters/sec. Estimated time to finish: 0:02:40.279501.
If you do not see warning or error messages, Chainer is correctly installed with CUDA and cuDNN. Otherwise, please check https://github.com/pfnet/chainer#installation.
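The numbers in the progress output are related by simple arithmetic: the remaining iterations divided by the throughput give the time left. The hypothetical helper below parses the two progress lines shown above (it assumes exactly that output format) and computes (1000 - 270) / 4.5545 ≈ 160.3 seconds, about 2 minutes 40 seconds, which is in line with the estimated time printed in the log. Because it starts from the rounded speed shown, its result can differ slightly from the printed estimate.

```python
import re
from datetime import timedelta


def estimate_remaining(progress_line, speed_line):
    """Estimate time left from the two progress lines printed by the example.

    Assumes the formats shown above, e.g.
    "270 iter, 8 epoch / 1000 iterations" and "4.5545 iters/sec. ...".
    """
    progress = re.search(r"(\d+) iter.*?(\d+) iterations", progress_line)
    speed = re.search(r"([\d.]+) iters/sec", speed_line)
    if progress is None or speed is None:
        raise ValueError("unexpected progress output format")
    done, total = int(progress.group(1)), int(progress.group(2))
    # Remaining iterations divided by throughput gives the remaining seconds.
    return timedelta(seconds=(total - done) / float(speed.group(1)))


eta = estimate_remaining(
    "270 iter, 8 epoch / 1000 iterations",
    "4.5545 iters/sec. Estimated time to finish: 0:02:40.279501.",
)
print(eta)
```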
Run benchmarks for the Auto Workspace Tuning and Out-of-Core Memory Support features
From the examples/imagenet directory of the repository cloned above (https://github.com/negiyas/chainer/tree/v4.0.0a3-ibm/examples/imagenet), you can run benchmarks for the Auto Workspace Tuning and Out-of-Core Memory Support features with the benchmark scripts provided there (such as bench-OOC.sh), which include a benchmark for each feature.
What can you do with Chainer on OpenPOWER?
A variety of GPU numerical accelerator configurations can be used to accelerate Chainer on OpenPOWER systems, such as the IBM POWER SYSTEM AC922. To learn more about these systems or to order them, contact your IBM Business Partner.
IBM invites GPU software developers to join the IBM-NVIDIA Acceleration Lab to be among the first to try these systems and see the benefits of the Tesla V100 GPU accelerator and the high-speed NVLink2 connection to the IBM POWER9 CPU.
I look forward to hearing about the performance you get from these systems. Share how you want to use Chainer on OpenPOWER and how Deep Learning on OpenPOWER will enable you to build the next generation of cognitive applications by posting in the comments section below.