Overview

Skill Level: Intermediate

Docker, Linux

This article shows how to build and run Dockerized deep learning analytics using PowerAI libraries on an IBM "Minsky" system with GPUs.

Prerequisites

IBM PowerAI is a collection of open source deep learning frameworks (e.g., TensorFlow and Caffe) built and optimized to run on OpenPOWER systems with NVLink and NVIDIA GPUs. Docker is an open source Linux container platform, and it runs well on the IBM POWER architecture. This tutorial walks you through building Docker images with the PowerAI software and running them in containers with GPU access using the nvidia-docker plugin. The images are easily extended: you can add other software and capture its dependencies in a self-contained, deployable unit. In future tutorials, we will also show how to create scale-out clusters from these Docker images using Kubernetes.

To try out these steps, you will need a Minsky machine running Ubuntu 16.04 with kernel version 4.4.0, or RHEL 7.2 or 7.3 with kernel version 3.10.0. The Docker images themselves can be built on any POWER8 virtual machine or container with the Docker engine installed, or in NIMBIX. For open source development activities, a free VM can be requested from the Oregon State University Open Source Lab.

Step-by-step

  1. Install GPU drivers

    1. To verify that your GPU is CUDA-capable, go to your distribution’s equivalent of System Properties, or, from the command line, enter:
      root@pubuntu01:~/nvidia-docker/dist# lspci | grep -i nvidia
      0002:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
      000a:01:00.0 3D controller: NVIDIA Corporation Device 15f9 (rev a1)
      If you do not see any NVIDIA devices listed, update the PCI hardware database that Linux maintains by running
      update-pciids
      (generally found in /sbin) and then rerun the previous lspci command.
    2. Download and install NVIDIA CUDA 8.0 from https://developer.nvidia.com/cuda-downloads
      1. Select Operating System: Linux
      2. Select Architecture: ppc64le
      3. Select Distribution Ubuntu or RHEL
      4. Select Version 16.04 or 7
      5. Select the Installer Type that best fits your needs
    3. Install like this on Ubuntu (version 16.04.2, kernel: 4.4.0-81-generic):

      apt-get update
      wget https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_ppc64el-deb
      wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/patches/2/cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_ppc64el-deb
      wget https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el-deb
      dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_ppc64el-deb
      dpkg -i cuda-repo-ubuntu1604-8-0-local-ga2v2_8.0.61-1_ppc64el-deb
      dpkg -i cuda-repo-ubuntu1604-8-0-local-cublas-performance-update_8.0.61-1_ppc64el-deb
      apt-get update
      apt-get install cuda

    4. Install like this on RHEL (version 7.3 kernel: 3.10.0-514.el7.ppc64le):

      yum install -y wget git
      cd /tmp
      export CUDA_REPO_URL=https://developer.nvidia.com/compute/cuda/8.0/Prod2/local_installers/cuda-repo-rhel7-8-0-local-ga2v2-8.0.61-1.ppc64le-rpm
      wget ${CUDA_REPO_URL}
      rpm -i cuda-repo-rhel7-8-0-local-ga2v2-8.0.61-1.ppc64le-rpm
      yum clean all
      unset CUDA_REPO_URL

      export EPEL_REPO_URL=https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
      wget ${EPEL_REPO_URL}
      yum install epel-release-latest-7.noarch.rpm
      unset EPEL_REPO_URL

      yum install cuda

      More details: http://developer.download.nvidia.com/compute/cuda/6_0/rel/docs/CUDA_Getting_Started_Linux.pdf
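      After the installation finishes (on either distribution), a quick sanity check is useful before moving on to Docker. The snippet below is a minimal sketch; it falls back to a "not-installed" message so it is safe to run on any machine, and on the Minsky host itself you would also run nvidia-smi to confirm the driver sees both GPUs:

      ```shell
      # Report the installed CUDA toolkit version, if any.
      if command -v nvcc >/dev/null 2>&1; then
        cuda_ver=$(nvcc --version | sed -n 's/.*release \([0-9.][0-9.]*\).*/\1/p')
      else
        cuda_ver="not-installed"
      fi
      echo "CUDA toolkit: ${cuda_ver}"
      ```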

  2. Install Docker

    1. On Ubuntu (Docker version 17.06.1):
      1. Install the Docker repository.
        # echo deb http://ftp.unicamp.br/pub/ppc64el/ubuntu/16_04/docker-17.06.1-ce-ppc64el/ xenial main > /etc/apt/sources.list.d/xenial-docker.list
      2. Update the archive index.
        # apt-get update
      3. Install the Docker package.
        # apt-get install docker-ce
    2. On RHEL (Docker version 17.06.1):
      1. Install the Docker repository.
        # cat > /etc/yum.repos.d/docker.repo <<EOF
        [docker]
        name=Docker
        baseurl=http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/
        enabled=1
        gpgcheck=0
        EOF
      2. Install the Docker package.
        # yum install docker
      3. Start the Docker engine
        # service docker start
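    With the engine installed and started, it is worth confirming that the Docker client responds before adding the GPU plugin. A small sketch, safe to run even on a machine where Docker is absent:

    ```shell
    # Print the Docker client version, or a notice when it is missing.
    if command -v docker >/dev/null 2>&1; then
      docker_ver=$(docker --version)
    else
      docker_ver="not-installed"
    fi
    echo "docker: ${docker_ver}"
    ```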
  3. Install nvidia-docker

    nvidia-docker is a Docker plugin that simplifies deploying containers with GPU devices attached. It can be installed on an IBM S822LC for HPC machine by following the installation steps for the ppc64le architecture. To verify the installation:

    1. docker pull nvidia/cuda-ppc64le
    2. docker images
    3. nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi
    4. You should see:
      root@pubuntu02:~# systemctl status nvidia-docker.service
      nvidia-docker.service - NVIDIA Docker plugin
      Loaded: loaded (/lib/systemd/system/nvidia-docker.service; enabled; vendor preset: enabled)
      Active: active (running) since Thu 2017-06-22 14:44:02 EDT; 1min 54s ago
      Docs: https://github.com/NVIDIA/nvidia-docker/wiki
      Process: 2659 ExecStartPost=/bin/bash -c /bin/echo unix://$SOCK_DIR/nvidia-docker.sock > $SPEC_FILE (code=exited, status=0/SUCCESS)
      Process: 2642 ExecStartPost=/bin/bash -c /bin/mkdir -p $( dirname $SPEC_FILE ) (code=exited, status=0/SUCCESS)
      Main PID: 2641 (nvidia-docker-p)
      Tasks: 7
      Memory: 31.1M
      CPU: 319ms

      CGroup: /system.slice/nvidia-docker.service
      └─2641 /usr/bin/nvidia-docker-plugin -s /var/lib/nvidia-docker -d /usr/local/nvidia-driver
      Jun 22 14:44:02 pubuntu02 systemd[1]: Starting NVIDIA Docker plugin…
      Jun 22 14:44:02 pubuntu02 systemd[1]: Started NVIDIA Docker plugin.

      Jun 22 14:44:02 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:02 Loading NVIDIA unified memory
      Jun 22 14:44:03 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:03 Loading NVIDIA management library
      Jun 22 14:44:07 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:07 Discovering GPU devices
      Jun 22 14:44:07 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:07 Provisioning volumes at /usr/local/nvidia-dr
      Jun 22 14:44:07 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:07 Serving plugin API at /var/lib/nvidia-docker
      Jun 22 14:44:07 pubuntu02 nvidia-docker-plugin[2641]: /usr/bin/nvidia-docker-plugin | 2017/06/22 14:44:07 Serving remote API at localhost:3476

      root@pubuntu02:~# nvidia-docker run --rm nvidia/cuda-ppc64le nvidia-smi
      Thu Jun 22 18:49:48 2017
      +-----------------------------------------------------------------------------+
      | NVIDIA-SMI 361.119                Driver Version: 361.119                   |
      |-------------------------------+----------------------+----------------------+
      | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
      | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
      |===============================+======================+======================|
      |   0  Tesla P100-SXM2...  On   | 0002:01:00.0     Off |                    0 |
      | N/A   51C    P0   236W / 300W |   6849MiB / 16280MiB |     91%      Default |
      +-------------------------------+----------------------+----------------------+
      |   1  Tesla P100-SXM2...  On   | 000A:01:00.0     Off |                    0 |
      | N/A   24C    P0    34W / 300W |    601MiB / 16280MiB |      0%      Default |
      +-------------------------------+----------------------+----------------------+

      +-----------------------------------------------------------------------------+
      | Processes:                                                       GPU Memory |
      |  GPU       PID  Type  Process name                               Usage      |
      |=============================================================================|
      +-----------------------------------------------------------------------------+

  4. Build a PowerAI Docker image

    NVIDIA also provides several base images (CentOS or Ubuntu) that include libraries such as libcudnn on Docker Hub (see https://hub.docker.com/r/nvidia/cuda-ppc64le/tags/).

    1. Create a Dockerfile which extends an nvidia-docker base image and installs PowerAI libraries like this:
      https://github.com/knm3000/nvidia-powerai/blob/master/Dockerfile
    2. Build it like this:
      docker build -t nvidia-powerai .
    3. Run it like this:
      nvidia-docker run --rm -it nvidia-powerai
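    A Dockerfile of this kind generally takes the following shape. This is an illustrative sketch only: the base-image tag, the mldl-repo-local.deb repository package name, and the power-mldl metapackage are assumptions, so consult the linked Dockerfile for the exact contents.

    ```dockerfile
    # Sketch: extend an NVIDIA ppc64le CUDA base image with PowerAI.
    # The base tag and package names below are placeholders; take the
    # real values from the Dockerfile linked above.
    FROM nvidia/cuda-ppc64le:8.0-cudnn5-devel-ubuntu16.04

    # Install the PowerAI local repository package, then the frameworks.
    COPY mldl-repo-local.deb /tmp/
    RUN dpkg -i /tmp/mldl-repo-local.deb && \
        apt-get update && \
        apt-get install -y power-mldl && \
        rm -rf /var/lib/apt/lists/* /tmp/mldl-repo-local.deb

    CMD ["/bin/bash"]
    ```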
  5. Run containerized DL with GPUs

    1. Run caffe-test in the container:
      nvidia-docker run --rm -it nvidia-powerai bash
      source /opt/DL/caffe-ibm/bin/caffe-activate
      /opt/DL/caffe-ibm/bin/caffe-test
      You should see the Caffe test suite run to completion.
    2. Run tensorflow-test in the container:
      nvidia-docker run --rm -it nvidia-powerai bash
      source /opt/DL/tensorflow/bin/tensorflow-activate
      /opt/DL/tensorflow/bin/tensorflow-test
      You should see the TensorFlow test suite run to completion.
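    The two interactive sequences above can also be run non-interactively, which is handy for scripting. Here is a sketch that skips gracefully when nvidia-docker is not on the PATH:

    ```shell
    # Run each PowerAI self-test in a one-shot container.
    if command -v nvidia-docker >/dev/null 2>&1; then
      nvidia-docker run --rm nvidia-powerai bash -c \
        'source /opt/DL/caffe-ibm/bin/caffe-activate && /opt/DL/caffe-ibm/bin/caffe-test'
      nvidia-docker run --rm nvidia-powerai bash -c \
        'source /opt/DL/tensorflow/bin/tensorflow-activate && /opt/DL/tensorflow/bin/tensorflow-test'
      dl_tests="ran"
    else
      dl_tests="skipped (nvidia-docker not found)"
    fi
    echo "DL self-tests: ${dl_tests}"
    ```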

Expected outcome

Now you have your own PowerAI Docker image (nvidia-powerai) that you can run on any IBM S822LC for HPC machine (a.k.a. Minsky) with GPUs. This tutorial was based on work performed by many IBMers, including Konstantin Maximov, Ilsiyar Gaynutdinov, Yulia Gaponenko, Igor Khapov, Alanny Lopez, Gabriel Flo-Manaila, Yu Bo Li, and Seetharami Seelam.
