Taxonomy Icon

Artificial Intelligence

IBM Power Systems are built for the most demanding and data-intensive computing workloads. From the processor architecture and server hardware to software and services support, IBM Power Systems have been re-imagined for infrastructure in the AI era. TensorFlow is an open source software library for numerical computation using data flow graphs, and it is among the fastest growing and most popular deep learning frameworks. While the preferred method of installing TensorFlow is through the available binary packages, there are times when you may want to use the latest TensorFlow features and thus need to install it from source. This tutorial demonstrates how to compile and install TensorFlow from the source code on a Power architecture server with GPU support.

Learning objectives

This tutorial demonstrates the installation of the TensorFlow master code on a Power8 server with Ubuntu 16.04, Python 3.5, and NVIDIA CUDA support.

Prerequisites

  • Operating System: Ubuntu 16.04
  • Server with NVIDIA GPU

In our test environment, the Power8 server already has the NVIDIA CUDA toolkit and driver installed, with the following configuration:

  • 32-thread POWER8
  • 128 GB RAM
  • 1 P100 Tesla GPU with NVLink (np8g1)
  • NVIDIA CUDA 8.0.61 and driver version 396.15

Estimated time

  • Bazel package compile time is approximately 10 minutes.
  • TensorFlow bazel build time is approximately 50 minutes.
  • Total compile and install time is approximately 60 minutes.

Steps

1. Verify NVIDIA CUDA toolkit and driver

The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. It is required by TensorFlow in order to utilize the GPUs.

To validate the currently installed driver and toolkit, execute the following instructions:

Verify and note the CUDA version

$ cat /usr/local/cuda/version.txt
  CUDA Version 8.0.61

Verify and note the installed driver version. In our demo server, the driver version is 396.15, as shown in the sample output below.

$ cat /proc/driver/nvidia/version
  NVRM version: NVIDIA UNIX ppc64le Kernel Module  396.15  Thu Mar 22 18:28:48 PDT 2018
  GCC version:  gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2)

Keep in mind that some environments may have the driver and toolkit installed but not activated. To activate an already installed driver and toolkit, use the following commands:

# Check your nvidia-<version> lib directory; it may not be the same as the nvidia-396 used here
export PATH="/usr/local/cuda/bin:/usr/lib/nvidia-396/bin/:$PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
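
Sourcing these export lines repeatedly (for example, from ~/.bashrc) can grow LD_LIBRARY_PATH with duplicate entries. A small guard function can keep it clean; this is a sketch, and the function name append_ld_path is ours:

```shell
# Append a directory to LD_LIBRARY_PATH only if it exists and is not
# already present, so repeated sourcing does not accumulate duplicates.
append_ld_path() {
  dir="$1"
  [ -d "$dir" ] || return 0
  case ":$LD_LIBRARY_PATH:" in
    *":$dir:"*) ;;  # already present, nothing to do
    *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$dir" ;;
  esac
}

# Typical usage on the demo server:
# append_ld_path /usr/local/cuda/lib64
```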

Finally, verify that the NVIDIA toolkit version matches the CUDA version.

$ nvcc -V
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2016 NVIDIA Corporation
  Built on Tue_Jan_10_13:28:28_CST_2017
  Cuda compilation tools, release 8.0, V8.0.61
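
If you script these checks, the release number can be pulled out of the nvcc -V output for comparison against version.txt. This is a sketch; the function name cuda_release_from_nvcc is ours:

```shell
# Extract the CUDA release (e.g. "8.0") from `nvcc -V` output so it can be
# compared programmatically with the version recorded in version.txt.
cuda_release_from_nvcc() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Typical usage on the demo server:
# nvcc -V | cuda_release_from_nvcc
```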

2. Install the cuDNN library

Compiling TensorFlow requires NVIDIA cuDNN, a GPU-accelerated library of primitives for deep neural networks.

Verify whether cuDNN is already installed by checking for the libcudnn* files. If cuDNN is installed, note the installed version(s).

sudo apt list --installed | grep libcudnn
or
ls /usr/local/cuda/lib64/libcudnn*
or
ls /usr/lib/powerpc64le-linux-gnu/libcudnn*
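
If the files are present, the exact version can also be read from the cudnn.h header. Below is a sketch; the function name cudnn_version is ours, and the header path may differ on your system:

```shell
# Assemble "MAJOR.MINOR.PATCHLEVEL" from the #define lines in a cudnn.h header.
cudnn_version() {
  awk '/#define CUDNN_MAJOR/      {maj = $3}
       /#define CUDNN_MINOR/      {min = $3}
       /#define CUDNN_PATCHLEVEL/ {pat = $3}
       END {print maj "." min "." pat}' "$1"
}

# Typical usage (adjust the path if the header lives elsewhere):
# cudnn_version /usr/local/cuda/include/cudnn.h
```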

If cuDNN is not installed, follow the instructions below to install it. On Ubuntu systems, the cuDNN packages are provided through an Ubuntu repository hosted by NVIDIA. First, we need to add the cuDNN library repository to the apt sources:

echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el /" | sudo tee /etc/apt/sources.list.d/cudnn.list
curl -L http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el/7fa2af80.pub | sudo apt-key add -

sudo apt-get update

The next step is to choose the correct version of the libcudnn library, depending on the installed CUDA version.

Option 1

For CUDA 8.0, libcudnn5 or libcudnn6 can be installed by executing the following command:

sudo apt-get install -y libcudnn5 libcudnn5-dev
or
sudo apt-get install -y libcudnn6 libcudnn6-dev #This option was used for the demo

Option 2

The following section describes installing libcudnn7 on CUDA 8.0. Because the default libcudnn7 package on Ubuntu is built for CUDA 9.0, the correct CUDA 8.0 version must be specified explicitly.

To list the available cuDNN library versions, execute the following command:

apt-cache policy libcudnn7

The available versions will be listed, for example:

$ apt-cache policy libcudnn7
libcudnn7:
  Installed: (none)
  Candidate: 7.0.3.11-1+cuda9.0
  Version table:
     7.0.3.11-1+cuda9.0 500
        500 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el  Packages
     7.0.3.11-1+cuda8.0 500
        500 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el  Packages
     7.0.2.38-1+cuda8.0 500
        500 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el  Packages
     7.0.1.13-1+cuda8.0 500
        500 http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el  Packages

Choose the correct version from the list above. For example, we will choose the latest version for CUDA 8.0 (7.0.3.11-1+cuda8.0). Install libcudnn7 with the specific version by executing the following:

sudo apt-get install -y libcudnn7=7.0.3.11-1+cuda8.0 libcudnn7-dev=7.0.3.11-1+cuda8.0
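
The newest CUDA 8.0 build can also be picked out of the apt-cache policy output programmatically, since the version table lists versions newest first. A sketch, with our own function name:

```shell
# Print the newest libcudnn7 version string built for CUDA 8.0 from
# `apt-cache policy libcudnn7` output piped to stdin.
latest_libcudnn7_for_cuda8() {
  grep -o '[0-9][0-9.]*-1+cuda8\.0' | head -n 1
}

# Typical usage:
# apt-cache policy libcudnn7 | latest_libcudnn7_for_cuda8
```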

3. Download and install Miniconda (optional)

We will now install Miniconda, a minimal Python distribution, by downloading and running the installer.

cd ~
wget -c https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-ppc64le.sh
chmod 744 Miniconda3-latest-Linux-ppc64le.sh
./Miniconda3-latest-Linux-ppc64le.sh

Follow the instructions displayed to accept the license and set the installation path. We do not recommend adding the conda install location to PATH in your ~/.bashrc file.

4. Create a virtual environment (optional)

Though this is an optional step, using a virtual environment helps keep your Python projects isolated on a single server. The following commands create an environment named tensorflow with Python 3.5.

Note: Set python=2.7 if you want to install TensorFlow with Python 2.7.

~/miniconda3/bin/conda create -n tensorflow python=3.5
source ~/miniconda3/bin/activate tensorflow

5. Install OpenBLAS

# Install OpenBLAS
~/miniconda3/bin/conda install openblas

6. Install Bazel

Update apt sources

sudo apt-get update

Install prerequisites

sudo apt-get install -y build-essential openjdk-8-jdk python zip

Download the dist source zip

cd ~
wget https://github.com/bazelbuild/bazel/releases/download/0.14.0/bazel-0.14.0-dist.zip
mkdir bazel
cd bazel
unzip ../bazel-0.14.0-dist.zip
./compile.sh

Note: There is a known issue regarding compiling Bazel on servers with the 4.4.0-45-generic kernel. See the Known issues section below for more information.

Copy the resulting binary to ~/bin

mkdir -p ~/bin
cp output/bazel ~/bin
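
You can confirm which Bazel you ended up with by parsing the Build label: line that bazel version prints. A sketch (the helper name is ours):

```shell
# Extract the version from the "Build label:" line of `bazel version` output.
bazel_build_label() {
  sed -n 's/^Build label: //p'
}

# Typical usage:
# ~/bin/bazel version | bazel_build_label
```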

7. Install TensorFlow

Install TensorFlow prerequisites

# Note: some packages may have been installed by conda when creating the virtual environment
sudo apt-get install -y python-dev python-pip python-wheel python3-numpy python3-dev python3-pip python3-wheel

# Add the libcupti library directory that ships with CUDA to LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64

pip install numpy

Clone the TensorFlow repository

cd ~
git clone https://github.com/tensorflow/tensorflow
cd tensorflow

Note: The above instructions are for installing the TensorFlow master code. We tested the instructions with commit 397f04acb1faeff451691d7fdc0f754eeb547cc1 (June 5, 2018).

Configure TensorFlow install

./configure

The following is an example showing inputs to the configuration questions.

$ ./configure
You have bazel 0.14.0- (@non-git) installed.
Please specify the location of python. [Default is /home/cdiep/miniconda3/envs/tensorflow/bin/python]:


Found possible Python library paths:
  /home/cdiep/miniconda3/envs/tensorflow/lib/python3.5/site-packages
Please input the desired Python library path to use.  Default is [/home/cdiep/miniconda3/envs/tensorflow/lib/python3.5/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]: 8.0


Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 6


Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.

Please specify the NCCL version you want to use. [Leave empty to default to NCCL 1.3]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]3.5,3.7,5.2,6.0


Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -mcpu=native]: -mcpu=power8 -mtune=power8


Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.

Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
        --config=mkl            # Build with MKL support.
        --config=monolithic     # Config for mostly static monolithic build.
Configuration finished

Build the pip package:

bazel build //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package ../tensorflow_pkg

Note: There is a known issue regarding compiling TensorFlow 1.6 or later with CUDA 8.0. See the Known issues section below for more information.

Install TensorFlow

cd ~/tensorflow_pkg/
pip install <wheel file name>

Note: You may want to save the wheel file so that you can re-use it to install TensorFlow later.
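
Because the wheel file name encodes the TensorFlow version, Python ABI, and platform tag, a small helper can locate the most recently built wheel for you. This is a sketch; the glob pattern and helper name are assumptions:

```shell
# Print the newest tensorflow-*.whl in the given directory (empty if none).
newest_wheel() {
  ls -t "$1"/tensorflow-*.whl 2>/dev/null | head -n 1
}

# Typical usage:
# pip install "$(newest_wheel ~/tensorflow_pkg)"
```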

8. Validate the installation

Run a short TensorFlow program

Invoke Python from your shell as follows:

python

Enter the following short program inside the Python interactive shell:

# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))

If the system outputs the following, then you are ready to begin writing TensorFlow programs:

Hello, TensorFlow!

9. Known issues

1. Bazel compilation failure

Bazel compilation may fail because of the way Bazel parses /proc/meminfo on servers with the 4.4.0-45-generic kernel, whose /proc/meminfo lists duplicate Active: and Inactive: entries.

Error messages:
~/bazel$ ./compile.sh
  Building Bazel from scratch......
  Building Bazel with Bazel.
.java.lang.IllegalArgumentException: Multiple entries with same key: Active=399872 and Active=399872
  at com.google.common.collect.ImmutableMap.conflictException(ImmutableMap.java:215)
  at com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:209)
  at com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:147)
  at com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:110)
  .........

Solution: Upgrade the kernel.
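
Before compiling, you can check whether a server is affected by looking for duplicated keys in /proc/meminfo. A sketch (the function name is ours):

```shell
# Print every key that appears more than once in a meminfo-style file;
# on affected kernels this reports Active and Inactive.
duplicate_meminfo_keys() {
  awk -F: '{count[$1]++} END {for (k in count) if (count[k] > 1) print k}' "$1"
}

# Typical usage:
# duplicate_meminfo_keys /proc/meminfo   # no output means not affected
```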

2. Install TensorFlow 1.6 or later with CUDA 8.0

Execute the following commands if installing TensorFlow 1.6 or later in an environment with CUDA 8.0.

cd /usr/local/cuda-8.0/nvvm/libdevice
sudo ln -s libdevice.compute_50.10.bc libdevice.10.bc

Summary

In this tutorial, we described the steps to compile and install TensorFlow from the source code on a Power8 server with NVIDIA GPU support.