The NVIDIA CUDA Toolkit provides a development environment for creating high-performance GPU-accelerated applications. It includes GPU-accelerated libraries and tools as well as a C/C++ compiler and a runtime library to deploy your application. To use the GPUs with TensorFlow, you must compile TensorFlow against this toolkit.
This tutorial explains how to verify whether the NVIDIA toolkit has been installed previously in an environment. It also provides instructions on how to install NVIDIA CUDA on a POWER architecture server.
This tutorial uses a POWER8 server with the following configuration:
- Operating system: Ubuntu 16.04
- 32-thread POWER8
- 128 GB RAM
- 1 Tesla P100 GPU with NVLink (np8g1)
It should take you approximately 25 minutes to complete this tutorial.
1. Verify the NVIDIA CUDA toolkit and driver
To check the currently installed toolkit, run the following command and note the CUDA version.
$ cat /usr/local/cuda/version.txt
CUDA Version 8.0.61
Use the following command to verify and note the version of the installed driver. In the demonstration server, the driver version is 396.15, as shown in the following sample output.
$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX ppc64le Kernel Module 396.15 Thu Mar 22 18:28:48 PDT 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.2)
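If you want to record the driver version from a script rather than read it by eye, the following bash sketch pulls the version number out of the NVRM line. The function name nvidia_driver_version is a hypothetical helper, not an NVIDIA tool, and the parsing assumes the output format shown in the sample above.

```shell
#!/usr/bin/env bash
# Sketch only: nvidia_driver_version is a hypothetical helper name.
# It reads the text of /proc/driver/nvidia/version on stdin and prints
# the first dotted number on the "NVRM version" line (for example 396.15).
nvidia_driver_version() {
    grep -m1 'NVRM version' | grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n1
}

# The proc file exists only while the NVIDIA kernel module is loaded.
if [ -r /proc/driver/nvidia/version ]; then
    nvidia_driver_version < /proc/driver/nvidia/version
else
    echo "NVIDIA driver not loaded (no /proc/driver/nvidia/version)" >&2
fi
```

A missing proc file usually means that the driver is either not installed or not loaded, in which case the installation steps later in this tutorial apply.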
Note that some environments might have the driver and toolkit installed, but not activated. Use the following commands to activate an already installed driver and toolkit.
# check to use the correct NVIDIA lib version for your environment
export PATH="/usr/local/cuda/bin:/usr/lib/nvidia-396/bin/:$PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64
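The export lines above add the directories every time they run, and if LD_LIBRARY_PATH starts out empty the result begins with a stray colon, which the dynamic linker treats as the current directory. The bash sketch below is one way to avoid both effects. The helper path_has and the variables CUDA_HOME and NVIDIA_BIN are names chosen for this illustration, and the directories are the ones assumed throughout this tutorial.

```shell
#!/usr/bin/env bash
# Sketch only: path_has, CUDA_HOME, and NVIDIA_BIN are illustrative names.
# path_has DIR LIST succeeds if DIR is already an entry in the
# colon-separated LIST.
path_has() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

CUDA_HOME=/usr/local/cuda            # assumption: default toolkit location
NVIDIA_BIN=/usr/lib/nvidia-396/bin   # assumption: matches driver 396 used here

# Prepend the tool directories to PATH only if they are missing.
path_has "$NVIDIA_BIN" "$PATH"    || PATH="$NVIDIA_BIN:$PATH"
path_has "$CUDA_HOME/bin" "$PATH" || PATH="$CUDA_HOME/bin:$PATH"

# Append the library directory; ${VAR:+...} adds the separating colon only
# when LD_LIBRARY_PATH already has content, so no empty entry is created.
path_has "$CUDA_HOME/lib64" "${LD_LIBRARY_PATH:-}" ||
    LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CUDA_HOME/lib64"

export PATH LD_LIBRARY_PATH
```

Placing these lines in a shell startup file such as ~/.bashrc would make the setting persistent; because of the path_has check, sourcing the file repeatedly does not grow the variables.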
Finally, verify that the release reported by the nvcc compiler matches the CUDA version that you noted earlier.
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:28:28_CST_2017
Cuda compilation tools, release 8.0, V8.0.61
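The comparison between version.txt and nvcc can also be scripted. The following bash sketch assumes the two output formats shown in this step; cuda_release is a hypothetical helper that reduces either output to its major.minor release number.

```shell
#!/usr/bin/env bash
# Sketch only: cuda_release is a hypothetical helper name.
# It reads either version.txt or `nvcc -V` output on stdin and prints the
# major.minor release (for example 8.0).
cuda_release() {
    grep -oE '(CUDA Version|release) [0-9]+\.[0-9]+' | head -n1 |
        grep -oE '[0-9]+\.[0-9]+'
}

if [ -r /usr/local/cuda/version.txt ] && command -v nvcc >/dev/null 2>&1; then
    toolkit=$(cuda_release < /usr/local/cuda/version.txt)
    compiler=$(nvcc -V | cuda_release)
    if [ "$toolkit" = "$compiler" ]; then
        echo "OK: version.txt and nvcc both report CUDA $toolkit"
    else
        echo "Mismatch: version.txt=$toolkit nvcc=$compiler" >&2
    fi
else
    echo "CUDA toolkit not found, or nvcc is not on PATH" >&2
fi
```

A mismatch typically means that PATH points at a different toolkit installation than /usr/local/cuda does.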
Continue with the next steps if you still need to install or update the NVIDIA toolkit or driver.
2. Install the CUDA repository for Ubuntu
Before installing CUDA, you must add the NVIDIA CUDA repository to the Ubuntu apt sources.
# Install CUDA repository for Ubuntu
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/ppc64el /" | sudo tee /etc/apt/sources.list.d/cuda.list
curl -L https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/ppc64el/7fa2af80.pub | sudo apt-key add -
sudo apt-get update
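The repository URL encodes the Ubuntu release (ubuntu1604) and the package architecture (ppc64el). As a hedged sketch, the bash helper below derives that URL from the running system so you can compare it with the line used above. The function cuda_repo_url is hypothetical, and the sketch assumes, without checking, that NVIDIA publishes a repository for the detected combination.

```shell
#!/usr/bin/env bash
# Sketch only: cuda_repo_url is a hypothetical helper name.
# $1 = Ubuntu release such as 16.04, $2 = dpkg architecture such as ppc64el.
cuda_repo_url() {
    # ${1//./} removes the dot, turning 16.04 into 1604.
    printf 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu%s/%s\n' \
        "${1//./}" "$2"
}

if command -v lsb_release >/dev/null 2>&1 && command -v dpkg >/dev/null 2>&1; then
    url=$(cuda_repo_url "$(lsb_release -rs)" "$(dpkg --print-architecture)")
    # Print the line for review instead of writing it to the apt sources.
    echo "deb $url /"
fi
```

On the server used in this tutorial, the printed line should be identical to the one written to /etc/apt/sources.list.d/cuda.list.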
3. Install CUDA
Use the following commands to install CUDA on a system where it has not been installed before.
# The following line will install the latest version of CUDA.
sudo apt-get install cuda
# A specific version of CUDA can also be selected if needed. For example, the following line will install cuda-8-0.
sudo apt-get install cuda-8-0
You might have to restart the system after installing the CUDA toolkit.
4. Install the cuDNN library
Compiling TensorFlow requires the NVIDIA cuDNN library, which is a GPU-accelerated library of primitives for deep neural networks.
Verify whether cuDNN is already installed by checking for the libcudnn* files. If cuDNN is installed, note the installed version(s).
apt list --installed | grep libcudnn
or
ls /usr/local/cuda/lib64/libcudnn*
or
ls /usr/lib/powerpc64le-linux-gnu/libcudnn*
If cuDNN is not installed, follow the instructions below to install it. On Ubuntu systems, cuDNN packages are provided through an Ubuntu repository hosted by NVIDIA. First, add the cuDNN repository to the apt sources:
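The two directory checks can be combined into one script. The bash sketch below uses a hypothetical helper, find_cudnn, that lists every libcudnn* file in the directories it is given and reports failure when none exist; the directories passed to it are the two named in this step.

```shell
#!/usr/bin/env bash
# Sketch only: find_cudnn is a hypothetical helper name.
# Prints each libcudnn* file found in the given directories and
# returns 1 if no such file exists in any of them.
find_cudnn() {
    local dir f found=1
    for dir in "$@"; do
        for f in "$dir"/libcudnn*; do
            [ -e "$f" ] || continue   # an unmatched glob stays literal; skip it
            printf '%s\n' "$f"
            found=0
        done
    done
    return "$found"
}

if ! find_cudnn /usr/local/cuda/lib64 /usr/lib/powerpc64le-linux-gnu; then
    echo "No libcudnn files found; cuDNN is probably not installed" >&2
fi
```

The file names that are printed (for example a libcudnn.so symlink with a version suffix) indicate which cuDNN major version is present.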
echo "deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el /" | sudo tee /etc/apt/sources.list.d/cudnn.list
curl -L https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el/7fa2af80.pub | sudo apt-key add -
sudo apt-get update
Next, choose the correct version of the libcudnn library, which depends on the installed CUDA version. In this tutorial, we assume that you’ll use libcudnn6. If you use libcudnn7 or libcudnn5, modify the name in the following commands. Note that libcudnn5 and libcudnn6 are only supported for CUDA 8.0 on POWER systems.
When libcudnn6 or libcudnn5 is required, run the following command.
sudo apt-get install libcudnn6 libcudnn6-dev
The default libcudnn7 package is built for CUDA 9.0, so when you use CUDA 8.0 you must specify the library version explicitly. To list the available cuDNN library versions, run the following command.
apt-cache policy libcudnn7
A list with available versions is displayed.
$ apt-cache policy libcudnn7
libcudnn7:
  Installed: (none)
  Candidate: 126.96.36.199-1+cuda9.0
  Version table:
     188.8.131.52-1+cuda9.0 500
        500 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el Packages
     184.108.40.206-1+cuda8.0 500
        500 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el Packages
     220.127.116.11-1+cuda8.0 500
        500 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el Packages
     18.104.22.168-1+cuda8.0 500
        500 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/ppc64el Packages
Select the correct version from the list. This tutorial uses the latest version built for CUDA 8.0: 22.214.171.124-1+cuda8.0.
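Picking the newest build for a given CUDA release can be automated. In the bash sketch below, latest_cudnn_for is a hypothetical helper that filters apt-cache policy output by the +cudaX.Y suffix and uses GNU sort's version ordering to find the highest entry. It assumes the version-string layout shown in the list above and that libcudnn7-dev is published with the same version string as libcudnn7.

```shell
#!/usr/bin/env bash
# Sketch only: latest_cudnn_for is a hypothetical helper name.
# $1 = CUDA suffix such as cuda8.0; reads `apt-cache policy` output on stdin
# and prints the highest package version that ends in +$1.
latest_cudnn_for() {
    local suffix=${1//./\\.}   # escape dots so they match literally in the regex
    grep -oE "[0-9]+(\.[0-9]+)+-[0-9]+\+$suffix" | sort -uV | tail -n1
}

if command -v apt-cache >/dev/null 2>&1; then
    version=$(apt-cache policy libcudnn7 | latest_cudnn_for cuda8.0)
    if [ -n "$version" ]; then
        # Print the install command for review instead of running it.
        echo "sudo apt-get install libcudnn7=$version libcudnn7-dev=$version"
    else
        echo "No libcudnn7 build for cuda8.0 found in the apt sources" >&2
    fi
fi
```

Using one variable for both packages keeps the runtime and development packages on the same version.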
Then, install libcudnn7 with the specific version:
sudo apt-get install libcudnn7=126.96.36.199-1+cuda8.0 libcudnn7-dev=188.8.131.52-1+cuda8.0
In this tutorial, you learned how to verify and install the NVIDIA CUDA toolkit, which gives you a development environment to create high-performance GPU-accelerated applications.