UPDATE: After this article was published, NVIDIA updated the nvidia-container-runtime-hook to version 1.4.0. There is no longer a need to manually create an OCI config file in /usr/libexec/oci/hooks.d. The install steps below have been modified to account for this.

If you are upgrading from nvidia-container-runtime-hook 1.3.0 to 1.4.0, you MUST remove your copy of the OCI config file in /usr/libexec/oci/hooks.d/… otherwise you'll receive an error that looks like:

/usr/bin/docker-current: Error response from daemon: oci runtime error: container_linux.go:247: starting container process caused "process_linux.go:339: running prestart hook 2 caused \"error running hook: exit status 1, stdout: , stderr: exec command: [/usr/bin/nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig --device=all --compute --compat32 --graphics --utility --video --display --require=cuda>=10.0 --pid=123090 /var/lib/docker/overlay2/16d288247557cf53d2e938f3ad092574e323592cf6682e838b59db6da44210ae/merged]\\nnvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/16d288247557cf53d2e938f3ad092574e323592cf6682e838b59db6da44210ae/merged/run/nvidia-persistenced/socket: no such device or address\\n\"".
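
The cleanup described above can be sketched as a small shell function. This is only a sketch under one assumption: that your manual copy is the file named nvidia created by step 4 of the 1.3.0 instructions later in this article. The function name remove_stale_hook is made up for illustration and is not part of any NVIDIA package.

```shell
#!/bin/sh
# Sketch: delete the hand-written hook file left over from the 1.3.0 setup.
# Assumption: the stale copy is <hooks dir>/nvidia, as created by step 4 of
# the 1.3.0 instructions. Only call this after upgrading to 1.4.0.
remove_stale_hook() {
    hooks_dir="${1:-/usr/libexec/oci/hooks.d}"
    stale="$hooks_dir/nvidia"
    if [ -f "$stale" ]; then
        # Remove the manual copy and report what was deleted
        rm -f -- "$stale" && echo "removed $stale"
    else
        echo "nothing to remove in $hooks_dir"
    fi
}

# Usage (as root, after upgrading to 1.4.0):
#   remove_stale_hook
```

Passing the directory as an argument keeps the function easy to try out against a scratch directory before pointing it at the real hooks path.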

PowerAI recently released version 1.5.3 of its deep learning software on Docker Hub (https://hub.docker.com/r/ibmcom/powerai/). Running deep learning frameworks (TensorFlow, PyTorch, Caffe, etc.) in a containerized environment has a lot of advantages, but getting nvidia-docker installed and working correctly can be a source of frustration. This is especially true in a Red Hat Enterprise Linux (RHEL) environment, where the version of Docker that ships with RHEL is slightly different from the upstream Docker editions. In this article, I'm going to walk through why RHEL's Docker is different, why we can't just install nvidia-docker 2.0 in this environment, and why that isn't really a bad thing.

NOTE: If you aren't interested in the background and just want to install nvidia-docker2 on RHEL, you can jump to the section "Installing nvidia-docker 2.0 on RHEL" below.

Red Hat vs Upstream Docker

Most Linux distributions include a Docker package in their repositories. For non-Red Hat distributions, a version of Docker is chosen, tested, and shipped with minimal changes, while upstream Docker development continues, with regular releases shipped as docker-ce. RHEL is different. Because RHEL requires subscriptions and entitlements, running Red Hat in a containerized environment comes with additional requirements. To accommodate them, Red Hat provides a modified Docker container runtime, or runc (named docker-runc). We will discuss two of the additions to this container runtime before moving on.

1. Access to Red Hat-specific Docker registries

If you're using the upstream Docker packages (docker-ce or docker-ee) on RHEL and want to build RHEL-based images, you have to either produce your own RHEL base image or use CentOS. With the RHEL version of Docker, you have access to the full list of images Red Hat provides (https://access.redhat.com/containers/).

2. Passing the host server's entitlement credentials to a container at runtime.

The second addition, and arguably the more important one, is the ability to have your host machine's entitlement passed along to any Red Hat-based containers. This is a huge help when you have containers that need to periodically install packages. With the automatic entitlement support, there's no need to manually configure repository links or write scripts to entitle the container.

Installing Docker for POWER on RHEL 7

Before we move on to nvidia-docker, let's cover how to install RHEL's flavor of Docker. You first need to enable the extras repo; after that, it's as simple as installing the docker package.

sudo yum-config-manager --enable extras
sudo subscription-manager repos --enable=rhel-7-for-power-le-extras-rpms
sudo yum makecache fast
sudo yum install -y docker

nvidia-docker 1.0 vs nvidia-docker 2.0

Now that you have the correct version of Docker installed, I'll give you an abridged rundown of what nvidia-docker provides and the difference between nvidia-docker and nvidia-docker2. You can get more information about nvidia-docker from the project's GitHub page (https://github.com/NVIDIA/nvidia-docker).

Initially, nvidia-docker was created to address a critical need in the container world: the ability to share a kernel module and devices with a running container. As most folks know, containers are glorified userspaces running on the host kernel. This means that no matter what you're running in your container, it still depends on what's available in the host kernel, including kernel modules. In our case, we're interested in the NVIDIA module that provides communication with the GPUs on the system. While you can "install" the drivers in a container to get access to a GPU, you run into problems as soon as the host and container versions of the driver differ, which makes managing containers almost impossible. On a related note, devices aren't passed into containers by default. This is intentional, and any needed device can be added to a container manually via the --device argument. Since NVIDIA GPUs all follow the same naming pattern (/dev/nvidia0, /dev/nvidia1, etc.), it became busywork to add them to every docker run statement. So nvidia-docker set out to solve both of these problems without requiring too much upfront work from the end user.
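
To make the --device busywork concrete, here is a rough sketch of what adding the devices by hand amounts to. It assumes that every node matching nvidia* in the device directory should be passed through, and the function name nvidia_device_args is made up for this example. Note that it only covers the device half of the problem; it does nothing about matching driver libraries inside the container.

```shell
#!/bin/sh
# Sketch: build the --device flags that would otherwise be typed by hand.
# Assumption: every node matching <dev dir>/nvidia* should be passed through.
nvidia_device_args() {
    dev_dir="${1:-/dev}"
    args=""
    for node in "$dev_dir"/nvidia*; do
        [ -e "$node" ] || continue   # the glob matched nothing
        args="$args --device=$node"
    done
    # Strip the leading space left by the first append
    echo "${args# }"
}

# Usage:
#   docker run --rm $(nvidia_device_args) <image> <command>
```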

nvidia-docker 1.0: The first pass

The original nvidia-docker package (no longer maintained) accomplished its task using a combination of a Docker plugin (https://docs.docker.com/engine/extend/plugin_api/) and a Docker volume. All relevant libraries and binaries needed from the NVIDIA driver were copied over to the Docker volume, and the nvidia-docker plugin was used to mount both the volume and all necessary NVIDIA devices into a container when it was started. While this method worked, it had a few drawbacks. First, every time a driver was updated, nvidia-docker had to create a new volume and copy all of the necessary files over. And because they were copies, any changes to the host files weren't reflected in the container. Second, the plugin often collided with plugins for other tools, such as Kubernetes. This prompted a rewrite, and nvidia-docker2 was born.

nvidia-docker 2.0: Where we are now

nvidia-docker2 did away with plugins and volumes, opting instead to use OCI runtime hooks (https://github.com/opencontainers/runtime-spec/blob/master/config.md#prestart) to add the necessary NVIDIA libraries and devices.
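
To show what a prestart hook has to work with, here is a minimal sketch of one in shell. Per the OCI runtime spec, the runtime writes the container's state as JSON to the hook's standard input, and that state includes the pid of the container process (presumably the source of the --pid value in the error message at the top of this article). This is not NVIDIA's hook; the function name state_pid and the log path are made up for illustration.

```shell
#!/bin/sh
# Sketch of the input handling for a do-nothing prestart hook (NOT the NVIDIA hook).
# The OCI runtime passes the container state JSON on the hook's stdin.
# Assumption: the "pid" key and its value sit on the same line; a real hook
# should use a proper JSON parser instead of sed.
state_pid() {
    sed -n 's/.*"pid"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# As the body of a hook script, this would log the pid of each container
# that is about to start (hypothetical log path):
#   pid=$(state_pid) && echo "prestart for pid $pid" >> /tmp/prestart-hook.log
```

A real hook then uses that pid to reach into the container's namespaces before the user process starts, which is the point at which libraries and devices can be added.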

The nvidia-docker project was split into three repositories: libnvidia, nvidia-container-runtime, and the original nvidia-docker.

libnvidia: https://github.com/NVIDIA/libnvidia-container

libnvidia is the lowest level, and provides an nvidia-container-cli binary that allows for the manipulation of NVIDIA GPU containers. While it's useful to know this exists, it's not necessary to invoke the CLI by hand. The nvidia-container-runtime-hook will manage this for you.

nvidia-container-runtime: https://github.com/NVIDIA/nvidia-container-runtime

The nvidia-container-runtime repository contains the main piece of nvidia-docker2's code. It provides two things. The first is the prestart hook, nvidia-container-runtime-hook. This hook uses nvidia-container-cli to set up a container with the necessary NVIDIA drivers and GPUs. The second is a package named nvidia-container-runtime, which provides a new runtime named nvidia. This runtime is essentially the same as runc, with a call to the prestart hook linked in.

nvidia-docker: https://github.com/NVIDIA/nvidia-docker

nvidia-docker2 is the final package, and at this point is just a wrapper that sets up the nvidia runtime to be the default when invoking docker commands.

Now we can get to the entire purpose of this article. RHEL has made many changes to its version of Docker, including a new runc named docker-runc. Because of this, we are not able to install the nvidia runtime and, by extension, the nvidia-docker2 package. But this really isn't a setback at all. As I've alluded to above, the key to nvidia-docker2's functionality isn't the nvidia-docker binary, or even the nvidia runtime. It's the OCI runtime hook support, and despite using a custom runc, RHEL's Docker supports these hooks.

Installing nvidia-docker 2.0 on RHEL

With all of the background into nvidia-docker 2.0, I feel we have enough to dive right into enabling NVIDIA's runtime hook directly. We won't be installing nvidia-docker2, or the nvidia-container-runtime, but we will still be installing the key features that make up nvidia-docker 2.0's functionality.

The install steps are also listed on NVIDIA's nvidia-docker GitHub page (https://github.com/NVIDIA/nvidia-docker#centos-7-docker-rhel-7475-docker). The only piece not accounted for there is SELinux, which we will get to later in this article.

1. Remove nvidia-docker 1.0
2. Enable access to the nvidia-container-runtime repository
3. Install nvidia-container-runtime-hook
4. Add the hook to a path that OCI understands (only needed for nvidia-container-runtime-hook 1.3.0)
5. (Optional) Change SELinux file permissions for nvidia devices

# 1. If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# 2. Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-container-runtime/$distribution/nvidia-container-runtime.repo | \
  sudo tee /etc/yum.repos.d/nvidia-container-runtime.repo

# 3. Install the nvidia runtime hook
sudo yum install -y nvidia-container-runtime-hook

# NOTE: Step 4 is only needed if you're using the older nvidia-container-runtime-hook 1.3.0. The default (1.4.0) now includes this file.
# 4. Add hook to OCI path
#sudo mkdir -p /usr/libexec/oci/hooks.d

#echo -e '#!/bin/sh\nPATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" exec nvidia-container-runtime-hook "$@"' | \
#  sudo tee /usr/libexec/oci/hooks.d/nvidia

#sudo chmod +x /usr/libexec/oci/hooks.d/nvidia

# 5. (Optional) Adjust SELinux permissions
sudo chcon -t container_file_t /dev/nvidia*

After you've followed the steps above, the nvidia-container-runtime-hook will be invoked whenever you issue the 'docker run' command. The hook looks for specific NVIDIA environment variables that were set when the image was built, such as NVIDIA_VISIBLE_DEVICES and NVIDIA_DRIVER_CAPABILITIES, and acts on them. You can get the full list of variables here (https://github.com/NVIDIA/nvidia-container-runtime#environment-variables-oci-spec).
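
Because these are ordinary environment variables, they can also be overridden per container with docker run -e. The sketch below wraps that in a small helper. The function name gpu_run and the DOCKER override variable are made up for this example; the image and command in the usage line are the ones used in the test later in this article.

```shell
#!/bin/sh
# Sketch: start a container with explicit GPU visibility and driver capabilities.
# Assumption: the hook reads these two variables from the container's environment,
# so values given with -e take precedence over the ones baked into the image.
# DOCKER is a hypothetical override so the helper can be dry-run (DOCKER=echo).
gpu_run() {
    devices="$1"   # e.g. "all", "none", or a comma-separated list such as "0,1"
    caps="$2"      # e.g. "compute,utility"
    shift 2
    "${DOCKER:-docker}" run --rm \
        -e NVIDIA_VISIBLE_DEVICES="$devices" \
        -e NVIDIA_DRIVER_CAPABILITIES="$caps" \
        "$@"
}

# Usage: expose only the first GPU to nvidia-smi
#   gpu_run 0 compute,utility nvidia/cuda-ppc64le nvidia-smi
```

Setting DOCKER=echo before calling gpu_run prints the command line instead of running it, which is a cheap way to check the flags on a machine without GPUs.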

To test out this functionality, you can run a simple nvidia-smi container.

# Test nvidia-smi capabilities using the default CUDA container
docker run --rm nvidia/cuda-ppc64le nvidia-smi

SELinux considerations

If you're running with SELinux in Enforcing mode, you will have to take a few extra steps to use nvidia-docker2. This step is necessary regardless of whether you're using RHEL's version of Docker or upstream. As you may have noticed in the steps above, there's a final chcon command that is run before starting up containers. Since the physical NVIDIA devices are passed through to a container at runtime, their file context (fcontext) determines whether the Docker service has permission to access them. By default, the /dev/nvidia* devices have an fcontext of xserver_misc_device_t, which does not grant permission to Docker. While there are many fcontext options that will grant containers access, we've chosen to reassign the devices to container_file_t. You can do this manually with the chcon command used in the example above, or edit your SELinux configuration to specifically target those devices and alter their file contexts. This RHEL article describes how to make fcontext changes permanent (https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/security-enhanced_linux/sect-security-enhanced_linux-selinux_contexts_labeling_files-persistent_changes_semanage_fcontext).
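
Before relabeling anything, it can help to see which devices still carry the wrong type. The following is a small sketch of such a check. It assumes contexts in the usual user:role:type:level form, as printed by ls -Z or stat -c %C, and the function names context_type and needs_relabel are made up for this example.

```shell
#!/bin/sh
# Sketch: decide whether a device node still needs relabeling.
# Assumption: the context string has the form user:role:type:level,
# e.g. system_u:object_r:xserver_misc_device_t:s0.
context_type() {
    # The SELinux type is the third colon-separated field
    echo "$1" | cut -d: -f3
}

needs_relabel() {
    # Succeeds (exit 0) when the type is anything other than container_file_t
    [ "$(context_type "$1")" != "container_file_t" ]
}

# Usage on the host:
#   for dev in /dev/nvidia*; do
#       if needs_relabel "$(stat -c %C "$dev")"; then
#           sudo chcon -t container_file_t "$dev"
#       fi
#   done
```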

After reading this article, I hope you have a more comfortable understanding of nvidia-docker2, RHEL's version of Docker, and how they interact with each other. It's easy to write off RHEL's Docker when the nvidia-docker2 package won't install. From the previous sections, it should be clear that the meat of the new nvidia-docker isn't the nvidia-docker2 wrapper but rather its OCI runtime hook. RHEL's Docker actually supports that hook natively, while it's upstream Docker that needs the additional container runtime modifications to support it. Please let me know if you have any questions or need clarification on any of the topics discussed above!
