Install nvidia-docker 2.0 from NVIDIA's repository
Use the following instructions from NVIDIA: https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
You don’t need to pin your Docker version when installing.
Be sure to back up your Docker engine config and service files before installing nvidia-docker 2.0. The nvidia-docker2 package registers nvidia as a runtime by modifying the Docker daemon config file.
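For reference, after installation /etc/docker/daemon.json will typically contain a runtime entry along these lines (the exact path to nvidia-container-runtime may differ on your system):

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}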
Verify that the runtime has been added correctly:
$ docker info | grep Runtime
"Runtimes: nvidia runc"
Set default runtime to nvidia
Edit your service file to pass the --default-runtime=nvidia flag to dockerd. This could be in your main service file, or in a drop-in file, e.g. /etc/systemd/system/docker.service.d/override.conf, as sketched below.
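As a minimal sketch, assuming dockerd lives at /usr/bin/dockerd, a drop-in override could look like this:

# /etc/systemd/system/docker.service.d/override.conf
[Service]
# Clear the inherited ExecStart, then relaunch dockerd with nvidia as the default runtime
ExecStart=
ExecStart=/usr/bin/dockerd --default-runtime=nvidia

After editing, reload systemd and restart docker:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker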
Verify that the default runtime has been set to nvidia:
$ docker info | grep Default
Default Runtime: nvidia
Note: There are multiple ways to configure this. For other options, refer to NVIDIA’s documentation here: https://github.com/nvidia/nvidia-container-runtime#docker-engine-setup. One alternative is sketched below.
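One documented alternative is to set the default runtime in /etc/docker/daemon.json rather than in the service file, for example:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Restart the docker service after changing daemon.json for the setting to take effect.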
Configure the Kubernetes device plugin
Follow the instructions from the kubernetes project here: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#v1-8-onwards
If you are using a Kubernetes version earlier than 1.10, set --feature-gates="DevicePlugins=true" on the kubelet (https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin).
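As a sketch, on a kubeadm-managed node this can usually be passed through the kubelet’s extra-arguments environment file (the file location varies by distribution; /etc/default/kubelet and /etc/sysconfig/kubelet are common):

# Add the feature gate to the kubelet's extra arguments
KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true

Then restart the kubelet:

$ sudo systemctl restart kubelet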
When deploying, you’ll use https://raw.githubusercontent.com/nvidia/k8s-device-plugin/v1.10/nvidia-device-plugin.yml.
Create the device plugin daemonset, so that each node in your cluster advertises its GPUs and pods can request them:
$ kubectl create -f nvidia-device-plugin.yml
daemonset "nvidia-device-plugin-daemonset" created
Deploy a pod with an exposed GPU
Following the example at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus, first build your image:
$ curl https://raw.githubusercontent.com/kubernetes/kubernetes/v1.7.11/test/images/nvidia-cuda/Dockerfile -o nvidia-cuda-vector_Dockerfile
Edit the FROM line to FROM nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04
$ docker build -f nvidia-cuda-vector_Dockerfile -t cuda-vector-add:v0.1 .
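The cuda-vector-pod.yml used in the next step can follow the pod spec from the Kubernetes GPU scheduling docs, pointed at the image you just built and requesting a single GPU via the nvidia.com/gpu resource limit; a sketch:

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
    - name: cuda-vector-add
      # the image built in the previous step
      image: "cuda-vector-add:v0.1"
      resources:
        limits:
          nvidia.com/gpu: 1 # request one GPU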
Deploy the pod:
$ kubectl create -f cuda-vector-pod.yml
pod "cuda-vector-add" created
Find your nvidia-device-plugin-daemonset container on your node and confirm that the plugin loaded properly, e.g.:
$ docker logs k8s_nvidia-device-plugin-ctr_nvidia-device-plugin-daemonset-bqdnk_kube-system_d447ec3b-5ddd-11e8-94a4-98be9405a2a4_0
2018/05/22 16:33:24 Loading NVML
2018/05/22 16:33:24 Fetching devices.
2018/05/22 16:33:24 Starting FS watcher.
2018/05/22 16:33:24 Starting OS watcher.
2018/05/22 16:33:24 Starting to serve on /var/lib/kubelet/device-plugins/nvidia.sock
2018/05/22 16:33:24 Registered device plugin with Kubelet
View the logs of your test pod, for example:
$ kubectl logs cuda-vector-add
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
For more on controlling how GPUs are exposed inside containers, see https://github.com/nvidia/nvidia-container-runtime#environment-variables-oci-spec. These environment variables are set in the images provided by NVIDIA.
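For example, you can restrict a container to a single GPU by overriding NVIDIA_VISIBLE_DEVICES at run time (a sketch, using the base image from earlier):

$ docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0 nvidia/cuda-ppc64le:9.2-devel-ubuntu16.04 nvidia-smi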