IBM PowerAI Enterprise is a powerful platform that provides data scientists with ready-to-use deep learning frameworks, hyperparameter search and optimization for feature engineering, resource utilization optimizations for model training, and several new and compelling features that accelerate the performance of training jobs. You can deploy the IBM PowerAI Enterprise platform in your on-premises data center, managed by an IBM Cloud Private Kubernetes environment.

Follow these steps to prepare your POWER8 or POWER9 system for IBM Cloud Private and PowerAI Enterprise.

Enable GPUs

For POWER8 systems (S822LC for HPC, "Minsky"), follow these steps on each GPU node:

  1. Clean up CUDA libraries from any prior installations by running these commands, then reboot the system:

    yum list installed | grep -i cuda
    yum remove -y cuda-license*
    yum remove dkms.noarch
    yum remove epel-release
    rpm -qa | grep -e nvidia -e cuda | xargs yum remove -y
    yum remove nvidia-kmod*

  2. Verify that the kernel package versions match by running these commands:

    rpm -qa | grep kernel
    uname -r

    If the versions don’t match, update the kernel headers and devel libraries by running this command, then reboot the system:

    yum update kernel kernel-devel kernel-headers

  3. Install the NVIDIA driver library, then reboot the system. See NVIDIA Driver download. Example:
    rpm -ivh nvidia-driver-local-repo-rhel7-410.72-1.0-1.ppc64le.rpm
  4. Install the CUDA driver library on each GPU node, then reboot the system:
    yum install cuda-driver
  5. Download the cudaInit_ppc64le script from the IBM developerWorks community to your system, make it executable if needed (chmod +x cudaInit_ppc64le), and run it. See cudaInit_ppc64le.
    ./cudaInit_ppc64le

    Note: After the system reboots, you might need to run this script again.

  6. To verify that the GPUs are enabled, run this command. The GPUs should be listed:

    nvidia-smi

On a POWER9 system (AC922, "Newell"), follow the instructions in the "IBM POWER9 specific udev rules" section of the System Setup topic.

Installing Docker

  1. Install Docker version 1.13.1 or higher, as provided by Red Hat Enterprise Linux (RHEL). If you have a small root file system, you may want to relocate /var/lib/docker to an alternate directory or to a disk with more space before installation.

    To install RHEL docker, run:

    yum install docker
    systemctl start docker
    systemctl status docker

  2. Configure the storage driver for the Docker engine to be devicemapper in direct-lvm mode. For more information, see Use the Device Mapper storage driver.
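
    As a hedged example (not part of the original steps): with the RHEL-packaged Docker, one common way to get devicemapper in direct-lvm mode is to let docker-storage-setup build an LVM thin pool from a spare block device. The device name /dev/sdb and the volume group name docker-vg below are assumptions; run this before Docker is started for the first time (or clean out /var/lib/docker first), and see the linked Docker documentation for a daemon.json-based alternative:

    $ cat /etc/sysconfig/docker-storage-setup
    STORAGE_DRIVER=devicemapper
    DEVS=/dev/sdb
    VG=docker-vg

    docker-storage-setup
    systemctl restart docker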

Installing IBM Cloud Private

  1. Designate one of the systems as the master node and log in to it.
  2. Download the latest version of the IBM Cloud Private Enterprise Edition (ee) image tar package and copy it to the master node.
  3. Set up password-less SSH on all nodes and ensure that all nodes have the same time zone setting:
    • Generate keys by using this command:

      ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''

    • Run the command:

      ssh-copy-id -i _ssh_file_path_ root@_hostname_

    Where ssh_file_path is the path of the identity (key) file that you want to use to SSH to all other nodes, and hostname is the host name or IP address of each system. Repeat this for every node in the cluster.
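
    As an optional convenience, a loop like the one below copies the key to every node and spot-checks the time zone in one pass. The host names are placeholders for the nodes in your cluster:

      for host in node1 node2 node3; do
          ssh-copy-id -i /root/.ssh/id_rsa.pub root@${host}
          ssh root@${host} timedatectl | grep "Time zone"
      done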

  4. Ensure that all default ports are open but are not in use. For more information, see the IBM Cloud Private documentation.
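
    As a quick spot check (the full list of required ports is in the IBM Cloud Private documentation; 8443, 8500, and 9443 are only examples), you can confirm that nothing is already listening on a given port:

    ss -tlnp | grep -E ':(8443|8500|9443)'

    If the command prints nothing, those ports are free.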
  5. Remove the /var/lib/mysql directory if your server has one.
  6. Set vm.max_map_count on all nodes, if it is not already set, by running this command:

    echo "vm.max_map_count=262144" | tee -a /etc/sysctl.conf
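
    To apply the setting without a reboot and to confirm that it took effect, you can run:

    sysctl -p
    sysctl vm.max_map_count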
  7. Install Python:

    yum install python python-pip -y

  8. By default, IBM Cloud Private installation requires the firewall to be disabled. Stop the firewalld service by running:

    systemctl stop firewalld
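
    Optionally (not part of the original procedure), you can also disable the service so that the firewall does not come back after a reboot:

    systemctl disable firewalld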

  9. To load the image bundle into Docker on the master node, navigate to the download directory and run:

    tar -xf ibm-cloud-private-ppc64le-3.1.1-ee.tar.gz -O | docker load

  10. Create the working directory for IBM Cloud Private:

    mkdir /opt/ibm-cloud-private-3.1.1
    cd /opt/ibm-cloud-private-3.1.1

  11. Extract the configuration files:

    docker run -v $(pwd):/data -e LICENSE=accept ibmcom/icp-inception-ppc64le:3.1.1-ee cp -r cluster /data

    cd cluster

  12. Copy the image to the images folder under the cluster directory:

    mkdir /opt/ibm-cloud-private-3.1.1/cluster/images

    cp _download directory_/ibm-cloud-private-ppc64le-3.1.1-ee.tar.gz /opt/ibm-cloud-private-3.1.1/cluster/images

  13. Update config.yaml in the cluster directory as needed. In our setup, we used one node as the master, three nodes as workers, one node as the proxy, and one node as the management node. If you have a small root directory, you may want to add an extra Kubelet argument to config.yaml to set a different root directory; skip this if your root directory is large:

    kubelet_extra_args: ["--fail-swap-on=false","--root-dir=/data1/kubelet"]

    If your setup has many vCPUs, update config.yaml to prevent known Out-of-memory errors.

  14. Configure the /opt/ibm-cloud-private-3.1.1/cluster/hosts file with the IP addresses of the master, worker, proxy, and management nodes. If possible, don't use a single system as the master, proxy, and management node. If a proxy node is specified, IBM Cloud Private prompts for proxy_vip, an additional IP address that is different from the others.
    For a three-node IBM Cloud Private cluster, the hosts file might look like the following:

    $ cat hosts

    [master]
    worker_node_1_IP_address
    [worker]
    worker_node_1_IP_address
    worker_node_2_IP_address
    worker_node_3_IP_address
    [proxy]
    worker_node_3_IP_address
    [management]
    worker_node_2_IP_address

  15. Select the correct ssh_key. Use the private key that corresponds to the public key you copied to all nodes with ssh-copy-id (step 3 above):

    cp /root/.ssh/id_rsa ssh_key

  16. To install socat on the master node, run:

    wget http://www.dest-unreach.org/socat/download/socat-1.7.3.2.tar.gz
    tar zxvf socat-1.7.3.2.tar.gz
    cd socat-1.7.3.2/
    yum install gcc*     # only if a C compiler does not already exist
    ./configure
    make
    make install

  17. Run this command to install IBM Cloud Private:

    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception-ppc64le:3.1.1-ee install &

  18. Once the installation completes, validate it by opening the IBM Cloud Private GUI in a browser:

    https://_master node or proxy IP address_:8443/console/login

  19. To install kubectl on the boot node, run:

    docker run -e LICENSE=accept --net=host -v /usr/local/bin:/data ibmcom/icp-inception-ppc64le:3.1.1-ee cp /usr/local/bin/kubectl /data

  20. Verify that all GPU nodes show the allocated number of GPUs by running the command below:

    kubectl describe nodes | tr -d '\000' | grep -e Hostname -e gpu

    If the results of the above command are not as expected, refer to the 'Enable GPUs' section above to check for missing prerequisites.

  21. To add a new worker node to the IBM Cloud Private setup at a later time, prepare the worker node (for example, worker_node_4_IP_address) with a GPU as instructed earlier in this article, then run the following on the IBM Cloud Private master node:

    cd /opt/ibm-cloud-private-3.1.1/cluster

    Edit the hosts file and add worker_node_4_IP_address under the [worker] section.

    Run the following command to add the worker node to the cluster:

    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception-ppc64le:3.1.1-ee install -l _IP address of new worker node_ &

Install PowerAI Enterprise

The PowerAI Enterprise for IBM Cloud Private install package can be downloaded from Passport Advantage.

  1. Copy the file /etc/docker/certs.d/mycluster.icp:8500/ca.crt from the IBM Cloud Private master node to /etc/docker/certs.d/mycluster.icp:8500/ca.crt on the machine from which you will upload the installer (if it differs from the system you are installing on), then restart Docker on that machine. On macOS, copy the file to ~/.docker/certs.d/mycluster.icp:8500/ca.crt and restart Docker. For more information, see the Docker documentation. On an x86 system, create the mycluster.icp:8500 directory before copying the certificate if it does not already exist.
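
    For illustration, assuming the upload machine is a Linux system and icp-master is a placeholder for the master node's host name, the copy might look like this:

    mkdir -p /etc/docker/certs.d/mycluster.icp:8500
    scp root@icp-master:/etc/docker/certs.d/mycluster.icp:8500/ca.crt /etc/docker/certs.d/mycluster.icp:8500/ca.crt
    systemctl restart docker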
  2. If not already done, update /etc/hosts with

    _icp master node ip_ mycluster.icp

  3. In the IBM Cloud Private console, click the user icon in the top right corner and select Configure Client to configure kubectl.
  4. Run Docker login with the credentials admin/admin:

    docker login mycluster.icp:8500

  5. Log in to the cluster:

    cloudctl login -a https://_IP address of the master node_:8443 --skip-ssl-validation

    When prompted, provide admin/admin as the credentials, then select the cluster account and the namespace you want to use.

  6. Push the downloaded PowerAI Enterprise archive to IBM Cloud Private.

    cloudctl catalog load-archive --archive powerai-enterprise-container-1.1.2_ppc64le.tgz

    This command may take several minutes to complete.

  7. The PowerAI Enterprise Docker image should now be visible on the IBM Cloud Private administration console. Go to Menu > Container Images > Images. Change the scope of the image to global.
  8. On the IBM Cloud Private administration console, go to Menu > Manage > Helm Repositories and click Sync Repositories.
  9. The PowerAI Enterprise Helm Chart should now be listed on the IBM Cloud Private administration console. Go to the catalog and search for ibm-powerai-enterprise-prod.
  10. Review the PowerAI Enterprise Helm chart readme file carefully. It documents the prerequisites, requirements, and limitations of PowerAI Enterprise in IBM Cloud Private.
  11. Disable NFS version 4 on all nodes.
      Check for the current version by running:

      $ rpcinfo -u localhost nfs
      program 100003 version 3 ready and waiting
      program 100003 version 4 ready and waiting

      Disable NFS v4 by editing /etc/sysconfig/nfs and setting RPCNFSDARGS="--no-nfs-version 4", then save the file and restart the NFS server:

      systemctl restart nfs-server.service
      systemctl status nfs-server.service

      Verify that NFS 4 is disabled successfully:

      $ rpcinfo -u localhost nfs
      program 100003 version 3 ready and waiting
      $ mount -t nfs -o vers=4,acl _ip of master node_:/root/nfsshare /mnt
      mount.nfs: Protocol not supported

  12. Create persistent volumes for a new deployment as instructed in the PowerAI Enterprise Helm release Readme. To summarize, follow the steps below:
    1. Create a shared NFS directory /nfsshare on the IBM Cloud Private master node. If the NFS server is set up on another system, use that information in the persistent volume creation yaml. The below example uses /nfsshare as the NFS directory.
    2. For the first deployment, create three directories in the NFS mount path. For example:

      mkdir /nfsshare/dli-share /nfsshare/master /nfsshare/etcd

      Note:

      • One of the persistent volumes is mapped to etcd containers during the deployment and is shared between all deployments on a given IBM Cloud Private instance. Hence, this persistent volume needs to be created only once per IBM Cloud Private instance.
      • The other persistent volumes, in this example, dli-share and master, are unique for each deployment and must be created prior to each PowerAI Enterprise Helm chart deployment.
    3. Create a persistent volume configuration file (for example, paie-pv.yaml) similar to the examples in the linked yaml file; a minimal sketch is provided after these steps.
    4. Create persistent volumes by running the command below:

      kubectl create -f paie-pv.yaml

    5. Add an entry for the share in /etc/exports on the master node and restart the NFS service:

      /nfsshare *(rw,sync,no_subtree_check,no_root_squash)
      systemctl restart nfs

    6. Ensure that the mount path is correct by verifying the output of 'showmount -e _master_ip_address_' against /etc/exports.
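
    A minimal, hypothetical paie-pv.yaml sketch for the three volumes is shown below. The volume names, capacities, and NFS server address are assumptions for illustration; align them with the values expected by the PowerAI Enterprise Helm release Readme:

      $ cat paie-pv.yaml
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: powerai-enterprise-dli-share
      spec:
        capacity:
          storage: 10Gi
        accessModes:
          - ReadWriteMany
        persistentVolumeReclaimPolicy: Retain
        nfs:
          server: x.x.x.x          # IP address of the NFS server (the master node in this example)
          path: /nfsshare/dli-share
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: powerai-enterprise-master
      spec:
        capacity:
          storage: 10Gi
        accessModes:
          - ReadWriteMany
        persistentVolumeReclaimPolicy: Retain
        nfs:
          server: x.x.x.x
          path: /nfsshare/master
      ---
      apiVersion: v1
      kind: PersistentVolume
      metadata:
        name: powerai-enterprise-etcd
      spec:
        capacity:
          storage: 1Gi
        accessModes:
          - ReadWriteMany
        persistentVolumeReclaimPolicy: Retain
        nfs:
          server: x.x.x.x
          path: /nfsshare/etcd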
  13. Set up secrets and service accounts as described in section PodSecurityPolicy Requirements in the PowerAI Enterprise Helm release Readme. To summarize, follow the steps below:
    1. Run the following commands:
      cd _PowerAI Enterprise PPA downloaded location_
      tar -zxvf powerai-enterprise-container-1.1.2_ppc64le.tgz
      tar -zxvf charts/ibm-powerai-enterprise-prod-1.1.2.tgz
      cd _PowerAI Enterprise PPA downloaded location_/charts/ibm-powerai-enterprise-prod/ibm_cloud_pak/pak_extensions/prereqs
    2. Generate base64 encodings for the credentials. In the example below, the credentials are 'admin' for both the user name and the password:

      • echo -n admin:admin | base64
        YWRtaW46YWRtaW4=
      • cat config.json
        {"auths": {"mycluster.icp:8500": {"auth": "YWRtaW46YWRtaW4="}}}
      • cat config.json | base64
        eyJhdXRocyI6IHsibXljbHVzdGVyLmljcDo4NTAwIjogeyJhdXRoIjogIllXUnRhVzQ2WVdSdGFXND0ifX19Cg==
      • echo -n admin | base64
        YWRtaW4=
    3. Update the templates secret_template.yaml, secret-helm-template.yaml, secret-imagecleaner-template.yaml, and serviceaccount_template.yaml with the Helm release name that you will provide when deploying and the base64-encoded credential information from the previous step. Create the secrets and service accounts by running the commands below:

      kubectl create -f secret_template.yaml
      kubectl create -f secret-helm-template.yaml
      kubectl create -f secret-imagecleaner-template.yaml
      kubectl create -f serviceaccount_template.yaml

      Verify that the secrets and service accounts were created as desired:

      kubectl get secrets
      kubectl get serviceaccounts

  14. Go to the IBM Cloud Private console. On the PowerAI Enterprise Helm release Readme page, click Configure and enter the information for the Helm Release name and the Namespace fields.
  15. Click Install.
    • Provide a name for the release.
    • Choose an appropriate namespace.
    • Accept the license agreement.
    • For credential configuration for the Helm chart, provide IBM Cloud Private administrator credentials (admin/admin).
    • Retain the default values for everything else and click Deploy.

    Note: The proxy for accessing the cluster management console is by default set to IngressProxy, and the base port is set to 30745. The IngressProxy base port number and the ASCD debug port are unique for each deployment.

  16. To check the deployment status, go to Menu > Workload > Helm Release or Deployment, search by the Helm release name. Its status should be Deployed.
  17. To access the PowerAI Enterprise user interface, complete these post installation configuration steps:
    • Configure your client DNS server or host mapping to resolve the deployment name to any public IP address of the Kubernetes cluster. For Linux and UNIX systems, the host mapping is in /etc/hosts; for Windows systems, it is in C:\Windows\System32\drivers\etc\hosts. In the following example, you would replace x.x.x.x with any public IP address of the Kubernetes cluster:
      `cat /etc/hosts`
      `_x.x.x.x_ deployment name`
    • Access the cluster from a browser at the URL https://deployment name:base-port/platform, where deployment name is the Helm release deployment name and base-port is the base port provided during the deployment.

      For example, for a Helm release name of powerai-enterprise and the default base port of 30745 (in step 15), the master pod is created with default prefix paiemaster and the URL to access is https://powerai-enterprise-paiemaster:30745/platform

      If you are using the default self-signed certificate, refer to steps 3 and onward in the topic Locating the cluster management console.

Remove the PowerAI Enterprise Helm release

To remove the PowerAI Enterprise Helm release, log in to the IBM Cloud Private console, and follow these steps:

  1. Go to Workload > Helm releases. Search for the PowerAI Enterprise Helm release and delete it.
    • If no other PowerAI Enterprise deployment exists, you can also delete the Helm releases related to etcd and imagecleaner.
    • If you created the Spark Instance Groups for this deployment, corresponding SIG Helm releases were created and must be deleted as well.
    • Alternatively, run the command below in an SSH session on the master node to delete the Helm releases in the enterprise namespace:

      kubectl get deployment --namespace enterprise | sed -e 1d | awk '{print $1}' | xargs helm delete --purge --tls

  2. Go to Workload > Deployments and ensure that no related stale deployments remain. If any are found, delete them.
  3. Go to Platform > Storage and delete the volumes associated with this release.
  4. Search for and remove the chart secrets by running the commands below:

    kubectl get secrets
    kubectl get secrets -n _PowerAI Enterprise Helm release namespace_
    kubectl delete secret _secret name associated with the Chart_ -n _PowerAI Enterprise Helm release namespace_

  5. Go to Manage > Namespaces and remove any namespaces that were created. Alternatively, you can delete them by running the command below on the master node:

    kubectl delete namespace enterprise

  6. Log in to the IBM Cloud Private master node and remove the mount directories associated with the release being deleted. Example:

    rm -rf /nfsshare/master
    rm -rf /nfsshare/dli-share

  7. If required, delete the containers and Docker images:

    docker rm _container_
    docker rmi _image_

Uninstall IBM Cloud Private

Follow the IBM Cloud Private documentation if you need to uninstall your IBM Cloud Private cluster. The following steps summarize the uninstall process:

  1. From your boot node, change to the cluster directory within your IBM Cloud Private installation directory:
    cd /_installation_directory_/cluster

    Run this command:

    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster mycluster.icp:8500/ibmcom/icp-inception:3.1.1-ee uninstall &

  2. Reboot all nodes.
    To remove a single worker node (for example, worker_node_4_IP_address) from the cluster, run the following command:

    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception-ppc64le:3.1.1-ee uninstall -l worker_node_4_IP_address &

Acknowledgements

The author would like to thank Indrajit Poddar for his guidance and review, and Anil Tallapragada and Vibha Kulkarni for their hard work in helping create this blog.
