IBM Data Science Experience (DSX) Local and PowerAI are enterprise software offerings from IBM for data scientists, built with open source components and transparently accelerated with specialized hardware such as the GPUs in the IBM Power Systems S822LC for High Performance Computing. In an earlier blog, I outlined some of the motivations and getting-started steps for using DSX on different types of IBM POWER systems, such as the IBM Power Systems S822LC for Big Data. DSX is built on Docker and Kubernetes and is also available as part of IBM's enterprise private cloud offering, IBM Cloud Private (ICP). In this blog, we outline the steps to first install ICP and then install DSX on ICP on POWER systems.

Install IBM Cloud Private (ICP) on POWER systems

To install ICP, we used the steps from this document with some minor modifications. We had an environment with three bare-metal machines and decided on a simple ICP installation with a single master. The steps:

  1. Designated one of the three machines as the master and logged in to the master node.
  2. Downloaded the ICP-ee image tar package and copied it to all nodes:
     wget http://pokgsa.ibm.com/projects/i/icp-2.1.0.1/ibm-cloud-private-ppc64le-2.1.0.1.tar.gz 
    
  3. Set up password-less SSH to all nodes.
    Replace ssh_file_path with the identity file you want to use to SSH to the other nodes, and replace hostname with the hostname or IP address of each machine. We ran this for every node in the cluster (a key-distribution sketch follows this step):

    ssh-copy-id -i ssh_file_path root@hostname
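    If no key pair exists yet, a minimal sketch of preparing and distributing one (node1, node2, and node3 are placeholders for your own hostnames) could look like this:

    # generate a key pair on the master if one does not already exist
    ssh-keygen -t rsa -f /root/.ssh/id_rsa -N ""
    # copy the public key to every node in the cluster, including the master itself
    for h in node1 node2 node3; do ssh-copy-id -i /root/.ssh/id_rsa.pub root@$h; done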
  4. Installed Docker.
    The machines ran RHEL 7.2, so we used Docker RPMs downloaded from here. Alternatively, ICP can be installed on Ubuntu 16.04 following the instructions from here. Our machines had a small root filesystem but a larger data directory, so we created a symbolic link pointing the Docker directory at that larger directory. If you have enough space under the root filesystem (we recommend at least 200 GB), skip the symbolic link creation:

    ln -s /data1/docker /var/lib/docker 
    wget http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/docker-ce-17.09.0.ce-1.el7.centos.ppc64le.rpm 
    rpm -ivh container-selinux-2.9-4.el7.noarch.rpm 
    rpm -ivh docker-ce-17.09.0.ce-1.el7.centos.ppc64le.rpm
    • As the Docker storage driver we used overlay and did not run into space errors. However, if the storage driver is devicemapper, we recommend setting an explicit dm.basesize option value
      in the /etc/docker/daemon.json file (a fuller sketch follows):
      {"storage-opts": ["dm.basesize=20G"]}
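      As a fuller sketch (the exact layout is our assumption; adjust the storage driver to your environment), /etc/docker/daemon.json could look like the lines below, after which the Docker daemon is enabled and restarted:

      {
        "storage-driver": "devicemapper",
        "storage-opts": ["dm.basesize=20G"]
      }

      systemctl enable docker
      systemctl restart docker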
  5. Set the vm.max_map_count:
    echo "vm.max_map_count=262144" | tee -a /etc/sysctl.conf
  6. Installed Python:
    yum install python python-pip -y
    
    For Ubuntu the command is
    apt-get install python python-pip -y 
    
  7. Extracted the image bundle and loaded it into Docker on every machine in the cluster (a scripted variant is sketched below):
    tar -xf ibm-cloud-private-ppc64le-2.1.0.1.tar.gz -O | docker load
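    Since the load has to happen on every node, one way to script it from the master (node2 and node3 are placeholders for your other hostnames) is:

    for h in node2 node3; do
      scp ibm-cloud-private-ppc64le-2.1.0.1.tar.gz root@$h:/tmp/
      ssh root@$h 'tar -xf /tmp/ibm-cloud-private-ppc64le-2.1.0.1.tar.gz -O | docker load'
    done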
  8. Created a working directory for ICP:
    mkdir /opt/icp2.1.0.1
    cd /opt/icp2.1.0.1
  9. Extracted the config files:
    docker run -v $(pwd):/data -e LICENSE=accept ibmcom/icp-inception-ppc64le:2.1.0.1-ee cp -r cluster /data
    cd cluster
  10. Configured the hosts file in the cluster directory with the IP addresses of the nodes. We used one node as master, three nodes as workers, and one node as proxy; the same node can be master, worker, and proxy at the same time (a sample hosts layout is sketched after this step). Because of the small root filesystem, we also updated config.yaml with an extra kubelet argument to set a different root directory. Skip this if your root filesystem is large:

    … 
    kubelet_extra_args: ["--fail-swap-on=false","--root-dir=/data1/kubelet"] 
    …
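    As an illustration only, a hosts file for our topology (the IP addresses are placeholders for your own nodes) would follow the usual ICP INI-style layout:

    [master]
    192.168.1.10

    [worker]
    192.168.1.10
    192.168.1.11
    192.168.1.12

    [proxy]
    192.168.1.10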
  11. Copied the correct SSH key into the cluster directory as ssh_key (use the same key that you distributed with ssh-copy-id to all the nodes):
    cp /root/.ssh/id_rsa ssh_key
  12. Ran the command to install ICP
    docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception-ppc64le:2.1.0.1-ee install
  13. To add GPU scheduling to the ICP cluster when installing on a GPU-enabled system, we placed the modprobe.sh script provided by NVIDIA on the GPU node(s) and referenced it from /etc/systemd/system/kubelet.service:
    [Unit] 
    Description=Kubelet Service 
    Documentation=https://github.com/kubernetes/kubernetes 
    
    [Service] 
    EnvironmentFile=-/etc/environment 
    ExecStartPre=/root/modprobe.sh 
    ExecStart=/opt/kubernetes/hyperkube kubelet \ 
    --feature-gates Accelerators=true,PersistentLocalVolumes=true,ExperimentalCriticalPodAnnotation=true \ 
    --allow-privileged=true \ 
    --docker-disable-shared-pid \ 
    --require-kubeconfig \ 
    --kubeconfig=/var/lib/kubelet/kubelet-config \ 
    --read-only-port=0 \ 
    --client-ca-file=/var/lib/kubelet/ca.crt \ 
    --authentication-token-webhook \ 
    --anonymous-auth=false \ 
    --network-plugin=cni \ 
    --pod-manifest-path=/etc/cfc/pods \ 
    --hostname-override=9.37.251.165 \ 
    --node-ip=9.37.251.165 \ 
    --cluster-dns=10.0.0.10 \ 
    --cluster-domain=cluster.local \ 
    --pod-infra-container-image=ibmcom/pause-ppc64le:3.0 \ 
    --cgroup-driver=cgroupfs \ 
    --fail-swap-on=false 
    
    Restart=always 
    RestartSec=10 
    
    [Install] 
    WantedBy=multi-user.target
    
  14. Restarted kubelet on the GPU node(s)
    systemctl restart kubelet 
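    After the restart, a quick way to check that the GPU node registered its accelerators (with the Accelerators feature gate the capacity shows up as alpha.kubernetes.io/nvidia-gpu; the node name is a placeholder) is:

    kubectl describe node <gpu_node_name> | grep -i nvidia-gpu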

 

Install IBM Data Science Experience Local (DSX) on IBM Cloud Private (ICP) on POWER systems

To install DSX, we used the general steps documented here with some modifications. An archive including the Helm charts and Docker images for DSX on ICP on Power is available as an IBM Passport Advantage archive (IBMers can use this link); search for part number CNQ9NEN. To install the Helm chart, we had to use a special tool, the IBM Cloud CLI. At the moment the IBM Cloud CLI is available only for the Intel x86_64 architecture, so the first seven steps were performed on an Intel x86_64 Linux machine (e.g., a laptop or a virtual machine with at least 50 GB of free disk space).

  1. Installed the IBM Cloud CLI
    wget https://clis.ng.bluemix.net/download/bluemix-cli/0.6.4/linux64
    tar -zxvf linux64
    cd Bluemix_CLI
    ./install_bluemix_cli
    
  2. Assuming that the IBM Cloud Private installation was already completed following the steps in the prior section, we went to Menu > Tools > Command Line > Cloud Private CLI in IBM Cloud Private's administration console (https://<master node>:8443/console, default user id: admin, default password: admin) and downloaded the plugin file to an Intel x86 client VM or laptop. Then we installed the ICP plugin:
    bx plugin install /<path_to_installer>/<cli_file_name>
  3. Copied /etc/docker/certs.d/mycluster.icp:8500/ca.crt from the ICP master node to the same path (/etc/docker/certs.d/mycluster.icp:8500/ca.crt) on the x86_64 machine
  4. Updated /etc/hosts with
     …
     < master_ip_address > mycluster.icp
     …
  5. Ran docker login with admin/admin

    docker login mycluster.icp:8500
  6. Logged in to the cluster
    bx pr login -a https://<master_ip_address>:8443 --skip-ssl-validation
  7. Pushed the downloaded DSX archive to the ICP

    bx pr load-ppa-archive --archive ibm-dsx-local-ppc64le-icp-2.1.0.1.tgz 
  8. Now the Helm chart is created. We performed the following steps in the ICP administration console (https://<master node>:8443/console, default user id: admin, default password: admin).
  9. Went to Manage > Helm Repositories and clicked Sync Repositories. 
  10. Went to Catalog > Helm Charts and verified that the ibm-dsx-prod-ppc64le chart now displays. 
  11. In the IBM Cloud Private App Center, selected the user and clicked Configure Client to configure kubectl.
  12. Used the global-scope-images.sh script to change the image scope to global:
     chmod +x global-scope-images.sh 
    ./global-scope-images.sh 
    # Ignore the following warning if you see it: Warning: kubectl apply should be used on resource created by either kubectl create --save-config or kubectl apply
  13. Went to Manage > Namespaces and created the following four namespaces to deploy DSX Local on:
    sysibmadm-data
    sysibm-adm
    dsxl-ml
    ibm-private-cloud
  14. Ran the following command to enable some permissions required to spawn notebooks and other components post-deployment:
    kubectl create rolebinding ibm-private-cloud-admin-binding --clusterrole=admin --user="system:serviceaccount:ibm-private-cloud:default" --namespace=ibm-private-cloud
  15. If you use dynamic provisioning with GlusterFS, simply ensure that the appropriate storage class exists. This can be checked with:
    kubectl get storageclasses | grep glusterfs

    If nothing shows up, then consult your cluster administrator about the availability of GlusterFS. 

  16. We used NFS storage and chose the ICP master node as the NFS server. We installed and started the NFS packages on all ICP nodes:
    yum install -y nfs-utils nfs-utils-lib
    systemctl start nfs 
    
  17. Created shared NFS directory on the ICP master node 
    mkdir /nfsshare
  18. For NFS, we created four directories in the NFS mount path (one way to create them all is shown after this list):
    cloudant
    redis
    spark-metrics 
    user-home
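    One way to create all four at once on the master (assuming the /nfsshare path from the previous step):

    mkdir -p /nfsshare/{cloudant,redis,spark-metrics,user-home}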
  19. Made an entry in /etc/exports on the master node and restarted the NFS service:
    …
    /nfsshare *(rw,sync,no_subtree_check,no_root_squash)
    …
    systemctl restart nfs
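    To re-export and verify the share (standard NFS utilities, not part of the documented procedure), we could also run:

    exportfs -ra
    showmount -e <master_ip_address>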
  20. Edited the dsx-volumes.yml file and updated the server and path values for each volume, for instance:

    nfs:
      server: <master_ip_address>
      path: /nfsshare/user-home
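    For illustration only, a complete PersistentVolume entry in dsx-volumes.yml might look like the sketch below (the name, capacity, and access mode are assumptions; the actual file shipped with DSX defines the required volumes):

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: user-home
    spec:
      capacity:
        storage: 100Gi
      accessModes:
        - ReadWriteMany
      nfs:
        server: <master_ip_address>
        path: /nfsshare/user-home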
  21. Created all volumes with this yaml file: dsx-volumes.yml

    kubectl --validate=false create -f dsx-volumes.yml 
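    To confirm the volumes were registered (a standard kubectl check):

    kubectl get pv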
  22. Went to Catalog > Helm Charts, selected ibm-dsx-prod-ppc64le, and clicked Configure. Many installation parameters can be changed there; we used the default values.
  23. Installed four releases based on the ibm-dsx-prod-ppc64le chart. After each installation, we checked that all pods for the release were running before starting the next one.
    Release Name   Target Namespace
    dsxns1         sysibmadm-data
    dsxns2         sysibm-adm
    dsxns3         dsxl-ml
    dsxns4         ibm-private-cloud
  24. Check for dsxns1 release

    kubectl get pods --namespace=sysibmadm-data 
  25. Check for dsxns2 release
    kubectl get pods --namespace=sysibm-adm 
  26. Check for dsxns3 release
    kubectl get pods --namespace=dsxl-ml
  27.  Check for dsxns4 release
    kubectl get pods --namespace=ibm-private-cloud 
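    Equivalently, all four namespaces can be checked in one pass (a small convenience loop, not part of the official procedure):

    for ns in sysibmadm-data sysibm-adm dsxl-ml ibm-private-cloud; do
      kubectl get pods --namespace=$ns
    done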

 When all the pods were running, DSX on ICP was up and running.

Start using DSX Local 

In a web browser, we went to the URL https://MASTER_NODE_IP:31843/ (use your own MASTER_NODE_IP address) to access the DSX Local client and created some test notebooks. The default credentials were admin/password.

 See IBM Data Science Experience Local for more documentation on how to use DSX Local. 

Acknowledgements

The author would like to thank Yulia Gaponenko for outlining the detailed steps in this blog.

