Installing IBM Cloud Pak for Data 3.5.3 on Red Hat OpenShift Container Platform 4.6 on IBM Power Systems Virtual Server

This tutorial is part of the Learning path: Deploying Red Hat OpenShift Container Platform 4.x on IBM Power Systems Virtual Servers.

Topics in “Exploring Red Hat OpenShift on Power Systems Virtual Server”:

  - Deploying Acme Air microservices application on Red Hat OpenShift Container Platform (tutorial)
  - Deploying a sample MongoDB geospatial application on Red Hat OpenShift Container Platform (tutorial)
  - Enable continuous deployment using Red Hat OpenShift S2I and GitHub webhooks (tutorial)
  - Installing IBM Cloud Pak for Data 3.5.3 on Red Hat OpenShift Container Platform 4.6 on IBM Power Systems Virtual Server (tutorial)

Introduction

IBM® Cloud Pak® for Data unifies and simplifies the collection, organization, and analysis of data. Enterprises can turn data into insights through an integrated cloud-native architecture. IBM Cloud Pak for Data is extensible and can be customized to a client’s unique data and AI landscapes through an integrated catalog of IBM, open source, and third-party microservice add-ons.

This tutorial shows how to perform an online installation of Cloud Pak for Data and some of the services that are needed to use the Cloud Pak for Data industry accelerators available at https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/cpd/svc/industry-accel-svc.html.

Prerequisites

This tutorial assumes that you are familiar with the Red Hat® OpenShift® Container Platform environment on IBM Power Systems™ Virtual Server. It is assumed that you already have it installed, that you have access to it, and that you have the credentials for the kubeadmin user (the OpenShift cluster administrator). You must be familiar with the Linux® command line and have at least a basic understanding of Red Hat OpenShift.

Also, you must have created a local repository on persistent storage and have a Network File System (NFS) storage class whose NFS export has the no_root_squash and no_all_squash options set.

You also need to make sure that the clocks on the worker nodes are synchronized.

You need to have the wget and oc clients installed and available in your PATH.
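
A minimal sketch of how you might confirm these prerequisites from the bastion node, assuming that your NFS storage class is named nfs-storage-provisioner (the name used by the install commands later in this tutorial):

    # Confirm that the required clients are available
    which wget oc
    oc version --client

    # On the NFS server, confirm the export options
    cat /etc/exports    # should include no_root_squash and no_all_squash

    # After logging in to the cluster (see the installation steps), confirm that the storage class exists
    oc get sc | grep nfs-storage-provisioner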

Estimated time

Completing the installation of IBM Cloud Pak for Data on IBM Power Systems Virtual Server is expected to take around 2 to 3 hours. The installation takes this long because the software is downloaded from repositories on the internet.

Installing IBM Cloud Pak for Data

Perform the following steps to install IBM Cloud Pak for Data:

  1. Log in as the root user on the bastion node.

  2. Install the Linux screen utility, which maintains your session if your internet connection drops and lets you recover it when you reconnect (this step is optional).

    yum install -y https://dl.fedoraproject.org/pub/epel/8/Everything/ppc64le/Packages/s/screen-4.6.2-10.el8.ppc64le.rpm
    
  3. Verify that the NFS export has the no_root_squash and no_all_squash options set. Restart the NFS server if you changed the export.

     cat /etc/exports
     # /export *(rw,sync,no_root_squash,no_all_squash)
    
     systemctl restart nfs-server
    
  4. Verify that the clocks on the OpenShift nodes are synchronized.

    for NODE in $(oc get nodes|grep Ready|awk '{print $1}');do echo "$NODE ------------";ssh core@$NODE "date -u";done
    for NODE in $(oc get nodes|grep Ready|awk '{print $1}');do echo "$NODE ------------";ssh core@$NODE "chronyc sources"; done
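
    If you want a quick view of how far each node's clock is from its time source, the chronyc tracking command reports the offset on its "System time" line; a minimal sketch:

    for NODE in $(oc get nodes|grep Ready|awk '{print $1}');do echo "$NODE ------------";ssh core@$NODE "chronyc tracking | grep 'System time'";done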
    
  5. Verify that the I/O performance of the NFS export meets the requirements. The value reported by the first dd command (disk latency test) should be 2.5 MB/s or better, and the value reported by the second dd command (disk throughput test) should be 209 MB/s or better.

    BASTION_IP=$(nslookup $(hostname -s) | tail -n2 | head -1 | awk '{print $2}')
    NODE=$(oc get nodes | grep Ready | grep worker | head -1 | awk '{print $1}')
    
    cat <<EOF > /tmp/verify_disk.sh
    mkdir -p /mnt/export
    mount -t nfs ${BASTION_IP}:/export /mnt/export
    echo "Verifying disk latency of NFS share - should be equal or better than 2.5 MB/s"
    dd if=/dev/zero of=/mnt/export/testfile bs=4096 count=1000 oflag=dsync
    echo "Verifying disk throuhgput of NFS share - should be equal or better than 209 MB/s"
    dd if=/dev/zero of=/mnt/export/testfile bs=1G count=1 oflag=dsync
    rm /mnt/export/testfile; umount /mnt/export; rm -rf /mnt/export
    echo "Cf. https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/cpd/plan/rhos-reqs.html#rhos-reqs__disk"
    echo "Done."
    EOF
    
    scp /tmp/verify_disk.sh core@${NODE}:/tmp
    ssh core@${NODE} "sudo sh /tmp/verify_disk.sh; rm /tmp/verify_disk.sh"
    rm /tmp/verify_disk.sh
    
  6. Create a new user (in this example, cp4d) on the bastion host to use in the installation process.

    useradd cp4d
    
  7. Change to the new user.

    su - cp4d
    
  8. Log in to your OpenShift cluster using the kubeadmin user name and password.

    oc login https://api.<ClusterName>.<Domain>:6443
    # Authentication required for https://api.<ClusterName>.<Domain>:6443 (openshift)
    # Username: kubeadmin
    # Password:
    
  9. Expose the internal OpenShift image registry (if not done earlier).

    oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
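
    To confirm that the registry route was created, you can list the route in the openshift-image-registry namespace (when the default route is enabled, it is typically named default-route):

    oc get route default-route -n openshift-image-registry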
    
  10. Use the kernel.yaml file to apply kernel tuning parameters.

    Download the kernel.yaml file

    oc apply -f kernel.yaml
    
  11. Use the smt_crio_slub.yaml file to make sure that the required OpenShift Container Platform configuration is applied.

    Download the smt_crio_slub.yaml file

    oc apply -f smt_crio_slub.yaml
    
  12. Verify that the smt_crio_slub.yaml changes have been applied. You need to wait until all worker nodes have been updated, that is, until the worker machine config pool shows UPDATED=True, UPDATING=False, and DEGRADED=False. This can take up to 30 minutes because the worker nodes are rebooted.

    oc get mcp
    
    # NAME   CONFIG   UPDATED UPDATING DEGRADED MACHI…COUNT READY…COUNT UPDATED…COUNT   DEGRADED…COUNT   AGE
    # master rende…   True    False    False    3           3           3               0                25d
    # worker rende…   True    False    False    3           3           3               0                25d
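
    One simple way to monitor the rollout, assuming the watch utility is available on your bastion node, is to poll the machine config pools every 30 seconds and press Ctrl+C when both pools report UPDATED=True:

    watch -n 30 "oc get mcp"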
    
  13. Download the Cloud Pak for Data installation utility (cpd-cli) from the public IBM GitHub repository.

    wget https://github.com/IBM/cpd-cli/releases/download/v3.5.2/cpd-cli-ppc64le-EE-3.5.2.tgz
    
  14. Extract the cpd-cli-ppc64le-EE-3.5.2.tgz package.

    tar -xvf cpd-cli-ppc64le-EE-3.5.2.tgz
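
    You can list the extracted files to confirm that the package includes the cpd-cli binary and the repo.yaml template that is edited in the next step (additional files, such as license files, may also be present):

    ls -l
    # Expect to see the cpd-cli binary and repo.yaml among the extracted files.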
    
  15. Using your preferred text editor, update the apikey entry in the repo.yaml file that was extracted from the archive with the entitlement key that you acquired with your IBM ID from the IBM container library.

    ---
    fileservers:
    -
     url: "https://raw.github.com/IBM/cloud-pak/master/repo/cpd/3.5"
    registry:
    -
     url: cp.icr.io/cp/cpd
     name: base-registry
     namespace: ""
     username: cp
     apikey: [Get your entitlement key at https://myibm.ibm.com/products-services/containerlibrary]
    
  16. Start the Linux screen utility (if you lose your internet connection, you can resume the installation terminal by using the screen -r command). This step is optional.

    screen
    

    Note: You can opt to download the install_all.sh file, which contains the commands covered in the following steps.

  17. Create a new project called zen.

    oc new-project zen
    
  18. Install Cloud Pak for Data Control Plane (Lite).

    ./cpd-cli adm --assembly lite --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a lite --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
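
    Optionally, you can check the status of the lite assembly before installing the next service; a sketch, assuming that cpd-cli status accepts the same assembly and namespace options used above:

    ./cpd-cli status -a lite -n zen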
    
  19. Install IBM Watson® Studio Local.

    ./cpd-cli adm --assembly wsl --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a wsl --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
    
  20. Install IBM Watson Machine Learning (WML).

    ./cpd-cli adm --assembly wml --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a wml --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
    
  21. Install Analytics Engine powered by Apache Spark (Spark).

    ./cpd-cli adm --assembly spark --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a spark --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
    
  22. Install RStudio.

    ./cpd-cli adm --assembly rstudio --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a rstudio --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
    
  23. Install R 3.6 runtime add-on.

    ./cpd-cli adm --assembly runtime-addon-r36 --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a runtime-addon-r36 --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
    
  24. Delete all completed pods after the installation of services has finished.

    oc get pods -n zen --no-headers=true | awk '/Completed/{print $1}' | xargs oc delete -n zen pod
    
  25. Verify installation.

    oc get clusterversion
    oc get co
    oc get nodes
    oc adm top nodes
    oc describe nodes
    oc get sc
    oc get pvc -A
    oc get pv -A
    oc get projects
    ./cpd-cli status -n zen
    oc adm top pods -n zen
    oc get pods -n zen
    
  26. Get the URL of the Cloud Pak for Data web console.

    oc get route -n zen | awk ' $3 ~ /^ibm-nginx/ {print "https://" $2}'
    

Running the AutoAI experiment

For a quick test, you can run an AutoAI experiment. Perform the following steps:

  1. Direct your browser to the Cloud Pak for Data web console.

    https://zen-cpd-zen.apps.<ClusterName>.<Domain>

    The default user ID and password are:
    User ID: admin
    Password: password

  2. Click Projects on the left pane.

  3. Click New Project at the upper-right corner.

  4. Click Create an empty project.

  5. Enter a name for your project and click Create.

  6. Click Add to project.

  7. Click AutoAI experiment.

  8. Enter a name for your new AutoAI experiment and click Create.

  9. Open a new tab in your browser and download the Titanic example data set from https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv (or fetch it with wget, as shown below).
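
    If you prefer the command line, you can fetch the file with wget, for example:

    wget https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv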

  10. Drag and drop the downloaded CSV file onto the AutoAI experiment screen.

  11. Select the Survived column as the prediction column and click Run experiment.

  12. Observe the results of your experiment. You have successfully run your first AI job on Cloud Pak for Data.

Summary

This tutorial helped you install a comprehensive AI and machine learning environment using Cloud Pak for Data 3.5.2 on IBM Power Systems Virtual Server and run a simple AutoAI experiment.