
Installing IBM Cloud Pak for Data 3.5 on Red Hat OpenShift Container Platform 4.6 on IBM Power Systems Virtual Server


IBM® Cloud Pak® for Data unifies and simplifies the collection, organization, and analysis of data. Enterprises can turn data into insights through an integrated cloud-native architecture. IBM Cloud Pak for Data is extensible and can be customized to a client’s unique data and AI landscape through an integrated catalog of IBM, open source, and third-party microservice add-ons.

This tutorial shows how to perform an online installation of Cloud Pak for Data 3.5 on IBM Power Systems™ Virtual Server and some of the services that are needed to use the Cloud Pak for Data industry accelerators available at https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=integrations-industry-accelerators.


This tutorial assumes that you are familiar with the Red Hat® OpenShift® Container Platform environment on IBM Power Systems Virtual Server. It assumes that the platform is already installed, that you have access to it, and that you have the credentials for kubeadmin (the OpenShift cluster administrator). You must be familiar with the Linux® command line and have at least a basic understanding of Red Hat OpenShift.

Also, you must have created a local repository on persistent storage and have a Network File System (NFS) storage class where the NFS export has the no_root_squash property set.

You must also make sure that the clocks on the worker nodes are synchronized.

Finally, you need the wget and oc clients installed and available in your PATH.
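As a quick sanity check before starting, the client prerequisites can be verified with a short script. This is a sketch, not part of the official installation procedure; the check_cmd helper is hypothetical, and you can extend the command list as needed:

```shell
#!/bin/sh
# Sketch: confirm that each required client is available in PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: OK"
  else
    echo "$1: NOT FOUND"
  fi
}

# The two clients this tutorial requires
for CMD in wget oc; do
  check_cmd "$CMD"
done
```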

Estimated time

Completing the installation of IBM Cloud Pak for Data on IBM Power Systems Virtual Server takes around 2 to 3 hours, because the software must be downloaded from repositories on the internet.

Installing IBM Cloud Pak for Data

Perform the following steps to install IBM Cloud Pak for Data:

  1. Log in as the root user on the bastion node.

  2. Install the Linux screen utility (optional). It keeps your session alive if your internet connection drops and lets you reattach to it when you reconnect.

    yum install -y https://dl.fedoraproject.org/pub/epel/8/Everything/ppc64le/Packages/s/screen-4.6.2-12.el8.ppc64le.rpm
  3. Verify that the NFS export has the no_root_squash property set. Restart the NFS server if changed.

     cat /etc/exports
     # /export *(rw,sync,no_root_squash)
     systemctl restart nfs-server
  4. Verify that the clocks on the OpenShift nodes are synchronized.

    for NODE in $(oc get nodes|grep Ready|awk '{print $1}');do echo "$NODE ------------";ssh core@$NODE "date -u";done
    for NODE in $(oc get nodes|grep Ready|awk '{print $1}');do echo "$NODE ------------";ssh core@$NODE "chronyc sources"; done
  5. Verify that the I/O performance of the NFS export meets the requirements. The rate reported by the first dd command (disk latency test) should be 2.5 MB/s or better, and the rate reported by the second dd command (disk throughput test) should be 209 MB/s or better.

    BASTION_IP=$(nslookup $(hostname -s) | tail -n2 | head -1 | awk '{print $2}')
    NODE=$(oc get nodes | grep Ready | grep worker | head -1 | awk '{print $1}')
    cat <<EOF > /tmp/verify_disk.sh
    mkdir -p /mnt/export
    mount -t nfs ${BASTION_IP}:/export /mnt/export
    echo "Verifying disk latency of NFS share - should be equal or better than 2.5 MB/s"
    dd if=/dev/zero of=/mnt/export/testfile bs=4096 count=1000 oflag=dsync
    echo "Verifying disk throughput of NFS share - should be equal or better than 209 MB/s"
    dd if=/dev/zero of=/mnt/export/testfile bs=1G count=1 oflag=dsync
    rm /mnt/export/testfile; umount /mnt/export; rm -rf /mnt/export
    echo "Cf. https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/cpd/plan/rhos-reqs.html#rhos-reqs__disk"
    echo "Done."
    EOF
    scp /tmp/verify_disk.sh core@${NODE}:/tmp
    ssh core@${NODE} "sudo sh /tmp/verify_disk.sh; rm /tmp/verify_disk.sh"
    rm /tmp/verify_disk.sh
  6. Create a new user (in this example, cp4d) on the bastion host to use in the installation process.

    useradd cp4d
  7. Change to the new user.

    su - cp4d
  8. Log in to your OpenShift cluster using the kubeadmin username and password.

    oc login https://api.<ClusterName>.<Domain>:6443
    # Authentication required for https://api.<ClusterName>.<Domain>:6443 (openshift)
    # Username: kubeadmin
    # Password:
  9. Expose the internal OpenShift image registry (if not done earlier).

    oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
  10. Use the kernel.yaml file to apply kernel tuning parameters. Note that these settings are for the worker nodes with 64 GB RAM. Refer to the following documentation to understand how to adapt: https://www.ibm.com/docs/en/cloud-paks/cp-data/3.5.0?topic=tasks-changing-required-node-settings#node-settings__kernel

    Download the kernel.yaml file

    oc apply -f kernel.yaml
  11. Use the smt_crio_slub.yaml file to make sure that the required OpenShift Container Platform configuration is applied.

    Download the smt_crio_slub yaml file

    oc apply -f smt_crio_slub.yaml
  12. Verify that the smt_crio_slub.yaml changes have been applied. Wait until all worker nodes have been updated, that is, until the status of the worker machine config pool shows UPDATED=True, UPDATING=False, and DEGRADED=False. This can take up to 30 minutes because the worker nodes are rebooted.

    oc get mcp
    # master rende…   True    False    False    3           3           3               0                25d
    # worker rende…   True    False    False    3           3           3               0                25d
  13. Download the Cloud Pak for Data installation utility from the public IBM GitHub repository.

    wget https://github.com/IBM/cpd-cli/releases/download/v3.5.6/cpd-cli-ppc64le-EE-3.5.6.tgz
  14. Extract the cpd-cli-ppc64le-EE-3.5.6.tgz package.

    tar -xvf cpd-cli-ppc64le-EE-3.5.6.tgz
  15. Using your preferred text editor, edit the repo.yaml file that was extracted in the previous step and replace the apikey entry with the entitlement key you acquired with your IBM ID from the IBM container library.

     fileservers:
       - url: "https://raw.github.com/IBM/cloud-pak/master/repo/cpd/3.5"
     registry:
       - url: cp.icr.io/cp/cpd
         name: base-registry
         namespace: ""
         username: cp
         apikey: [Get your entitlement key here https://myibm.ibm.com/products-services/containerlibrary]
  16. Start the Linux screen utility (optional). If you lose your internet connection, you can resume the installation session with the screen -r command.


    Note: You can optionally download the install_all.sh file, which contains the commands covered in the following steps.

  17. Create a new project called zen.

    oc new-project zen
  18. Install Cloud Pak for Data Control Plane (lite).

    ./cpd-cli adm --assembly lite --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a lite --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  19. Install IBM Watson® Studio Local (wsl).

    ./cpd-cli adm --assembly wsl --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a wsl --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  20. Install IBM Watson Machine Learning (wml).

    ./cpd-cli adm --assembly wml --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a wml --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  21. Install Analytics Engine powered by Apache Spark (spark).

    ./cpd-cli adm --assembly spark --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a spark --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  22. Install RStudio (rstudio).

    ./cpd-cli adm --assembly rstudio --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a rstudio --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  23. Install R 3.6 runtime add-on (runtime-addon-r36).

    ./cpd-cli adm --assembly runtime-addon-r36 --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
    ./cpd-cli install -a runtime-addon-r36 --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
  24. Delete all completed pods after the installation of services has finished.

    oc get pods -n zen --no-headers=true | awk '/Completed/{print $1}' | xargs oc delete -n zen pod
  25. Verify installation.

    oc get clusterversion
    oc get co
    oc get deployments
    oc get replicasets
    oc get sts
    oc get jobs
    oc describe nodes
    oc adm top nodes
    oc adm top pods
    ./cpd-cli status -n zen
    oc get nodes -o wide
    oc get pods -n zen -o wide
  26. Get the URL of the Cloud Pak for Data web console.

    oc get route -n zen | awk ' $3 ~ /^ibm-nginx/ {print "https://" $2}'
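The throughput figures in step 5 are read from the summary line that dd prints. If you want to script that comparison instead of checking it by eye, a small helper can extract the rate and normalize it to whole MB/s. This is a sketch: rate_mbs is a hypothetical helper and assumes the common GNU dd summary format ("... copied, 4.1 s, 262 MB/s"):

```shell
#!/bin/sh
# Sketch: extract the transfer rate from a GNU dd summary line and
# normalize it to whole MB/s, so it can be compared against the
# 2.5 MB/s and 209 MB/s thresholds from step 5.
rate_mbs() {
  line=$1
  # The rate is the last comma-separated field, e.g. "262 MB/s"
  value=$(printf '%s\n' "$line" | awk -F', ' '{print $NF}' | awk '{print $1}')
  unit=$(printf '%s\n' "$line" | awk '{print $NF}')
  case $unit in
    GB/s) awk -v v="$value" 'BEGIN{printf "%.0f\n", v * 1000}' ;;
    MB/s) awk -v v="$value" 'BEGIN{printf "%.0f\n", v}' ;;
    kB/s) awk -v v="$value" 'BEGIN{printf "%.0f\n", v / 1000}' ;;
  esac
}

rate_mbs "1073741824 bytes (1.1 GB, 1.0 GiB) copied, 4.1 s, 262 MB/s"  # prints 262
```

You could then compare the result against the threshold with a plain shell test, for example `[ "$(rate_mbs "$LINE")" -ge 209 ]`.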
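Steps 18 through 23 repeat the same adm/install command pair for each assembly, so the whole sequence can be driven by a loop. The sketch below only prints the commands (a dry run) rather than executing them; the print_cpd_cmds helper is an assumption for illustration. Review the output, then pipe it to sh, or replace echo with direct execution, once you are satisfied:

```shell
#!/bin/sh
# Dry-run sketch: emit the adm/install command pair for each assembly
# installed in steps 18-23, in the same order as the tutorial.
print_cpd_cmds() {
  for ASSEMBLY in lite wsl wml spark rstudio runtime-addon-r36; do
    echo "./cpd-cli adm --assembly $ASSEMBLY --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency"
    echo "./cpd-cli install -a $ASSEMBLY --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency"
  done
}

print_cpd_cmds
```

Note that ordering still matters: lite (the control plane) must be installed before the services that depend on it, which the loop preserves.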

Running the AutoAI experiment

For a quick test, you can run an AutoAI experiment. Perform the following steps:

  1. Direct your browser to the Cloud Pak for Data web console.


    The default user ID and password are:
    User ID: admin
    Password: password

  2. Click Projects on the left pane.

  3. Click New Project at the upper-right corner.

  4. Click Create an empty project.

  5. Enter a name for your project and click Create.

  6. Click Add to project.

  7. Click AutoAI experiment.

  8. Enter a name for your new AutoAI experiment and click Create.

  9. Open a new tab in your browser and download the Titanic example data set from https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv.

  10. Use the downloaded CSV file and drop the file on this screen.

  11. Select the Survived column as the prediction column and click Run experiment.

  12. Observe the results of your experiment. You have successfully run your first AI job on Cloud Pak for Data.


In this tutorial, you installed a comprehensive AI and machine learning environment using Cloud Pak for Data 3.5 on your Power Systems Virtual Server environment and ran a simple AutoAI experiment.