Installing IBM Cloud Pak for Data on Red Hat OpenShift Container Platform on IBM Power Systems Virtual Server

This tutorial is part of the Learning path: Deploying Red Hat OpenShift Container Platform 4.x on IBM Power Systems Virtual Servers.

Introduction

IBM® Cloud Pak® for Data unifies and simplifies the collection, organization, and analysis of data. Enterprises can turn data into insights through an integrated cloud-native architecture. IBM Cloud Pak for Data is extensible and can be customized to a client’s unique data and AI landscape through an integrated catalog of IBM, open source, and third-party microservice add-ons.

This tutorial shows how to perform an online installation of Cloud Pak for Data and some of the services that are needed to use the Cloud Pak for Data industry accelerators available at https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/cpd/svc/industry-accel-svc.html.

Prerequisites

This tutorial assumes that you are familiar with the Red Hat® OpenShift® Container Platform environment on IBM Power Systems™ Virtual Server, that it is already installed, that you have access to it, and that you have the credentials for kubeadmin (the OpenShift cluster administrator). You must also be familiar with the Linux® command line and have at least a basic understanding of Red Hat OpenShift.

Also, you must have created a local repository on persistent storage and have a Network File System (NFS) storage class whose NFS export has the no_root_squash option set.
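
For reference, an export entry with no_root_squash on the NFS server might look like the following (the export path and client subnet are illustrative; adjust them to your environment):

    # /etc/exports: give the cluster subnet read/write access without
    # remapping root, which several Cloud Pak for Data pods require
    /export/nfs-storage 192.168.0.0/24(rw,sync,no_root_squash)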

You need to have the wget and oc clients installed and available in your PATH.
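
You can quickly verify that both clients are available:

    which wget oc
    oc version --client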

Estimated time

Completing the installation of IBM Cloud Pak for Data on IBM Power Systems Virtual Server is expected to take up to 2 hours, because the software is installed from repositories on the internet.

Installing IBM Cloud Pak for Data

Perform the following steps to install IBM Cloud Pak for Data:

  1. Log in as the root user on the bastion node.

  2. Optional: Install the Linux screen utility, which maintains your session if your internet connection drops so that you can recover the session when you reconnect.

    yum install -y https://dl.fedoraproject.org/pub/epel/8/Everything/ppc64le/Packages/s/screen-4.6.2-10.el8.ppc64le.rpm
    
  3. Create a new user (in this example, cp4d) to use in the installation process.

    useradd cp4d
    
  4. Change to the new user.

    su - cp4d
    
  5. Log in to your Kubernetes cluster using the kubeadmin login and password.

    oc login https://api.<ClusterName>.<Domain>:6443
    Authentication required for https://api.<ClusterName>.<Domain>:6443 (openshift)
    Username: kubeadmin
    Password:
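
    To confirm that you are logged in, you can run:

    oc whoami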
    
  6. Expose the internal OpenShift image registry (if not done earlier).

    oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
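
    You can verify the route with the following command (in OpenShift 4.x, the route is named default-route in the openshift-image-registry namespace):

    oc get route default-route -n openshift-image-registry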
    
  7. Use the kernel.yaml file to apply kernel tuning parameters.

    Download the kernel.yaml file

    oc apply -f kernel.yaml
    
  8. Use the smt_crio_slub.yaml file to make sure that the required OpenShift Container Platform configuration is applied.

    Download the smt_crio_slub.yaml file

    oc apply -f smt_crio_slub.yaml
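
    You can confirm that the resources from steps 7 and 8 were created by querying them back from the same manifest files:

    oc get -f kernel.yaml
    oc get -f smt_crio_slub.yaml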
    
  9. Download the Cloud Pak for Data installation utility from the public IBM GitHub repository.

    wget https://github.com/IBM/cpd-cli/releases/download/cpd-3.0.1/cloudpak4data-ee-3.0.1.tgz
    
  10. Extract the cloudpak4data-ee-3.0.1.tgz package.

    tar -xvf cloudpak4data-ee-3.0.1.tgz
    

    In this example, we use the ./bin/cpd-ppc64le and ./repo.yaml files.

  11. Move the cpd-ppc64le binary file to the home directory.

    mv ./bin/cpd-ppc64le .
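
    If the binary lost its execute permission during extraction, restore it; you can then check that the utility runs (the --help flag is assumed to print usage here):

    chmod +x ./cpd-ppc64le
    ./cpd-ppc64le --help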
    
  12. Using your preferred text editor, update the apikey entry in the extracted repo.yaml file with the entitlement key that you acquired with your IBM ID from the IBM container library.

    registry:
    - url: cp.icr.io/cp/cpd
      username: cp
      apikey: [Get your entitlement key here https://myibm.ibm.com/products-services/containerlibrary]
      name: base-registry
    fileservers:
    - url: https://raw.github.com/IBM/cloud-pak/master/repo/cpd3
    
  13. Optional: Start the Linux screen utility. If you lose your internet connection, you can resume the installation terminal using the screen -r command. (To detach from a running screen session manually, press Ctrl+a followed by d.)

    screen
    

    Note: You can opt to download the install_all.sh file, which contains the commands covered in steps 14 through 29.

  14. Create a new project called zen.

    oc new-project zen
    
  15. Install Cloud Pak for Data Control Plane (Lite).

    ./cpd-ppc64le adm --assembly lite --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a lite --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
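
    In this and the following installation steps, the first command (adm ... --apply) applies the administrative prerequisites for the assembly, and the second command performs the installation. The control plane installation takes a while; you can monitor pod progress from a second terminal:

    oc get pods -n zen --watch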
    
  16. Install IBM Watson® Studio Local.

    ./cpd-ppc64le adm --assembly wsl --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a wsl --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
    
  17. Install IBM Watson Machine Learning (WML).

    ./cpd-ppc64le adm --assembly wml --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a wml --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
    
  18. Install Analytics Engine powered by Apache Spark (Spark).

    ./cpd-ppc64le adm --assembly spark --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a spark --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
    
  19. Install RStudio.

    ./cpd-ppc64le adm --assembly rstudio --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a rstudio --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
    
  20. Install R 3.6 runtime add-on.

    ./cpd-ppc64le adm --assembly runtime-addon-r36 --arch ppc64le --version 3.0.1 --namespace zen -r ./repo.yaml --apply
    ./cpd-ppc64le -a runtime-addon-r36 --arch ppc64le -c nfs-storage-provisioner -n zen -r ./repo.yaml
    
  21. Get the default route of the internal image registry and save it in the INTREPO variable.

    INTREPO=$(oc get route -n openshift-image-registry | awk '$3 ~ /^image-registry/ {print $2}')
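
    You can echo the variable to confirm that a route was found. On a default cluster it typically looks like default-route-openshift-image-registry.apps.<ClusterName>.<Domain>:

    echo ${INTREPO}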
    
  22. Install common core services patch 5.

    ./cpd-ppc64le patch --namespace zen --assembly wsl --patch-name cpd-3.0.1-ccs-patch-5 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  23. Install WSL patch 4.

    ./cpd-ppc64le patch --namespace zen --assembly wsl --patch-name cpd-3.0.1-wsl-patch-4 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  24. Install WML patch 3.

    ./cpd-ppc64le patch --namespace zen --assembly wml --patch-name cpd-3.0.1-wml-patch-3 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  25. Install Spark patch 3.

    ./cpd-ppc64le patch --namespace zen --assembly spark --patch-name cpd-3.0.1-spark-patch-3 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  26. Install RStudio patch 4.

    ./cpd-ppc64le patch --namespace zen --assembly rstudio --patch-name cpd-3.0.1-rstudio-patch-4 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  27. Install R 3.6 add-on patch 3.

    ./cpd-ppc64le patch --namespace zen --assembly runtime-addon-r36 --patch-name cpd-3.0.1-runtime-addon-r36-patch-3 --transfer-image-to ${INTREPO}/zen --arch ppc64le -r ./repo.yaml --target-registry-username=kubeadmin --target-registry-password=$(oc whoami -t) --insecure-skip-tls-verify
    
  28. Installing the patches restarts some pods. Wait until all the pods are in the Running state again, that is, until the following command returns an empty list.

    oc get pods | grep "0/" | grep -v Completed
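
    If you prefer to wait in a loop instead of rerunning the command, a minimal sketch:

    # poll every 30 seconds until no pod in the zen project is partially ready
    while oc get pods -n zen | grep "0/" | grep -v Completed; do sleep 30; done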
    
  29. Get the URL of the Cloud Pak for Data web console.

    oc get route -n zen | awk ' $3 ~ /^ibm-nginx/ {print "https://" $2}'
    

Running the AutoAI experiment

For a quick test, you can run an AutoAI experiment. Perform the following steps:

  1. Direct your browser to the Cloud Pak for Data web console.

    https://zen-cpd-zen.apps.<ClusterName>.<Domain>

    The default user ID and password are:
    User ID: admin
    Password: password

  2. Click Projects on the left pane.

  3. Click New Project at the upper-right corner.

  4. Click Create an empty project.

  5. Enter a name for your project and click Create.

  6. Click Add to project.

  7. Click AutoAI experiment.

  8. Enter a name for your new AutoAI experiment and click Create.

  9. Open a new tab in your browser and download the Titanic example dataset from https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv.
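
    If you prefer the command line, you can also fetch the file on your workstation (assuming wget is available there):

    wget https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv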

  10. Drag the downloaded CSV file and drop it on this screen.

  11. Select the Survived column as the prediction column and click Run experiment.

  12. Observe the results of your experiment. You have successfully run your first AI job on Cloud Pak for Data.

Summary

This tutorial helped you install a comprehensive AI and machine learning environment using Cloud Pak for Data on IBM Power Systems Virtual Server and run a simple AutoAI experiment.