This tutorial is part of the Learning path: Deploying Red Hat OpenShift Container Platform 4.x on IBM Power Systems Virtual Servers.
| Topics in “Exploring Red Hat OpenShift on Power Systems Virtual Server” | Type |
| --- | --- |
| Deploying Acme Air microservices application on Red Hat OpenShift Container Platform | Tutorial |
| Deploying a sample MongoDB geospatial application on Red Hat OpenShift Container Platform | Tutorial |
| Enable continuous deployment using Red Hat OpenShift S2I and GitHub webhooks | Tutorial |
| Installing IBM Cloud Pak for Data 3.5.3 on Red Hat OpenShift Container Platform 4.6 on IBM Power Systems Virtual Server | Tutorial |
Introduction
IBM® Cloud Pak® for Data unifies and simplifies the collection, organization, and analysis of data. Enterprises can turn data into insights through an integrated cloud-native architecture. IBM Cloud Pak for Data is extensible and can be customized to a client’s unique data and AI landscapes through an integrated catalog of IBM, open source, and third-party microservices add-ons.
This tutorial shows how to perform an online installation of Cloud Pak for Data and some of the services that are needed to use the Cloud Pak for Data industry accelerators available at https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_current/cpd/svc/industry-accel-svc.html.
Prerequisites
This tutorial assumes that you are familiar with the Red Hat® OpenShift® Container Platform environment on IBM Power Systems™ Virtual Server. It is assumed that you already have it installed, that you have access to it, and that you have the credentials for kubeadmin (the OpenShift cluster administrator). You must be familiar with the Linux® command line and have at least a basic understanding of Red Hat OpenShift.
Also, you must have created a local repository on persistent storage and have a Network File System (NFS) storage class where the NFS export has the no_root_squash and no_all_squash properties set. You also need to make sure that the clocks on the worker nodes are synchronized. Finally, the wget and oc clients must already be installed and available on your PATH; a quick check is shown below.
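A minimal sanity check for the client prerequisites, assuming a bash shell on the bastion node:
command -v wget
command -v oc
wget --version | head -1
oc version --client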
Estimated time
It takes around 2 to 3 hours to complete the installation of IBM Cloud Pak for Data on IBM Power Systems Virtual Server. The installation takes this long because the software is downloaded from repositories on the internet.
Installing IBM Cloud Pak for Data
Perform the following steps to install IBM Cloud Pak for Data:
Log in as the root user on the bastion node.
Install the Linux screen utility, which keeps your session alive if your internet connection drops and lets you recover it when you reconnect (this step is optional).
yum install -y https://dl.fedoraproject.org/pub/epel/8/Everything/ppc64le/Packages/s/screen-4.6.2-10.el8.ppc64le.rpm
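For reference, the basic screen workflow looks like the following; these are standard screen commands and key bindings, not anything specific to this installation:
screen          # start a new session
# press Ctrl-a, then d, to detach and leave the session running
screen -ls      # list running sessions
screen -r       # reattach to the most recent session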
Verify that the NFS export has the no_root_squash and no_all_squash properties set. Restart the NFS server if you changed them.
cat /etc/exports
# /export *(rw,sync,no_root_squash,no_all_squash)
systemctl restart nfs-server
Verify that the clocks on the OpenShift nodes are synchronized.
for NODE in $(oc get nodes | grep Ready | awk '{print $1}'); do echo "$NODE ------------"; ssh core@$NODE "date -u"; done
for NODE in $(oc get nodes | grep Ready | awk '{print $1}'); do echo "$NODE ------------"; ssh core@$NODE "chronyc sources"; done
Verify that the I/O performance of the NFS export meets the requirements. The value from the first dd command (disk latency) should be equal to or better than 2.5 MBps. The value from the second dd command (disk throughput) should be equal to or better than 209 MBps.
BASTION_IP=$(nslookup $(hostname -s) | tail -n2 | head -1 | awk '{print $2}')
NODE=$(oc get nodes | grep Ready | grep worker | head -1 | awk '{print $1}')
cat <<EOF > /tmp/verify_disk.sh
mkdir -p /mnt/export
mount -t nfs ${BASTION_IP}:/export /mnt/export
echo "Verifying disk latency of NFS share - should be equal or better than 2.5 MB/s"
dd if=/dev/zero of=/mnt/export/testfile bs=4096 count=1000 oflag=dsync
echo "Verifying disk throughput of NFS share - should be equal or better than 209 MB/s"
dd if=/dev/zero of=/mnt/export/testfile bs=1G count=1 oflag=dsync
rm /mnt/export/testfile; umount /mnt/export; rm -rf /mnt/export
echo "Cf. https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/cpd/plan/rhos-reqs.html#rhos-reqs__disk"
echo "Done."
EOF
scp /tmp/verify_disk.sh core@${NODE}:/tmp
ssh core@${NODE} "sudo sh /tmp/verify_disk.sh; rm /tmp/verify_disk.sh"
rm /tmp/verify_disk.sh
Create a new user (in this example, cp4d) on the bastion host to use in the installation process.
useradd cp4d
Change to the new user.
su - cp4d
Log in to your Kubernetes cluster using the kubeadmin login and password.
oc login https://api.<ClusterName>.<Domain>:6443
# Authentication required for https://api.<ClusterName>.<Domain>:6443 (openshift)
# Username: kubeadmin
# Password:
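Optionally, confirm that the login succeeded before continuing; oc whoami reports the logged-in user, which shows as kube:admin for the kubeadmin login:
oc whoami
oc cluster-info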
Expose the internal OpenShift image registry (if not done earlier).
oc patch configs.imageregistry.operator.openshift.io/cluster --type merge -p '{"spec":{"defaultRoute":true}}'
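To optionally confirm that the route exists, you can query the default route that the image registry operator creates when defaultRoute is enabled:
oc get route default-route -n openshift-image-registry --template='{{ .spec.host }}'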
Use the kernel.yaml file to apply kernel tuning parameters.
oc apply -f kernel.yaml
Use the smt_crio_slub.yaml file to make sure that the required OpenShift Container Platform configuration is applied.
Download the smt_crio_slub.yaml file.
oc apply -f smt_crio_slub.yaml
Verify that the smt_crio_slub.yaml changes have been applied. You need to wait until all worker nodes have been updated, that is, until the status of the worker nodes shows UPDATED=True, UPDATING=False, and DEGRADED=False. This can take up to 30 minutes because the worker nodes are rebooted.
oc get mcp
# NAME     CONFIG   UPDATED   UPDATING   DEGRADED   MACHI…COUNT   READY…COUNT   UPDATED…COUNT   DEGRADED…COUNT   AGE
# master   rende…   True      False      False      3             3             3               0               25d
# worker   rende…   True      False      False      3             3             3               0               25d
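Instead of polling oc get mcp manually, one way to block until the worker pool reports Updated (assuming the standard MachineConfigPool condition names) is:
oc wait mcp/worker --for=condition=Updated --timeout=45m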
Download the Cloud Pak for Data version 3.5.2 installation utility (cpd-cli) from the public IBM GitHub repository.
wget https://github.com/IBM/cpd-cli/releases/download/v3.5.2/cpd-cli-ppc64le-EE-3.5.2.tgz
Extract the cpd-cli-ppc64le-EE-3.5.2.tgz package.
tar -xvf cpd-cli-ppc64le-EE-3.5.2.tgz
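As a quick, optional check, list the extracted files and make sure the client runs on the bastion node (the exact file list can vary by release):
ls -l cpd-cli repo.yaml
./cpd-cli --help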
Using your preferred text editor, replace the apikey entry in the repo.yaml file that was extracted with the package with the entitlement key (API key) that you acquired with your IBM ID from the IBM container library.
---
fileservers:
  - url: "https://raw.github.com/IBM/cloud-pak/master/repo/cpd/3.5"
registry:
  - url: cp.icr.io/cp/cpd
    name: base-registry
    namespace: ""
    username: cp
    apikey: [Get your entitlement key here https://myibm.ibm.com/products-services/containerlibrary]
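If you prefer not to edit the file by hand, a minimal sed sketch can insert the key for you; ENTITLEMENT_KEY here is a hypothetical shell variable holding the key you copied from the container library:
ENTITLEMENT_KEY='<your entitlement key>'   # paste the key copied from the IBM container library
sed -i "s|apikey:.*|apikey: ${ENTITLEMENT_KEY}|" repo.yaml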
Start the Linux screen utility (if you lose your connection to the internet, you can resume the installation terminal using the screen -r command). This is an optional step.
screen
Note: You can opt to download the install_all.sh file, which contains the commands covered in the following steps.
Create a new project called zen.
oc new-project zen
Install Cloud Pak for Data Control Plane (Lite).
./cpd-cli adm --assembly lite --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a lite --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
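While an assembly installs, you can monitor progress from a second terminal; these are read-only checks, and the same cpd-cli status command appears again in the verification step later:
oc get pods -n zen --watch
./cpd-cli status -n zen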
Install IBM Watson® Studio Local.
./cpd-cli adm --assembly wsl --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a wsl --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
Install IBM Watson Machine Learning (WML).
./cpd-cli adm --assembly wml --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a wml --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
Install Analytics Engine powered by Apache Spark (Spark).
./cpd-cli adm --assembly spark --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a spark --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
Install RStudio.
./cpd-cli adm --assembly rstudio --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a rstudio --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
Install R 3.6 runtime add-on.
./cpd-cli adm --assembly runtime-addon-r36 --arch ppc64le --namespace zen -r repo.yaml --apply --latest-dependency
./cpd-cli install -a runtime-addon-r36 --arch ppc64le -c nfs-storage-provisioner -n zen -r repo.yaml --latest-dependency
Delete all completed pods after the installation of services has finished.
oc get pods -n zen --no-headers=true | awk '/Completed/{print $1}' | xargs oc delete -n zen pod
Verify installation.
oc get clusterversion
oc get co
oc get nodes
oc adm top nodes
oc describe nodes
oc get sc
oc get pvc -A
oc get pv -A
oc get projects
./cpd-cli status -n zen
oc adm top pods -n zen
oc get pods -n zen
Get the URL of the Cloud Pak for Data web console.
oc get route -n zen | awk ' $3 ~ /^ibm-nginx/ {print "https://" $2}'
Running the AutoAI experiment
For a quick test, you can run an AutoAI experiment. Perform the following steps to conduct the test using the AutoAI experiment:
Direct your browser to the Cloud Pak for Data web console.
https://zen-cpd-zen.apps.<ClusterName>.<Domain>
The default user ID and password are:
User ID: admin
Password: password
Click Projects on the left pane.
Click New Project at the upper-right corner.
Click Create an empty project.
Enter a name for your project and click Create.
Click Add to project.
Click AutoAI experiment.
Enter a name for your new AutoAI experiment and click Create.
Open a new tab in your browser and download the Titanic example from https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv.
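If you prefer the command line, the same file can be fetched with wget using the URL above:
wget https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv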
Drag and drop the downloaded CSV file onto this screen.
Select the Survived column as the prediction column and click Run experiment.
Observe the results of your experiment. You have successfully run your first AI job on Cloud Pak for Data.
Summary
This tutorial helped you to install a comprehensive AI and machine learning environment using Cloud Pak for Data 3.5.2 on your Power Systems Virtual Server environment and to run a simple AutoAI experiment.