Follow these steps to install IBM Cloud Private (ICP) on a POWER8 or POWER9 system.

Installing ICP

  1. Designate one of the machines as the master and log in to the master node.
  2. Download the GA version of the ICP-ee image tar package from here and copy it to all nodes.
  3. Set up password-less ssh to all nodes and ensure that all nodes have the same time zone:
    • Generate keys by using this command:
      ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''
    • Run the command:
      ssh-copy-id -i ssh_file_path root@hostname
      In this command, replace ssh_file_path with the path to the SSH key you want to use to access all other nodes, and replace hostname with the hostname or IP address of each system. Do this for every node in the cluster, as shown in the example below.
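      For example, assuming hypothetical node hostnames node1, node2, and node3, the sequence might look like this:
      ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''
      ssh-copy-id -i /root/.ssh/id_rsa.pub root@node1
      ssh-copy-id -i /root/.ssh/id_rsa.pub root@node2
      ssh-copy-id -i /root/.ssh/id_rsa.pub root@node3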
  4. Ensure that all default ports are open, but are not in use. For more information, see the ICP documentation.
  5. Remove the /var/lib/mysql directory (if your server has one).
  6. Set the vm.max_map_count value on all nodes: echo "vm.max_map_count=262144" | tee -a /etc/sysctl.conf
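    To apply the new value immediately without a reboot, you can, for example, reload the sysctl settings:
    sysctl -p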
  7. Install Docker. If the systems have a small root directory (less than 200 GB) but have another, larger directory available for storing data, you can create a symbolic link using this command: ln -s /data1/docker /var/lib/docker

    For Ubuntu:

    apt-get update
    apt install curl
    curl -sSL https://get.docker.com/ | bash

    For Red Hat Enterprise Linux (RHEL):

    yum update
    yum install docker

    Alternatively, for RHEL, you can use the Docker RPM files downloaded from here. For example:

    wget http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/container-selinux-2.9-4.el7.noarch.rpm http://ftp.unicamp.br/pub/ppc64el/rhel/7/docker-ppc64el/docker-ce-17.09.0.ce-1.el7.centos.ppc64le.rpm
    rpm -ivh container-selinux-2.9-4.el7.noarch.rpm
    rpm -ivh docker-ce-17.09.0.ce-1.el7.centos.ppc64le.rpm

    Alternatively, Docker can be installed with the ICP-provided installer.

    wget http://pokgsa.ibm.com/projects/i/icp-2.1.0.3/icp-docker-17.12.1_ppc64le.bin
    chmod +x icp-docker-17.12.1_ppc64le.bin
    ./icp-docker-17.12.1_ppc64le.bin --install

    As the Docker storage engine, we used overlay and did not have problems with space errors. However, if the storage engine is devicemapper, we recommend setting a specific dm.basesize option value in the /etc/docker/daemon.json file:

    {"storage-opts": ["dm.basesize=20G"]}
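    After editing /etc/docker/daemon.json, restart Docker so that the storage option takes effect, for example:
    systemctl restart docker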

  8. Install Python:
    For Ubuntu:
    apt-get update && apt-get install python
    apt-get update && apt-get install python-pip
    For RHEL:
    yum install python python-pip -y
  9. Stop firewalld:
    systemctl stop firewalld
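    If you also want to prevent firewalld from starting again after a reboot, you can additionally run:
    systemctl disable firewalld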
  10. Extract the image bundle and load it into Docker on every machine in the cluster:
    tar -xf ibm-cloud-private-ppc64le-2.1.0.3.tar.gz -O | docker load
  11. Create a working directory for ICP:
    mkdir /opt/icp2.1.0.3
    cd /opt/icp2.1.0.3
  12. Extract the configuration files:
    docker run -v $(pwd):/data -e LICENSE=accept ibmcom/icp-inception:2.1.0.3-ee cp -r cluster /data
    cd cluster
  13. Configure the hosts file in the cluster directory with the IP addresses of the nodes (see the next step). We used one node as the master, three nodes as workers, and one node as the proxy; the same node can act as master, worker, and proxy at the same time. Because of the small root directory, we also updated config.yaml with an extra argument for kubelet to set a different root directory. Skip this if you have a large root directory:

    kubelet_extra_args: ["--fail-swap-on=false","--root-dir=/data1/kubelet"]
  14. Configure the /opt/icp2.1.0.3/cluster/hosts file with the IP addresses of the master, worker, proxy, and management (if needed) nodes. If possible, don’t use one system as master, proxy, and management. If a proxy node is specified, ICP 2.1.0.3 asks for a proxy_vip (an additional IP address, different from the others):

    cat hosts
    [master]
    master_node_1_IP_address
    [worker]
    worker_node_1_IP_address
    worker_node_2_IP_address
    worker_node_3_IP_address
    [proxy]
    proxy_node_3_IP_address
    [management]
    management_node_2_IP_address

  15. Select the correct ssh_key; use the private key whose public key you copied to all the nodes with ssh-copy-id:
    cp /root/.ssh/id_rsa ssh_key
  16. Install socat on the master node:
    wget http://www.dest-unreach.org/socat/download/socat-1.7.3.2.tar.gz
    tar zxvf socat-1.7.3.2.tar.gz
    cd socat-1.7.3.2/
    yum install gcc* (if a C compiler does not already exist)
    ./configure
    make
    make install
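    To confirm that socat installed correctly, you can check its version, for example:
    socat -V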
  17. To enable GPUs in the cluster, on POWER8 systems (S822LC for HPC, Minsky), follow these steps:
    • Clean up CUDA libraries from any prior installations:
      yum list installed | grep -i cuda
      yum remove cuda-*
      yum remove dkms.noarch
      yum remove epel-release
      yum remove nvidia-kmod*
    • Get CUDA 9.2 libraries and install them on the system:
      cd /tmp
      wget http://developer.download.nvidia.com/compute/cuda/repos/rhel7/ppc64le/cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
      rpm -i cuda-repo-rhel7-9.2.88-1.ppc64le.rpm
      yum clean all
      rpm -ivh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
      yum install dkms
      yum install cuda
      #verify GPU can be seen
      nvidia-smi

      #verify if device file has been created
      ls /dev/nvidia-uvm

      #If the device file is not found, download the cudaInit utility (https://www.ibm.com/support/knowledgecenter/en/SSBS6K_2.1.0.2/manage_cluster/verify_gpu.html) and execute it:
      ./cudaInit_ppc64le

      #verify file is created
      ls /dev/nvidia-uvm

      #verify if device log file exists:
      ls /var/lib/docker/volumes/

    On a POWER9 AC922 (Newell) system, follow the instructions in the “System Setup” (set up the IBM POWER9 specific udev rules) section of this article: https://developer.ibm.com/linuxonpower/deep-learning-powerai/releases/

  18. Run this command to install ICP:
    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception:2.1.0.3-ee install &
    If the installation fails, a similar command can be used to uninstall before retrying:
    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception:2.1.0.3-ee uninstall &
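    Because the installer runs in the background under nohup, its output goes to nohup.out in the current directory by default. You can follow the installation progress with, for example:
    tail -f nohup.out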

DSXL Install

DSXL can be downloaded from Passport Advantage.

  1. In order to install the helm chart, we had to use a special tool (ibmcloud), which can be found here:
    https://console.bluemix.net/docs/cli/reference/bluemix_cli/download_cli.html#install_use
    wget https://clis.ng.bluemix.net/download/bluemix-cli/latest/ppc64le
    tar -zxvf <downloaded_file>
    cd Bluemix_CLI/
    ./install_bluemix_cli
  2. In IBM Cloud Private’s administration console (https://<master node>:8443/console, default user ID: admin, default password: admin), go to Menu > Command Line Tools > Cloud Private CLI and download the plugin file. Install the ICP plugin using this command:
    ibmcloud plugin install /<path_to_installer>/<cli_file_name>
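    You can verify that the plugin was installed with, for example:
    ibmcloud plugin list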
  3. Copy /etc/docker/certs.d/mycluster.icp:8500/ca.crt from the ICP boot node to /etc/docker/certs.d/mycluster.icp:8500/ca.crt on the machine from which you are performing the installation (if it differs from the system you are installing on); see the example below. If you use Mac OS, update ~/.docker/certs.d/mycluster.icp\:8500/ca.crt and restart Docker. For more information, see the Docker documentation. You may have to create the directory mycluster.icp:8500 on the x86 system if it does not already exist before copying the certificate.
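    For example, assuming <boot_node_ip> is the IP address of the ICP boot node, the copy might look like this:
    mkdir -p /etc/docker/certs.d/mycluster.icp:8500
    scp root@<boot_node_ip>:/etc/docker/certs.d/mycluster.icp:8500/ca.crt /etc/docker/certs.d/mycluster.icp:8500/ca.crt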
  4. Update /etc/hosts with
    <cluster_access_ip> mycluster.icp
  5. Install kubectl on boot node:
    docker run -e LICENSE=accept --net=host -v /usr/local/bin:/data ibmcom/icp-inception:2.1.0.3-ee cp /usr/local/bin/kubectl /data
  6. In the IBM Cloud Private App Center (click on the icon on the top right hand corner), select the user and click Configure Client to configure kubectl.
  7. Run docker login with admin/admin

    docker login mycluster.icp:8500

  8. Log in to the cluster

    ibmcloud pr login -a https://<cluster_access_ip>:8443 --skip-ssl-validation

  9. Push the downloaded DSX archive to the ICP private registry:

    nohup ibmcloud pr load-ppa-archive --archive dsxlocal-icp-plinux-le.tar.gz &

    It may take about 2 to 4 hours to load all the images.

  10. The Helm chart is now created. Perform the following steps in the ICP administration console (https://<cluster_access_ip>:8443/console, default user ID: admin, default password: admin).
  11. Go to Manage > Helm Repositories and click Sync Repositories.
  12. Go to Catalog and verify that the ibm-dsx-prod chart now displays. Ensure that you have the latest version of the chart.
  13. To change the scope of the images from namespace to global, run the following command on the master node:
    for image in $(kubectl get images | tail -n +2 | awk '{ print $1; }'); do kubectl get image $image -o yaml | sed 's/scope: namespace/scope: global/' | kubectl apply -f -; done
    This step can also be done from the ICP UI.
  14. Go to Manage > Namespaces and create the dsx-prod namespace.
  15. If you use dynamic provisioning with GlusterFS, simply ensure that the appropriate storage class exists. This can be checked with:
    kubectl get storageclasses | grep glusterfs
    If nothing is displayed, then consult your cluster administrator about the availability of GlusterFS.
  16. For this example, we used NFS storage; as the NFS server, we chose the ICP master node. Install and start the NFS packages on all ICP nodes:
    For RHEL:
    yum install -y nfs-utils nfs-utils-lib
    systemctl start nfs
    For Ubuntu:
    apt-get update
    apt-get install nfs-kernel-server
  17. Create a shared NFS directory on the ICP master node:
    mkdir /nfsshare
  18. For NFS, we created four directories in the NFS mount path:
    cloudant
    redis
    spark-metrics
    user-home
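    For example, on the master node:
    mkdir -p /nfsshare/cloudant /nfsshare/redis /nfsshare/spark-metrics /nfsshare/user-home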
  19. Make an entry in /etc/exports on the master node and restart the services:

    /nfsshare *(rw,sync,no_subtree_check,no_root_squash)

    systemctl restart nfs

    Ensure that the mount path is correct (verify the output of 'showmount -e <master_ip_address>' against /etc/exports).
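    For example, you can also test-mount the export from a worker node to confirm it is reachable:
    mount -t nfs <master_ip_address>:/nfsshare /mnt
    ls /mnt
    umount /mnt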

  20. In this example, we used a dsx-volumes.yml file and updated the server and path values for each volume, for instance:
    nfs:
      server: <master_ip_address>
      path: /nfsshare/user-home
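    If you need to create dsx-volumes.yml yourself, a minimal sketch of one PersistentVolume entry might look like the following (the volume name, size, and access mode here are assumptions; adjust them to the requirements of the ibm-dsx-prod chart):
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: user-home-pv          # assumed name; the chart may expect a specific naming convention
    spec:
      capacity:
        storage: 100Gi            # assumed size
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      nfs:
        server: <master_ip_address>
        path: /nfsshare/user-home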
  21. Create all the volumes with this YAML file:
    kubectl --validate=false create -f dsx-volumes.yml
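    You can confirm that the volumes were created and are in the Available state with, for example:
    kubectl get pv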
  22. From the Catalog, select ibm-dsx-prod and click Configure. There are many parameters that can be changed for the installation; we used the default values.
  23. Change the namespace to the one you created in step 14, ensure that the path for all image repositories points to mycluster.icp:8500/default/, and click Configure to deploy DSXL.
  24. Check the status of the pods:
    kubectl get pods --namespace=dsx-prod

    When all the pods are running, DSX on ICP is up and running.

Start using DSX Local

In a web browser, go to https://<master_node_ip>:31843/ (use your own master node IP address) to access the DSX Local client and create some test notebooks. The default credentials are admin/password.

See IBM Data Science Experience Local for more documentation on how to use DSX Local.

Cluster cleanup

Steps to uninstall DSX Local

  1. From the ICP UI, go to Workloads > Helm Releases and delete the current release.
  2. From the ICP UI, go to Platform > Storage and delete the NFS volumes.
  3. From the ICP UI, go to Manage > Namespaces and remove the namespaces you created.
  4. On ICP master node, run rm -rf <mount dir>
  5. Delete containers
    docker rm <container>
  6. Delete images
    docker rmi <image>

Steps to uninstall ICP

Follow the ICP documentation if you need to uninstall your ICP cluster. The following steps summarize the uninstall process.

  1. From your boot node, change to the cluster directory within your IBM Cloud Private installation directory:
    cd /<installation_directory>/cluster
  2. Run the command:
    nohup docker run --net=host -t -e LICENSE=accept -v $(pwd):/installer/cluster ibmcom/icp-inception:2.1.0.3-ee uninstall &
  3. Reboot all nodes.

Acknowledgements

The author would like to thank Yulia Gaponenko and Indrajit Poddar for outlining the detailed steps in this blog.
