This blog provides the steps and instructions required to bring up Data Science Experience (DSX) Local on IBM Power Systems.

DSX (Data Science Experience) is an interactive environment in which data scientists can collaborate on machine learning projects and solve tough data challenges with tools such as RStudio, Jupyter, and Watson Machine Learning (WML) with Spark, all in one integrated environment. Like DSX on IBM Cloud, Data Science Experience is enabled on the IBM Power platform.

DSX provides an environment that works with PowerAI and exposes its open source software through interfaces such as Jupyter notebooks, giving data scientists an easier way to work with the technologies that the PowerAI tools provide. DSX on Power also benefits from Power-specific advantages such as advanced GPU support and NVLink, which speed up data analysis. Data Science Experience can be installed and run on a local setup of Power systems or on IBM Private Cloud, and it can use the tools available in the installed DSX cluster.

Installing DSX Local on virtual or physical machines

The installation of DSX Local is simple and straightforward: the complete product is set up through a single installer. The installation takes 3 to 3.5 hours. Some of the installation steps load docker images (some of them gigabytes in size) such as notebook, spark, cloudant database, and rstudio, and pull the various services from docker. The installation time mainly depends on the network speed and the available resources, so before installing DSX, ensure that the system requirements are met and that disk performance is good. Otherwise, even if the installation succeeds, the cluster will experience issues due to slow disks.

System requirements:

The link System requirements for Data Science describes the hardware and software requirements for the three-node and nine-node configurations in detail. The configuration details in that link are for a production environment. For testing purposes, you can use a lower configuration, which is also described in later sections of this blog.

The following configuration was tried out for a three-node cluster.

  • Number of servers used: Three virtual machines (Servers can be virtual or physical machines)
  • Installed Operating System: RHEL 7.2 ppc64le
Node      CPU (cores)   RAM (GB)   Network                             Additional disks
Master1   8             24         Public IP + private IP configured   Two disks: 500 GB and 400 GB
Master2   8             24         Private IP only                     Two disks: 400 GB each
Master3   8             24         Private IP only                     Two disks: 400 GB each

Note: If you install using a private IP, it can be changed to a public IP later for external connections.

Preconfiguring the system before installing DSX

Create partitions and volume groups of required size

  1. Create and format two disk partitions with XFS on all three nodes.
  2. If free partitions are not available, create new partitions using parted, as in the example below.

Example (assuming the disks are named vda and vdb):

# parted /dev/vda --script mklabel gpt
# parted /dev/vda --script mkpart primary '0%' '100%'
# mkfs.xfs -f -n ftype=1 /dev/vda1
# parted /dev/vdb --script mklabel gpt
# parted /dev/vdb --script mkpart primary '0%' '100%'
# mkfs.xfs -f -n ftype=1 /dev/vdb1
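
To confirm that the partitions were created and formatted as expected, a quick check (a sketch, assuming the same disk names as above) is:

# lsblk /dev/vda /dev/vdb            # vda1 and vdb1 should be listed
# parted /dev/vda --script print     # confirm the GPT label and the primary partition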

Create two directories and mount the partitions on all three nodes

# mkdir -p /ibm
# mkdir -p /data

where /ibm is the installer partition and /data is the data partition.

To ensure that the mounts persist after a reboot, add similar entries to /etc/fstab on all three nodes:

# echo "/dev/vda1    /ibm    xfs defaults,noatime 1 2" >> /etc/fstab
# echo "/dev/vdb1    /data   xfs defaults,noatime 1 2" >> /etc/fstab

Mount the partitions on all three nodes

# mount /ibm
# mount /data

Verify that the partitions are mounted properly on all three nodes:

# df -h /ibm
Filesystem   Size Used Avail Use% Mounted on
/dev/vda1    400G 33M 400G 1% /ibm

# df -h /data
Filesystem   Size Used Avail Use% Mounted on
/dev/vdb1    500G 33M 500G 1% /data

SELinux settings

SELinux must be in permissive mode. To achieve this, perform one of the following:

  • # setenforce 0

    and verify that # getenforce shows the desired result.

  • Otherwise, modify /etc/sysconfig/selinux and set “SELINUX=permissive”.
    This requires a system reboot. Reboot and ensure that the result shows permissive:
# getenforce
Permissive
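
A one-line way to make the change in the file (a sketch; --follow-symlinks is used because /etc/sysconfig/selinux is typically a symlink to /etc/selinux/config on RHEL):

# sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=permissive/' /etc/sysconfig/selinux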

Also, after the reboot, ensure that the partitions are mounted as described in the previous section.

Autologin (passwordless SSH) from master-1 to master-2 and master-3

This is required because the installer connects to the other two nodes to copy docker images, start various services, and so on.

  1. Create an ssh key in master-1 using:
    # ssh-keygen
  2. Add the rsa public key ( .ssh/id_rsa.pub ) to the .ssh/authorized_keys file in master-2 and master-3 (see the sketch after this list).
  3. Ensure that login from master-1 to the other two nodes happens without a password.
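
One way to do steps 2 and 3 (a sketch; master-2 and master-3 stand for the actual hostnames or IPs of those nodes):

# ssh-copy-id root@master-2
# ssh-copy-id root@master-3
# ssh root@master-2 hostname    # should print the hostname without prompting for a password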

Get Proxy IP

The installation requires a proxy IP (in this case, an unused private IP) to use as the HA proxy IP address. One way to reserve one is to install one more node with a private IP assigned and later shut down this proxy node.

# shutdown -h now    # run this in the proxy node to shut it down

Reuse the private IP of this proxy node for the DSX installation.

NOTES:

  • For testing purposes, the RHEL servers can use two separate partitions that are not used by the operating system installation, with minimum sizes of 150 GB and 350 GB.
  • All the IP addresses in the cluster must be in the same subnet.

The systems are now set up for the installation. Let us move on to the steps related to the DSX installer. This section describes installation using the command line and a configuration file; “wdp.conf” is the configuration file that holds all the required parameters.

Sample configuration file used:

# Warning: This file generated by a script, do NOT share
user=root
virtual_ip_address=
node_1=
node_data_1=/data
node_path_1=/ibm
node_2=
node_data_2=/data
node_path_2=/ibm
node_3=
node_data_3=/data
node_path_3=/ibm
ssh_port=22
overlay_network=<>

NOTE: The virtual_ip_address is either any unused IP address or the IP address obtained from the proxy node.
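
For illustration, the fields left blank above might be filled in like this (all addresses below are hypothetical placeholders; substitute your own values):

virtual_ip_address=192.0.2.100
node_1=192.0.2.11
node_2=192.0.2.12
node_3=192.0.2.13
overlay_network=172.30.0.0/16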

Command line installation of DSX using wdp.conf

In master-1 node:

  1. Create the configuration file ( wdp.conf ) under the /ibm folder.
  2. Download the DSX installer ( DSX-Local-Build-Config.ppc64le.* ) from the appropriate location and copy it to the same /ibm folder.
  3. Start the installation for the three-node cluster:
    # cd /ibm
    # chmod +x DSX-Local-Build-Config.ppc64le.117
    # ./DSX-Local-Build-Config.ppc64le.117 --three-nodes

While running, the installer detects wdp.conf in the same folder and prompts you to use this configuration file; press “y” here. Accept the terms and conditions and proceed. The installer might also prompt you to enter the root password for all the nodes.

The installer detected a configuration file. Do you want to use the parameters in this file for your installation?
[Y/N]: y
Validating the information in the file...SUCCESS
By typing (A), you agree to the terms and conditions: http://www14.software.ibm.com/cgi-bin/weblap/lap.pl?la_formnum=&li_formnum=L-KLSYAF9UXF&title=IBM+Data+Science+Experience+Local+Enterprise+Edition&l=en
Type (R) if you do not agree to the terms and conditions
Please type (A) for accept or (R) for reject: A
Thank you for using IBM Data Platform on Private Cloud
Installer is preparing files for the initial setup, this will take several minutes...
Initial setup starts, log file will be located at /ibm/InstallPackage/tmp/wdp.2017_11_09__04_08_16.log
Docker client is not found. Installer is installing and starting docker via yum
Checking if the docker daemon is running
Clean up the old images and containers if any exist
Load the wdp docker image (1/2)

If the time across the nodes is not synchronized, the following message is displayed. In such cases, manually sync the time on all three nodes (see the sketch after the message) and press Enter.

All the nodes are not synced to the same NTP server
Warning: Continuing without time synchronization among the nodes will cause unexpected issues
===== NTP configuration summary =====
x.x.x.x is synced to NTP server x.x.x.x,
x.x.x.x System clock is not synced
x.x.x.x System clock is not synced
Please configure NTP properly and press Enter to continue
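
One way to manually bring the clocks in sync on each node (a sketch; assumes chrony or ntpdate is available, and <ntp-server> is a placeholder for your NTP server):

# chronyc makestep            # if chrony is in use, step the clock immediately
# ntpdate -u <ntp-server>     # or do a one-shot sync with ntpdate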

When the installation completes successfully, the URL for the DSX Local client is displayed.

The installation completed successfully.
Please visit https://x.x.x.x/dsx-admin for DSX portal

Change the IP to public for external connections

Since private IPs were used for the cluster, change the IP to a public one so that external clients can connect.

# cd /wdp/k8s/dsx-local-proxy/k8s
# cp nginx-service.yaml nginx-service.yaml.orig

Then, edit nginx-service.yaml and change the IP in the file to the public IP of master-1. Run the following:

# kubectl delete -f nginx-service.yaml.orig --namespace=ibm-private-cloud
# kubectl create -f nginx-service.yaml --namespace=ibm-private-cloud

You should now be able to connect using the public IP of the first master node.
On ppc64le, also run this command to make it accessible:

# iptables -P FORWARD ACCEPT
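
As a quick sanity check (a sketch; the exact service name can differ in your cluster), confirm that the service now exposes the public IP:

# kubectl get svc --namespace=ibm-private-cloud -o wide | grep nginx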

Log in to the DSX portal

The installation and configuration of DSX are now complete. Sign in to the URL shown above ( https://x.x.x.x/dsx-admin , with your cluster IP) using the admin/password credentials.
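
If the portal page does not load, a quick reachability check from any machine that can reach the cluster (a sketch; x.x.x.x stands for the public IP of master-1) is:

# curl -k -I https://x.x.x.x/dsx-admin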

Troubleshooting guide for DSX install on Power

The following are some common errors during a DSX install on Power, with solutions listed below each of them:

  1. ERROR: Disk latency test failed. By copying 512 kB, the time must be shorter than 60s, recommended to be shorter than 10s, validation result is 95s

    Solution: Ensure that your servers meet the hardware and software requirements for DSX Local; refer to https://datascience.ibm.com/docs/content/local/requirements.html for the system requirements. This error means the node is not acceptable for the install. Bypassing the latency checks and forcing the installation will result in a cluster that has issues due to slow disks. (A rough way to check disk performance yourself is shown in the dd sketch after this list.)

  2. WARNING: Disk throughput test failed. By copying 1.1 GB, the time is recommended to be shorter than 5s, validation result is 26s
    WARNING: NTP/Chronyc is not setup
    WARNING: CPU cores are 4, while requirement are 8
    Solution: Add suppress_warning=true in wdp.conf to skip warnings that can safely be ignored.
  3. ERROR: Kubernetes is already installed with a different version or settings, please uninstall Kubernetes
    Solution: The installation script takes care of installing Kubernetes and the other required packages. Uninstall any previously installed Kubernetes version before running the installer.
  4. Pre-install script timeout, trying again
    Solution: After the installer is extracted, the scripts under “InstallPackage” perform pre-install checks to verify that the system requirements are met; “parse.sh” in particular verifies the connection to all nodes. The above message indicates either that the connection to the nodes is not proper or that a node is not acceptable because it did not meet the install requirements. Check the log file under the tmp folder to see what is happening.
  5. “error: error validating "/wdp/create_calico/calico.yaml": error validating data: Get
    http://localhost:8080/swaggerapi/api/v1: dial tcp 127.0.0.1:8080: getsockopt: connection refused”

    “The connection to the server localhost:8080 was refused – did you specify the right host or port?”

    Solution: This comes from Kubernetes because of a cgroup driver mismatch. With docker version 1.12.6, the kubelet service has issues because of the cgroup option difference.

    The system logs show a message such as: kubelet cgroup driver: “cgroupfs” is different from docker cgroup driver: “systemd”. Check the system logs/dmesg for similar errors from kubelet. In a later version of docker (17.03), this option is modified. The DSX installer takes care of installing the proper docker and Kubernetes versions, so uninstall any docker version that was installed beforehand; the installer will pick up the right version for the distro.

  6. Selinux should be in permissive
    Solution: The installer requires SELinux to be in permissive mode. This can be achieved by modifying /etc/sysconfig/selinux or by running:

    # setenforce 0
  7. NTP/Chronyc is not setup
    Solution: The date/time on all the nodes should be synchronized. Manually synchronize them and proceed with the installation.
  8. Retry or skip any installation step
    In total, there are around 63 steps, and if a step fails, the installer allows you to retry it after you correct the problem in the background while the installer is running. It also allows you to resume the installation from a specific step by adding a “jump_install=” entry in wdp.conf. This is helpful if you got disconnected at some step and want to continue from the same point, but make sure the environment has not been altered or cleaned up.
  9. Uninstall DSX
    From the install folder, run /wdp/utils/uninstall.sh
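
For items 1 and 2 above, you can get a rough idea of disk latency and throughput before installing by writing to the data partition with dd (a sketch; the installer's exact tests may differ, and the file names are placeholders):

# dd if=/dev/zero of=/data/latency.test bs=512k count=1 oflag=dsync          # ~512 kB synchronous write, approximates the latency test
# dd if=/dev/zero of=/data/throughput.test bs=1M count=1100 conv=fdatasync   # ~1.1 GB write, approximates the throughput test
# rm -f /data/latency.test /data/throughput.test                             # clean up the test files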

References

The following web references have information about the technologies referred to in this blog:

  1. https://datascience.ibm.com/docs/content/local/requirements.html
  2. https://datascience.ibm.com/docs/content/local/install.html?context=analytics
  3. https://datascience.ibm.com/docs/content/local/troubleshootinstall.html?linkInPage=true

Acknowledgments

I would like to thank Suchitra Venugopal, co-author of this blog, for her contributions. We would like to thank Kanda Zhang, Omer Kamal, and Manjunath Kumatagi for their guidance and help with the issues during the DSX installation.

We would like to thank Poornima Nayak, Pradipta Banerjee, Indrajit Poddar, and GopiKrishnan Gopi for encouraging us to work on this blog and providing review comments. And we would like to extend our thanks to the members who worked on building the product, mainly Igor Khapov, Yulia Gaponenko, Konstantin Maximov, Ilsiyar Gaynutdinov, Ekaterina Krivtsova, Alanny Lopez, Shilpa Kaul, Champakala Shankarappa, and Anita Nayak.
