IBM Spectrum LSF is a workload manager that supports traditional high-performance computing (HPC) and high-throughput computing (HTC) workloads, as well as big data, cognitive, GPU machine learning, and containerized workloads. LSF itself can run on bare metal, within a VM, or within a container. This blog discusses how to achieve the latter.

Many people view containerisation as a more lightweight form of virtualisation. When you create a VM, it needs configuration information for the corporate DNS, LDAP, the file systems to mount, and so on. If the VM contains only LSF (i.e. no applications, either on a local virtual disk or mounted), then there is nothing for LSF to schedule. Such a VM is good for examining LSF functionality, but not really useful for production workloads.

The same is true with containerisation: a container that contains just LSF can’t run any applications itself. IBM Cloud Private (ICP) provides a containerised version of LSF Community Edition, which is great for evaluating LSF capabilities, but the container lacks the integration with necessary external services (e.g. LDAP, NFS) and the applications to run. This blog discusses how to build a production-ready container for use with ICP that contains LSF and the applications you want to run.

In this blog we are going to look at how to run IBM Spectrum LSF Suite 10.2 Enterprise Edition on an IBM Cloud Private 2.1.0.2 cluster. IBM Cloud Private comes with a helm chart to install IBM Spectrum LSF Suite 10.1 Community Edition. The Community Edition has the Application Center, but is missing the Process Manager, Data Manager, License Scheduler, Resource Connectors, and Explorer, as well as the MPIs. It also restricts the maximum number of machines and the maximum number of jobs in the queues. The Spectrum LSF Suite 10.2 Enterprise Edition has no such limits.

The other benefit of this approach is that we get to modify the software stack in the image. This way we can pre-load the applications that we want to run into the containers. We can also set up access to other datacenter systems that we will need, such as LDAP.

Prerequisites

To start with you will need the following:

  • IBM Cloud Private 2.1.0.2. Newer versions should also work. We’ll start with the assumption that this is already installed
  • IBM Spectrum LSF Suite 10.2. Workgroup, HPC or Enterprise Editions are fine
  • The Dockerfile and associated scripts. They are available here.
  • A spare machine, or VM for a test installation of LSF Suite
  • Internet connectivity to pull an OS docker image to build on

Procedure

We need to get the Spectrum LSF Suite software into a container. Sounds simple enough, but is a little more complex given how containers work. We could try running a container and using the LSF Suite Installation to do the deployment into the container. To make that work we’d need to get passwordless ssh running in the container and hostname resolution working. The installation will also try to start various services through “systemctl”. Those will typically fail, unless we have managed to start systemd. Running systemd in a container seems to have its challenges, so we’ll opt for a different method to get the installed files.

Gathering the Spectrum LSF Suite files for the Docker Image

Log in to the spare machine from the prerequisites above and copy the Spectrum LSF Suite 10.2 *.bin file to it. Run the installer, e.g.

# ./lsfsent10.2.0.0-x86_64.bin

You can use the Workgroup or HPC packages as well.

You’ll need to accept the license agreement, but other than that it will not prompt for any other information. If you are doing this on a POWER machine, just use the POWER *.bin file. Once that is done we can install the Spectrum LSF Suite 10.2 on this machine using the following:

# cd /opt/ibm/lsf_installer/playbook

Edit the lsf-config.yml file and set “Enable_Monitoring: False”. IBM Cloud Private has a lot of monitoring already so we can rely on that instead. Also uncomment the line:

Secondary_LSF_ADMINS: {list users that can administer LSF here}

Save the file. Now we will test this machine to see if the installation prerequisites for deploying Spectrum LSF Suite are met. Run the following:

# ansible-playbook -i lsf-inventory lsf-predeploy-test.yml
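If the test complains that root cannot ssh to this machine without a password, adding root’s public key to its own authorized_keys file usually clears it. Here is a minimal sketch of that step. The function name is made up for illustration, and it assumes the key pair already exists under /root/.ssh:

```shell
# Append a public key to an authorized_keys file unless it is already there.
# Both paths are parameters so the function can be tried on scratch files first.
add_authorized_key() {
    pub_key_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -x: whole-line match, -F: fixed string, -f: take the pattern from the key file
    if ! grep -qxFf "$pub_key_file" "$auth_file"; then
        cat "$pub_key_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Example (as root on the spare machine):
#   add_authorized_key /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
#   ssh -o BatchMode=yes "$(hostname)" true && echo "passwordless ssh works"
```

Running it twice is harmless, since the key is only appended when it is missing.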

You can ignore errors about memory, but others need to be resolved. You may need to add the contents of the /root/.ssh/id_rsa.pub to the /root/.ssh/authorized_keys file. Next we will do the actual installation by running:

# ansible-playbook -i lsf-inventory lsf-deploy.yml

It will take a few minutes. Once it is done we can collect the files it installed by running:

# cd /opt
# tar zcvf /var/www/html/LSF-Suite-10.2-install-x86.tgz --exclude "ibm/lsf_installer" ibm

NOTE: The path prefix “/var/www/html” is intentional.

It will create a tar file which we will use later to create the docker image.
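
Before moving on, it is worth checking that the installer directory really stayed out of the archive, since it would only bloat the image. This is a small sketch of such a check. The function name is hypothetical, and it assumes the archive was created from /opt so that member names begin with “ibm/”:

```shell
# Succeed only if no member of the archive lives under the unwanted directory.
archive_excludes() {
    archive=$1
    unwanted=$2
    # tar tzf lists member names; grep looks for the directory itself or anything below it
    if tar tzf "$archive" | grep -Eq "^${unwanted}(/|\$)"; then
        return 1
    fi
    return 0
}

# Example:
#   archive_excludes /var/www/html/LSF-Suite-10.2-install-x86.tgz ibm/lsf_installer \
#       && echo "installer directory is not in the archive"
```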
Optionally, to clean out the LSF Suite installation on this machine run the following:

# cd /opt/ibm/lsf_installer/playbook
# ansible-playbook -i lsf-inventory lsf-uninstall.yml

This will uninstall the LSF Suite rpms. To remove the Spectrum LSF Suite rpm repository and Ansible playbooks run:

# rm -rf /opt/ibm/lsf_installer /var/www/html/lsf_suite_pkgs

Next we need to build the Docker image using the tar file we have generated. We wrote the tar file to /var/www/html because this machine is running a web server, and we are going to use it to download the tar file when creating the Docker image.

Building the Docker Image

Download the Dockerfiles and helm charts from here. Click on the “Clone or Download” button and select Download Zip. Extract the zip file. You will see the following in the resulting icp-lsfsent directory:

-rw-r--r--. 1 root root 3075 Apr 3 2018 Dockerfile.ppc64le
-rw-r--r--. 1 root root 3075 Apr 3 16:54 Dockerfile.x86_64
drwxr-xr-x. 3 root root 4096 Mar 2 16:09 helm
-rw-r--r--. 1 root root 1001 Apr 3 2018 Makefile
drwxr-xr-x. 2 root root 4096 Jan 30 11:54 scripts
-rwxr-xr-x. 1 root root 13350 Feb 23 12:41 start_lsfsent.sh

Step 1: Preparing the Dockerfile

We need to get the URL for the tar file that we created in the previous step. It will be something like:

http://{IP address of machine above}/LSF-Suite-10.2-install-x86.tgz

Edit the Dockerfile.{Architecture} file and change the following entries:

ENV HTTP_SRV=http://10.10.10.1 (Change this to the IP of your web server host from above)
ENV TARFILE=LSF-Suite-10.2-install-on-host87f2.tgz (Change this to the name of your tar file)
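
If you rebuild often, the two entries can also be changed from the command line instead of in an editor. The following is only a sketch: the helper name is made up, 192.0.2.10 is a placeholder address, and it assumes each entry sits on its own line in the form “ENV NAME=value” with values that contain no “|” or “&” characters.

```shell
# Rewrite "ENV NAME=..." in a Dockerfile, leaving every other line alone.
set_dockerfile_env() {
    dockerfile=$1
    name=$2
    value=$3
    tmp=$(mktemp)
    # '|' is the sed delimiter so that URLs containing '/' need no escaping
    sed "s|^ENV ${name}=.*|ENV ${name}=${value}|" "$dockerfile" > "$tmp" &&
        cat "$tmp" > "$dockerfile"
    rm -f "$tmp"
}

# Example:
#   set_dockerfile_env Dockerfile.x86_64 HTTP_SRV http://192.0.2.10
#   set_dockerfile_env Dockerfile.x86_64 TARFILE LSF-Suite-10.2-install-x86.tgz
#   grep '^ENV' Dockerfile.x86_64
```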

Step 2: Building the Image

We start by getting a base OS image to run from. I’ll use CentOS as it’s free and supported by Spectrum LSF Suite 10.2.

# docker pull centos:latest

Let’s start the image to make sure it works:

# docker run --entrypoint "" -ti centos:latest /bin/bash
[root@f1f2a48e6a63 /]# df -h
:

We should see the container’s filesystem. While we are here, let’s look at /etc/passwd and /etc/nsswitch.conf. They are not like the host OS’s files, and this is something we have to deal with later.
To build the image just run:

# make image

This will build an image with the name lsfsent and a tag of 10.2-{Machine Architecture}.
If you want to change this, edit the Makefile and change the “VER” and “IMAGENAME” variables, then rebuild the image, or just tag it.
If you have built this image on a machine outside IBM Cloud Private, you will need to copy it to the IBM Cloud Private image repository, or copy it to the worker nodes’ local cache.
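
For the local-cache route, one option is to stream the image from the build machine to each worker over ssh. This is a sketch under a few assumptions: the helper names are made up, the workers are reachable by ssh and run Docker, and the image name follows the lsfsent:10.2-{Machine Architecture} pattern described above.

```shell
# Image reference in the form described above: lsfsent:10.2-<arch>.
image_ref() {
    arch=${1:-$(uname -m)}
    echo "lsfsent:10.2-${arch}"
}

# Stream the image to another Docker host. "docker save" writes the image to
# stdout and "docker load" reads it from stdin. DOCKER and SSH can be set to
# "echo" to preview what would run.
copy_image_to() {
    host=$1
    ${DOCKER:-docker} save "$(image_ref)" | ${SSH:-ssh} "$host" docker load
}

# Example (the worker address is a placeholder):
#   copy_image_to 10.10.10.2
```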

Step 3: Build the Helm repository

The helm charts to deploy the LSF Suite cluster with ICP are located in the helm directory. You are free to customize these any way you like. If you changed the image name or tag above, remember to edit the helm/ibm-lsfsent-dev/values.yaml file and update the image variables accordingly.
Edit the Makefile and change the IP address of the web server, as seen in this line:

REPO=http://10.10.10.1/helm-repo

We’ll use this URL later when we add this repository to IBM Cloud Private. You will need the “helm” command for this. The IBM Cloud Private documentation explains how to get the kubectl, and helm CLIs.
Now build the repository files by running:

# make repofiles

It will generate two files:

  • ibm-lsfsent-dev-1.0.1.tgz
  • index.yaml

You will need to re-do this every time you modify the contents of the chart.
Copy these two files to the web server above and put them into the /var/www/html/helm-repo directory. Check that you can access the files from your Docker host by running the following in a different directory:

# cd /tmp
# wget http://10.10.10.1/helm-repo/index.yaml

It should be able to download the file.
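
A stale index is an easy mistake to make when the chart is rebuilt, so it can help to confirm that the downloaded index actually points at the chart archive. A minimal sketch follows, with a made-up function name; it relies only on the archive name appearing somewhere in index.yaml.

```shell
# Succeed if the repository index mentions the packaged chart archive.
index_lists_chart() {
    index_file=$1
    chart_archive=$2
    grep -qF "$chart_archive" "$index_file"
}

# Example, run in the directory where index.yaml was downloaded:
#   index_lists_chart index.yaml ibm-lsfsent-dev-1.0.1.tgz && echo "chart is listed"
```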

Adding the Repository to IBM Cloud Private

Next we will add the repository we just created to IBM Cloud Private. Log in to IBM Cloud Private and locate the repositories. Add a new one, using the URL for the REPO above. Remember to synchronize the repositories.
Next go to the Charts and type “LSF” in the search bar. You should see the LSF chart in the repository you just added. Follow the instructions to deploy the chart. Remember to label the worker machines with “deploy_lsf=true”, e.g.

# kubectl label nodes {IP address of worker node} deploy_lsf=true

Repeat this for all the machines you want to run the Spectrum LSF Suite Cluster on.
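
With more than a couple of workers, a small loop saves some typing. This is a sketch with a made-up function name; the node names in the example are placeholders for whatever “kubectl get nodes” reports in your cluster.

```shell
# Label every node passed as an argument. KUBECTL can be set to "echo" to
# preview the commands without touching the cluster.
label_lsf_nodes() {
    for node in "$@"; do
        ${KUBECTL:-kubectl} label nodes "$node" deploy_lsf=true || return 1
    done
}

# Example:
#   label_lsf_nodes 10.10.10.2 10.10.10.3
#   kubectl get nodes -l deploy_lsf=true    # list the nodes that carry the label
```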
Once the chart is deployed you can log in to the LSF Master node and look around. At this stage you should have a working LSF Suite cluster, but you’ll quickly notice that it is not integrated with the rest of the datacenter systems. This is where it gets a bit harder.

Integrating with Datacenter Systems

To make Spectrum LSF Suite deployed by IBM Cloud Private more useful, it’s necessary to customize both the Docker image and the helm charts. We need to add any packages that the workloads will need. We will also need to add any packages needed to enable user authentication and perhaps host resolution. We may also need to start some daemons for user authentication, like sssd. Let’s start by adding some additional packages to the image. To do that we need to modify the Dockerfile. Edit the file and look for:

RUN yum -y install openssh-server wget gettext net-tools which sssd sysstat mysql-connector-java (add additional packages here) \

Add the additional packages to the list, then rebuild the image with the new list of packages:

# make image

That gets the binaries into the image, but what if some of those services require daemons to be running in order to work? In that case we have to handle starting them. We have a single entrypoint called “start_lsfsent.sh”. This is invoked when the container starts and handles starting the various LSF daemons, as well as preparing the cluster configuration. Since there is only one entrypoint, we need to extend that file to start any daemons that you need. You will need to edit the script to inject your code. I’d recommend letting LSF start first, then starting any daemons you need. After editing the file you will need to rebuild the image again.
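
As an illustration of the “LSF first, then your daemons” ordering, here is a sketch of a helper you could add to the entrypoint. Everything in it is an assumption rather than part of the shipped script: the function name is made up, “lsid” is used as the readiness check because it only succeeds once the LSF daemons answer, and sssd stands in for whichever daemon you need.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for <attempts> <delay seconds> <command...>
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    while [ "$attempts" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical use in the entrypoint, after the LSF daemons have been started:
#   if wait_for 30 10 lsid; then
#       /usr/sbin/sssd -D    # assumption: sssd is the daemon you need; -D runs it as a daemon
#   fi
```

Where exactly this belongs depends on how the script ends. If its last step blocks in the foreground, the extra code has to run before that point, in the background.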

Testing the changes can be a little laborious. The easiest way to handle it is to start a container outside of IBM Cloud Private and debug the container from there. To do that run the following:

# docker run -ti --net default lsfsent:10.2-x86_64 master password

If the container starts then exits, there is an error in the entrypoint script that is causing the script to exit. If it keeps running you can connect to it and debug it with:

# docker exec -ti {Container ID} /bin/bash
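
When the container exits straight away there is nothing for “docker exec” to attach to, but the entrypoint’s output is still kept by Docker. This sketch, with a made-up function name, finds the container and leaves the log lookup to you; the image tag is the one used in the run command above.

```shell
# List containers, running or exited, that were created from the given image.
list_lsf_containers() {
    image=${1:-lsfsent:10.2-x86_64}
    ${DOCKER:-docker} ps -a --filter "ancestor=$image" --format '{{.ID}} {{.Status}}'
}

# Example:
#   list_lsf_containers
#   docker logs {Container ID}    # output of the entrypoint script, even after exit
```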

So that’s how we prepare the docker image, but if we need to modify the deployment we will need to look at how to change the helm charts.

Modifying the Deployment

Here we will look at how we change the deployment of the Spectrum LSF Suite cluster. The picture below shows the structure of the deployment.

There are three containers:

  • LSF Master – This runs the LSF Master Daemons, as well as the GUI processes
  • MariaDB – This runs the database for the GUI
  • LSF Compute – These are the worker nodes for running jobs. By default the LSF Master can also run jobs.

The containers have access to different storage volumes. The configuration can be found in the helm chart. Look at: icp-lsfsent/helm/ibm-lsfsent-dev/values.yaml. They are:

  • “pvc”

    This is a dynamic Persistent Volume Claim (PVC) that is created when the cluster is deployed. It is used to hold the LSF Suite configuration files. A Persistent Volume (PV) is required to generate the PVC in. This is mounted as “/home” in the container. It inherits this configuration from the LSF Suite Community Edition for IBM Cloud Private, and is something that would need to be changed to allow real users to access their home directories.

  • “DataPVC”

    This is an optional PVC that is intended to provide a means of accessing data. To use it set connectDataPVC to true, and set dataMntPnt to where you want to mount it.

  • “AppPVC”

    Is similar to the DataPVC and is intended to provide the means to access applications in the container.

  • “sssdPVC”

    Is optionally used to provide the contents of the /etc/sssd directory in the container, and may also be used to provide the certificates for other services.
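
Before changing any of these, it is handy to see how the chart currently sets them. This sketch simply searches values.yaml for the two DataPVC settings named above; the function name is made up, and the default path assumes you run it from the directory where the zip file was extracted.

```shell
# Print the lines of values.yaml (with line numbers) that control the DataPVC.
show_data_pvc_settings() {
    values_file=${1:-icp-lsfsent/helm/ibm-lsfsent-dev/values.yaml}
    grep -n -E 'connectDataPVC|dataMntPnt' "$values_file"
}

# Example:
#   show_data_pvc_settings
```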

For a production configuration of Spectrum LSF Suite, it is probably desirable to modify all of these. The “sssdPVC” could be replaced by building all of the parts needed for user authentication into the container. For production, the “pvc”, “DataPVC” and “AppPVC” are perhaps best handled using hostpaths. Sample configuration is here.
This configuration would go in the: icp-lsfsent/helm/ibm-lsfsent-dev/templates/deployment.yaml file.

Testing changes to the Chart

After the chart has been modified we will need to test it. Before deploying, it’s a good idea to lint the chart. This will catch some of the obvious errors that can happen. Run the following:

# cd icp-lsfsent/helm

# helm lint --strict ./ibm-lsfsent-dev

If that is clear you can deploy with:

# helm install --debug ./ibm-lsfsent-dev
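
If you would rather see the rendered manifests before anything is created, the lint and a dry run can be chained. This is a sketch that assumes the Helm 2 CLI used with this version of IBM Cloud Private (newer Helm releases also want a release name on install); the function name is made up.

```shell
# Lint the chart, then render it without creating any resources.
# HELM can be set to "echo" to preview the commands.
check_chart() {
    chart_dir=${1:-./ibm-lsfsent-dev}
    ${HELM:-helm} lint --strict "$chart_dir" &&
        ${HELM:-helm} install --dry-run --debug "$chart_dir"
}

# Example, run from icp-lsfsent/helm:
#   check_chart
```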

You can also deploy it from within IBM Cloud Private. Use the procedure above to rebuild the repository files, then within IBM Cloud Private delete the repository and re-add it.

Conclusion

While it is possible to run LSF Suite inside of IBM Cloud Private, the process to make that a useful configuration takes an investment of time and effort. Spectrum LSF Suite provides the middleware for running the applications, but it is up to the IBM Cloud Private administrator to make sure the environment created inside the containers has access to all of the users’ data and applications.

The layering of the files/directories may not provide the best performance, especially if the data or applications are part of the image. It may be better to flatten the file structure of the image. If space is a concern, the management machines and the workers could have different images.
