
Optimize performance of H2O Driverless AI inferencing on IBM Power Systems

Introduction

On IBM® Power Systems™ with H2O Driverless AI, after you complete machine learning training of a model, H2O Driverless AI offers the capability to apply that model to new data to obtain insights. This process is called inferencing or scoring.

With H2O Driverless AI, you can perform inferencing by downloading the MOJO (Model Object, Optimized), which is a Java™-based scoring pipeline. In lab experiments on an IBM POWER9™ processor-based system, inferencing performance improved by more than 25 times by running multiple MOJO scoring pipelines in parallel. This tutorial describes how to improve the performance of the inferencing phase using a MOJO scoring pipeline on Power Systems.

Prerequisites

The following list contains the hardware and software used to perform the experiments:

  • IBM Power® System S922 server with 256 GB of memory
  • Red Hat® Enterprise Linux 7.6
  • H2O Driverless AI (only for training the model)

We recommend that you be familiar with H2O Driverless AI and Java. You should also know how to train a model, because a trained model is required before inferencing can be done.

You can find prerequisites for installing H2O Driverless AI at: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/install/ibm-power.html#ibm-power-installs

The data set used in the example below is a Walmart data set with about 43 million rows for training and 11 million rows for testing. You can download it from https://s3.amazonaws.com/h2oai-power-benchmarks/timeseriesdata.tar.gz

The disk space requirement depends on factors such as the size of the input data, the size of the output data from inferencing, and the number of experiments that need to be saved for later review. For general sizing information for H2O Driverless AI experiments, refer to https://docs.h2o.ai/driverless-ai/1-8-lts/docs/userguide/install/ibm-power.html

An H2O Driverless AI license is required to perform inferencing.

Estimated time

Optimizing the performance of H2O Driverless AI inferencing on IBM Power Systems (including both training and inferencing) takes about one to two hours. The time might vary depending on the size of the data.

Steps

1. Download the MOJO and prepare the scripts for inferencing.

To download the MOJO, first log in to the H2O Driverless AI web interface. Click EXPERIMENTS. On the EXPERIMENTS page, select the experiment you would like to run inferencing on and click DOWNLOAD MOJO SCORING PIPELINE as shown in the following figure.

Figure 1. The EXPERIMENTS page with the DOWNLOAD MOJO SCORING PIPELINE option

Notice that the MOJO Scoring Pipeline instructions are displayed.

Figure 2. The MOJO Scoring Pipeline instructions

2. Run and tune the MOJO.

To become familiar with how to run inferencing, follow the instructions outlined by H2O Driverless AI the first time you run the MOJO:

  1. Unzip the downloaded mojo.zip file.
  2. cd mojo-pipeline
  3. bash ./run-example.sh
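
For reference, the run-example.sh script essentially launches a single JVM that runs the ai.h2o.mojos.ExecuteMojo class against the example file bundled with the pipeline. A minimal equivalent invocation looks roughly like the following sketch (the license path and the example.csv file name are assumptions; adjust them to match the contents of your mojo-pipeline directory):

    # Score example.csv with pipeline.mojo in a single JVM (sketch).
    # Run from inside the mojo-pipeline directory; the license path is an assumption.
    java -Dai.h2o.mojos.runtime.license.file=/root/.driverlessai/license.sig \
         -cp mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo \
         pipeline.mojo example.csv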

The example provided by H2O Driverless AI uses a single Java Virtual Machine (JVM). On an IBM POWER® processor-based system with many CPU cores, a single JVM leaves most of the system's resources idle: the JVM works on a single line of the input file at a time, so the time taken to complete inferencing grows with the size of the input file. To take advantage of the idle cores in the POWER processor-based system and reduce the time taken for inferencing, we used multiple JVMs. We have provided a script (listed later in this tutorial) that runs multiple MOJOs, each using a single JVM, as follows:

  1. Split the input file into multiple files. This allows the work to be parallelized.
  2. Run multiple MOJOs, each using a single JVM, with each JVM working on one of the split files.
  3. Combine the split output files into a single output file.

Depending on the number of cores and the amount of memory available for inferencing, you can choose how many files to split the input into. In lab experiments, a POWER9 processor-based system with 40 cores and 256 GB of memory was used. Before splitting the input file, a baseline measurement was taken using one JVM and the entire input file. After that, we split the input file into 8, 16, 32, and 40 files. Performance improved as more input files were used to run the MOJO in parallel; in our case, we achieved linear scaling in performance up to 32 input files and 32 JVMs. We set the initial and maximum Java heap size (the maximum memory allocation pool) of each JVM to 6.25 GB, so that even 40 JVMs (40 × 6.25 GB = 250 GB) fit within the 256 GB of system memory:

CMD_LINE="java -Xms6250M -Xmx6250M  -Dai.h2o.mojos.runtime.license.file=${LICENSE_FILE} -cp java_inference/mojo-pipeline/mojo-runtime.jar ai.h2o.mojos.ExecuteMojo"

The following graph shows the relative performance as the number of JVMs is increased:

Figure 3. Relative inferencing performance as the number of JVMs increases

To run the MOJO, use: ./run_mojo.sh <input file> <# of JVMs> <mojo file>
For example: ./run_mojo.sh test_file.csv 32 pipeline.mojo. The combined predictions are written to score_all.csv.
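
To reproduce the scaling comparison on your own system, you can time end-to-end runs at several JVM counts. The following sketch uses the run_mojo.sh script listed below and assumes the GNU time binary is installed at /usr/bin/time:

    # Time a full split/score/join cycle at increasing degrees of parallelism.
    for n in 1 8 16 32 40; do
        /usr/bin/time -f "$n JVMs: %e seconds" ./run_mojo.sh test_file.csv $n pipeline.mojo
    done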

run_mojo.sh


    #!/usr/bin/env bash
    #
    # run_mojo.sh: split the input file, score each piece in its own JVM,
    # and join the per-piece scores into a single file (score_all.csv).

    # Split the input file into NUMBEROFFILES pieces, copying the CSV header
    # line to the top of each piece.
    function split_file () {
        FILENAME=$1
        NUMBEROFFILES=$2

        # Save the header line, then write the data rows to new.csv.
        head -n 1 "$FILENAME" > heading
        tail -n +2 "$FILENAME" > new.csv

        # Rows per piece, plus the remainder left over by integer division.
        LINES=$(wc -l < new.csv)
        LINES2=$((LINES / NUMBEROFFILES))
        REMAINDER=$((LINES % NUMBEROFFILES))

        # split names the pieces x00, x01, ... (two-digit numeric suffixes).
        split -d -l "$LINES2" new.csv

        # Prepend the header to each piece.
        for i in $(seq 0 $((NUMBEROFFILES - 1))); do
            DG=$(printf "%02d" "$i")
            cat heading "x$DG" > "prod_split$i.csv"
        done

        # If the rows did not divide evenly, split created one extra piece;
        # append it to the last output file.
        if [ "$REMAINDER" -gt 0 ]; then
            DX=$(printf "%02d" "$NUMBEROFFILES")
            cat "x$DX" >> "prod_split$((NUMBEROFFILES - 1)).csv"
        fi
    }

    #-------------------------------------

    # Launch one JVM per input piece, all in parallel, and wait for all of
    # them to finish. Each JVM scores one piece with the MOJO runtime.
    function prod_all () {
        MOJO_FILE=$1
        NUMBEROFFILES=$2
        LICENSE_FILE="${3:-/root/.driverlessai/license.sig}"

        for i in $(seq 0 $((NUMBEROFFILES - 1))); do
            INPUT_FILE="prod_split$i.csv"
            OUTPUT_FILE="sc_$i.csv"
            # Fixed initial and maximum heap of 6.25 GB per JVM.
            CMD_LINE="java -Xms6250M -Xmx6250M -Dai.h2o.mojos.runtime.license.file=${LICENSE_FILE} -cp mojo-pipeline/mojo2-runtime.jar ai.h2o.mojos.ExecuteMojo"
            ${CMD_LINE} "${MOJO_FILE}" "${INPUT_FILE}" > "${OUTPUT_FILE}" &
        done
        wait    # block until every background JVM has completed
    }

    #-------------------------------------

    # Concatenate the per-piece score files, keeping a single header line,
    # and remove all intermediate files.
    function join_files () {
        NUMBEROFFILES=$1

        rm -f heading score_new.csv score_noheading.csv
        head -n 1 sc_0.csv > heading

        for i in $(seq 0 $((NUMBEROFFILES - 1))); do
            cat "sc_$i.csv" >> score_new.csv
        done

        # Drop the repeated header lines (-x matches whole lines only, -F
        # treats the header as a fixed string, not a regular expression),
        # then put a single header back on top.
        grep -vxF "$(cat heading)" score_new.csv > score_noheading.csv
        cat heading score_noheading.csv > score_all.csv

        rm -f x* prod_split*.csv sc_*.csv heading new.csv score_new.csv score_noheading.csv
    }

    #------------------------------------- Main
    #   Usage: run_mojo.sh <input file> <# of JVMs> <mojo file>
    #   Description: runs inferencing in parallel across multiple JVMs.
    #
    [ $# -ne 3 ] && { echo "Usage: run_mojo.sh <input file> <# of JVMs> <mojo file>"; exit 1; }

    split_file "$1" "$2"
    prod_all "$3" "$2"
    join_files "$2"
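
After a run completes, a quick sanity check (our suggestion, not part of the script above) is to confirm that score_all.csv contains one prediction row per input data row:

    # Compare input data rows to output prediction rows (both counts exclude the header).
    echo "input rows:  $(($(wc -l < test_file.csv) - 1))"
    echo "output rows: $(($(wc -l < score_all.csv) - 1))"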

Summary

This tutorial describes a procedure to improve the performance of running inferencing jobs on IBM Power Systems. This is accomplished by splitting the input data set and running a separate JVM to process each of the partitioned data sets. In lab experiments, performance improved by 8 to 28 times, depending on the number of JVMs used, compared to running a single JVM.