H2O inferencing with C++ MOJO on IBM Power Systems

Introduction

In a previous tutorial, Optimize performance of H2O Driverless AI inferencing on IBM Power Systems, we discussed how to improve the performance of H2O Driverless AI using a Java MOJO (Model Object, Optimized) scoring pipeline. In continuation, this tutorial describes how a user who performs inferencing with C++ MOJO on Power Systems can obtain results quickly by parallelizing the work using multiple Python processes.

Prerequisites

Familiarity with training a model with H2O Driverless AI is helpful, because a trained model is required to perform inferencing using a C++ MOJO.

Hardware/Software

  • An IBM Power System S922 server with 256 GB memory
  • RHEL 7.6 and H2O Driverless AI (only for training the model)
  • An H2O Driverless AI license to perform inferencing with C++ MOJO

You can find prerequisites for installing H2O Driverless AI at: http://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/install/ibm-power.html#ibm-power-installs

The example data set used in this tutorial is a Walmart data set with about 43 million rows for training and 11 million rows for testing. It can be found at: https://s3.amazonaws.com/h2oai-power-benchmarks/timeseriesdata.tar.gz
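
For convenience, a quick, illustrative way to fetch and extract the data set in Python might look like the following (the local file and directory names are arbitrary):

# Illustrative download and extraction of the sample data set.
import tarfile
import urllib.request

url = "https://s3.amazonaws.com/h2oai-power-benchmarks/timeseriesdata.tar.gz"
urllib.request.urlretrieve(url, "timeseriesdata.tar.gz")
with tarfile.open("timeseriesdata.tar.gz") as tar:
    tar.extractall("timeseriesdata")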

The disk space requirement depends on factors such as the size of the input and output data from inferencing and the number of experiments that need to be saved for later review.

Estimated time

Improving the performance of running pipeline scoring jobs using a C++ MOJO on IBM Power Systems takes around one to two hours. The time might vary depending on the size of the data.

Steps

To perform inferencing with C++ MOJO on Power Systems and obtain results quickly by parallelizing the work across multiple Python processes, complete the following tasks.

Step 1. Download C++ MOJO and prepare the scripts for inferencing

To download C++ MOJO, first log in to the H2O Driverless AI web interface. Click EXPERIMENTS. On the EXPERIMENTS page, select the experiment for which you need to run inferencing and click DOWNLOAD MOJO SCORING PIPELINE as shown in the following figure.

Figure 1

Notice that the MOJO Scoring Pipeline instructions are displayed.

Figure 2

Step 2. Run and tune C++ MOJO

For C++ MOJO, you need to download both the Java MOJO and the Python wrapper to C++. To download the Java MOJO, follow the instructions outlined by H2O Driverless AI in Step 1 and unzip the downloaded mojo.zip file; this creates the mojo-pipeline folder containing the pipeline.mojo file that is used by the Python wrapper script.

To download the Python wrapper to C++, click DOWNLOAD PYTHON SCORING PIPELINE and perform the tasks as shown in the following figure.

Figure 3
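
The sample script shown later in this tutorial invokes a Python wrapper script, run.py, once per split file. The Python scoring pipeline download ships with its own example; the following is only a minimal sketch of such a wrapper, assuming the daimojo (C++ MOJO runtime) and datatable packages from the download above and a MOJO extracted to ./mojo-pipeline/pipeline.mojo:

#!/usr/bin/env python
# Minimal sketch of the run.py wrapper invoked by run-python.sh.
# Assumes the daimojo and datatable packages and a MOJO extracted to
# ./mojo-pipeline/pipeline.mojo (paths are assumptions; adapt as needed).
import sys

import datatable as dt
import daimojo.model

input_file = sys.argv[1]
output_file = sys.argv[2]

# Load the MOJO generated by H2O Driverless AI.
m = daimojo.model("./mojo-pipeline/pipeline.mojo")

# Read one split file, treating the model's missing-value tokens as NA.
frame = dt.fread(input_file, na_strings=m.missing_values, header=True)

# Score the rows and write the predictions for this split.
res = m.predict(frame)
res.to_csv(output_file)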

The example provided with the H2O Driverless AI MOJO scoring pipeline uses a single Python process. Using a single Python process on an IBM POWER processor-based system with several CPU cores leaves most of the system resources idle, and the scoring pipeline takes a considerable amount of time to complete. To take advantage of the idle cores and reduce the time taken for inferencing, you can run multiple Python processes. This tutorial provides a sample script, run-python.sh, that runs multiple Python processes. The script performs the following tasks (a pure-Python sketch of the same flow follows the list):

  1. Split the input file into multiple files, which allows the work to be parallelized.
  2. Run multiple Python processes, each working on one of the split files.
  3. Combine the split output files into a single file.
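
As an illustration of the same split, score, and join flow, here is a minimal pure-Python sketch using multiprocessing. The chunk file names mirror those used by the shell script; the daimojo package and the pipeline location are assumptions:

#!/usr/bin/env python
# parallel_score.py - illustrative pure-Python version of the split/score
# flow in run-python.sh. Assumes the daimojo and datatable packages and a
# MOJO at ./mojo-pipeline/pipeline.mojo.
import sys
from multiprocessing import Pool

import datatable as dt

MOJO_PATH = "./mojo-pipeline/pipeline.mojo"

def score_chunk(paths):
    input_file, output_file = paths
    # Each worker process loads its own copy of the MOJO.
    import daimojo.model
    m = daimojo.model(MOJO_PATH)
    frame = dt.fread(input_file, na_strings=m.missing_values, header=True)
    m.predict(frame).to_csv(output_file)

def main():
    csv_file, nprocs = sys.argv[1], int(sys.argv[2])
    data = dt.fread(csv_file, header=True)
    # Split the rows into nprocs roughly equal chunks, each with the header.
    step = (data.nrows + nprocs - 1) // nprocs
    jobs = []
    for i in range(nprocs):
        start = i * step
        if start >= data.nrows:
            break
        chunk = data[start:min(start + step, data.nrows), :]
        in_name, out_name = "prod_split%d.csv" % i, "sc_%d.csv" % i
        chunk.to_csv(in_name)
        jobs.append((in_name, out_name))
    # Score all chunks in parallel, one Python process per chunk.
    with Pool(len(jobs)) as pool:
        pool.map(score_chunk, jobs)

if __name__ == "__main__":
    main()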

Depending on the number of cores and the memory available for inferencing, you can split the input file into as many files as needed. In lab experiments, a POWER9 processor-based system with 40 cores and 256 GB of memory was used. Before splitting the input file, a baseline measurement was taken using one Python process and the entire input file. The input file was then split into 8, 16, 32, and 40 files. The performance improved as more input files were used to run the C++ MOJO in parallel; in our case, we achieved near-linear scaling with up to 32 input files and 32 Python processes.

The following graph shows the relative performance as the number of Python processes is increased.

Figure 4

To run the C++ MOJO, use the following command:
./run-python.sh <csv-input-file>.csv <number of PYTHON processes> <mojo file>

Here, csv-input-file is the name of the input file, number of PYTHON processes is the number of Python processes to be used, and mojo file is the name of the MOJO generated by H2O Driverless AI.

For example:
./run-python.sh test_data.csv 8 pipeline.mojo

Refer to the following sample run-python.sh script.

run-python.sh


#!/usr/bin/env bash

# Split the input CSV file into multiple files, each with the CSV header,
# so that the scoring work can be parallelized.
function split_file () {
    FILENAME=$1
    NUMBEROFILES=$2
    # Save the header line, then write the data rows to new.csv.
    head -1 "$FILENAME" > heading
    tail -n +2 "$FILENAME" > new.csv
    LINES=$(wc -l < new.csv)
    LINES2=$((LINES / NUMBEROFILES))
    REMIND=$((LINES % NUMBEROFILES))
    # split -d creates numerically suffixed chunks: x00, x01, x02, ...
    split -d -l "$LINES2" new.csv
    for i in $(seq 0 $((NUMBEROFILES - 1))); do
        DG=$(printf "%02d" "$i")
        cat heading "x$DG" > "prod_split$i.csv"
    done
    # If the row count is not evenly divisible, split writes the leftover
    # rows to one extra chunk; append it to the last split file.
    if [[ $REMIND -gt 0 ]]; then
        DX=$(printf "%02d" "$NUMBEROFILES")
        cat "x$DX" >> "prod_split$((NUMBEROFILES - 1)).csv"
    fi
}

#-------------------------------------

# Run one Python scoring process per split file in the background and
# wait for all of them to complete. run.py is the Python wrapper script
# that scores one split file.
function prod_all () {
    MOJO_FILE=$1
    NUMBEROFILES=$2
    for i in $(seq 0 $((NUMBEROFILES - 1))); do
        INPUT_FILE="${3:-prod_split$i.csv}"
        OUTPUT_FILE="${4:-sc_$i.csv}"
        LICENSE_FILE="${5:-/root/.driverlessai/license.sig}"
        # Adjust the dai-env.sh path to match your installation.
        /h2o/dai-1.8.8-linux-ppc64le/dai-env.sh python run.py "${INPUT_FILE}" "${OUTPUT_FILE}" &
    done
    wait
}

#-------------------------------------

# Combine the per-process output files into a single score_all.csv and
# remove the intermediate files.
function join_files () {
    rm -f heading score_new.csv score_noheading.csv
    FILENAME=sc_0.csv
    NUMBEROFILES=$1
    # Take the header from the first output file.
    head -1 "$FILENAME" > heading
    for i in $(seq 0 $((NUMBEROFILES - 1))); do
        cat "sc_$i.csv" >> score_new.csv
    done
    # Drop the repeated header lines, then prepend a single header.
    grep -Fxv "$(cat heading)" score_new.csv > score_noheading.csv
    cat heading score_noheading.csv > score_all.csv
    rm -f x* prod_split*csv sc_*.csv heading new.csv score_new.csv score_noheading.csv
}

#------------------------------------- Main
#   Usage: run-python.sh <input file> <# of python processes> <mojo file>
#   Description: This script splits the input file, runs the scoring
#   processes in parallel, and joins the results.
[ $# -ne 3 ] && { echo "Usage: run-python.sh <input file> <# of python processes> <mojo file>"; exit 1; }

split_file "$1" "$2"
prod_all "$3" "$2"
join_files "$2"

Summary

This tutorial describes how to improve the performance of running pipeline scoring jobs using a C++ MOJO on IBM Power Systems. This is accomplished by splitting the input data set and running a separate Python process for each of the partitioned data sets. In lab experiments, it was observed that the performance improved by over 10 times as the number of Python processes was increased.