Rebuilding Python for improved H2O Driverless AI performance on IBM Power Systems

Introduction

The Python interpreter is an important part of many applications, and as such, its run-time performance is gaining focus. The Python interpreter that is built and included with H2O Driverless AI for IBM® Power Systems™ is not fully optimized, resulting in lower performance. This tutorial explains how rebuilding the Python interpreter with the latest gcc and the correct build options can result in significant performance improvements.

Tests with IBM Advance Toolchain for Linux on Power version 12.0 show average performance gains of 10% to 40%.

Estimated time

Allow 1 to 2 hours to rebuild the Python interpreter.

Prerequisites

Building Python requires several development packages. The following list is partial; individual systems might require additional packages.

Run the following command to install the required packages:

# yum install openssl-devel readline-devel ncurses-devel bzip2-devel gdbm-devel libsqlite3x-devel zlib-devel lzma-sdk-devel tk-devel libffi-devel sqlite-devel xz-devel
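Before starting the build, it can help to see which of these packages are still missing. The following is a minimal sketch, not part of the original procedure: missing_packages is a hypothetical helper name, and it assumes an RPM-based system where "rpm -q" reports whether a package is installed.

```shell
# Hypothetical helper (not part of H2O Driverless AI or Python):
# print every package that the given query command does not report
# as installed. On RHEL the query command would be "rpm -q".
missing_packages() {
    query_cmd=$1
    shift
    for pkg in "$@"; do
        # $query_cmd is left unquoted on purpose so "rpm -q" splits into words.
        $query_cmd "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Example:
#   missing_packages "rpm -q" openssl-devel readline-devel zlib-devel libffi-devel
```

An empty result suggests that the listed packages are installed; any name that is printed can be passed to yum install.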

Steps

The version of Python that is included with H2O Driverless AI is built without some optimization flags that are provided by the Python build environment. Building the Python interpreter with those flags and with the latest gcc version available in the Advance Toolchain for Linux on Power provides improved performance for workloads on the IBM Power® platform.

Step 1. Set up the environment variables

To help with the rest of the build, export the following environment variables. The H2O_PYTHON_LOCAL environment variable is the directory where the rebuilt Python files will be installed. The DAI_INSTALL environment variable is the location of your H2O Driverless AI installation. For example:

# export H2O_PYTHON_LOCAL=/opt/python364-at12-dai186
# export DAI_INSTALL=/home/h2o/dai-1.8.6-linux-ppc64le
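Because every later step relies on these two variables, a quick sanity check can catch typos early. The sketch below is an illustration only: check_build_env is a hypothetical helper name, and the only assumption about the installation is that dai-env.sh sits directly under DAI_INSTALL, as in the commands used later in this tutorial.

```shell
# Hypothetical helper: succeed only if both variables are set and
# DAI_INSTALL points at a directory that contains dai-env.sh.
check_build_env() {
    if [ -z "${H2O_PYTHON_LOCAL:-}" ]; then
        echo "H2O_PYTHON_LOCAL is not set" >&2
        return 1
    fi
    if [ -z "${DAI_INSTALL:-}" ]; then
        echo "DAI_INSTALL is not set" >&2
        return 1
    fi
    if [ ! -f "${DAI_INSTALL}/dai-env.sh" ]; then
        echo "dai-env.sh not found under ${DAI_INSTALL}" >&2
        return 1
    fi
    echo "Build environment looks consistent"
}

# Example:
#   check_build_env || echo "Fix the exports before continuing"
```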

Step 2. Install Advance Toolchain for Linux on Power

First, make sure that you have the latest version of Advance Toolchain for Linux on Power installed on your system. See https://developer.ibm.com/linuxonpower/advance-toolchain/advtool-installation/ for instructions. At the time of writing this tutorial, Advance Toolchain 12.0 is the latest version for RHEL 7 releases.
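To confirm that the toolchain compiler is in place before configuring Python, a small check such as the following may be useful. It is a sketch under assumptions: at_gcc_version is a hypothetical helper name, and the default root /opt/at12.0 is taken from the configure command used later in this tutorial. The -dumpversion option is a standard gcc option that prints the compiler version.

```shell
# Hypothetical helper: print the version of the gcc found under an
# Advance Toolchain root directory (default: /opt/at12.0).
at_gcc_version() {
    at_root=${1:-/opt/at12.0}
    if [ ! -x "${at_root}/bin/gcc" ]; then
        echo "No executable gcc under ${at_root}/bin" >&2
        return 1
    fi
    "${at_root}/bin/gcc" -dumpversion
}

# Example:
#   at_gcc_version /opt/at12.0
```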

Step 3. Download the Python source

To rebuild the Python interpreter for H2O Driverless AI with the proper optimizations, download the matching source level from python.org. The source releases are listed at https://www.python.org/downloads/. H2O Driverless AI 1.8 includes Python 3.6.4. To determine the level that you are using, invoke Python with the -VVV flag.

# ${DAI_INSTALL}/dai-env.sh python -VVV
======================================================================
DRIVERLESS_AI_HOME is /home/h2o/dai-1.8.6-linux-ppc64le
DRIVERLESS_AI_CONFIG_FILE is /home/h2o/dai-1.8.6-linux-ppc64le/config.toml
DRIVERLESS_AI_JAVA_HOME is /home/h2o/dai-1.8.6-linux-ppc64le/jre
DRIVERLESS_AI_DATA_DIRECTORY is ./tmp
JAVA_HOME is /home/h2o/dai-1.8.6-linux-ppc64le/jre
DRIVERLESS_AI_CUDA_VERSION is cpu-only
DRIVERLESS_AI_H2O_XMX is 65536m
DRIVERLESS_AI_H2O_PORT is 12348
DRIVERLESS_AI_PROCSY_PORT is 12347
DRIVERLESS_AI_VIS_SERVER_PORT is 12346
OMP_NUM_THREADS is 16
OPENBLAS_MAIN_FREE is 1
SSL_CERT_DIR is /etc/pki/tls/certs
LANG is en_US.utf8
LC_ALL is en_US.utf8
MAGIC is /home/h2o/dai-1.8.6-linux-ppc64le/share/misc/magic
no_proxy is localhost,127.0.0.1
NO_PROXY is
HOME is /root
USING_DRIVERLESS_AI_ENV is 1
uid=0(root) gid=0(root) groups=0(root)
======================================================================
Python 3.6.4 (default, Jan 28 2020, 20:24:58)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

Download and unpack the required Python level:

# cd /root
# wget https://www.python.org/ftp/python/3.6.4/Python-3.6.4.tgz
# tar -xf Python-3.6.4.tgz
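If your installation reports a level other than 3.6.4, the download URL changes accordingly. The sketch below builds the URL from a version string. It assumes that the python.org FTP layout shown in the wget command above also applies to the version you need, which is worth confirming on the downloads page; python_source_url is a hypothetical helper name.

```shell
# Hypothetical helper: build the source tarball URL for a Python release,
# following the same layout as the 3.6.4 URL used above.
python_source_url() {
    version=$1
    echo "https://www.python.org/ftp/python/${version}/Python-${version}.tgz"
}

# Example (assumes the interpreter output is the last line printed after
# the dai-env.sh banner):
#   version=$("${DAI_INSTALL}/dai-env.sh" python -c \
#       'import platform; print(platform.python_version())' | tail -n 1)
#   wget "$(python_source_url "$version")"
```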

Step 4. Rebuild the Python source

Perform the following steps to configure and build the Python source, and install the newly built Python to a new directory.

  1. For the profile-guided optimizations to work, copy the tests from the Python source to the H2O Driverless AI source directory.

    # cd ${DAI_INSTALL}/src/Python-3.6.4
    # cp -r /root/Python-3.6.4/Lib/test ${DAI_INSTALL}/src/Python-3.6.4/Lib

  2. Configure and build the Python code.

    # ./configure CC="/opt/at12.0/bin/gcc" CXX="/opt/at12.0/bin/g++" --with-lto \
    --enable-optimizations --enable-ipv6 --enable-loadable-sqlite-extensions \
    --prefix=${H2O_PYTHON_LOCAL} --exec-prefix=${H2O_PYTHON_LOCAL}
    # make clean profile-removal
    # make -j 40
    # make install
    # ln -s python3 ${H2O_PYTHON_LOCAL}/bin/python
    # ${H2O_PYTHON_LOCAL}/bin/python -m pip install wheel virtualenv
    
  3. If error messages in the configure or make step indicate that additional prerequisites are needed, install them and repeat the above steps as appropriate.

Note: You can find more information about the configure flags using the help option.

# ./configure --help
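After the installation finishes, you may want to confirm that the new interpreter was really configured with the intended options. Python records its configure arguments in the CONFIG_ARGS build variable, which the standard sysconfig module exposes. The following sketch checks that string for a given flag; has_configure_flag is a hypothetical helper name, and the check is a plain substring match.

```shell
# Hypothetical helper: succeed if the configure argument string ($1)
# contains the given flag ($2). This is a plain substring match.
has_configure_flag() {
    case "$1" in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example:
#   args=$("${H2O_PYTHON_LOCAL}/bin/python" -c \
#       'import sysconfig; print(sysconfig.get_config_var("CONFIG_ARGS"))')
#   has_configure_flag "$args" --enable-optimizations && echo "optimizations enabled"
#   has_configure_flag "$args" --with-lto && echo "LTO enabled"
```

Running ${H2O_PYTHON_LOCAL}/bin/python -VVV is a complementary check: the compiler that it reports should be the Advance Toolchain gcc rather than the system compiler.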

Step 5. Update the Driverless AI environment

Next, you need to update the scripts for the H2O Driverless AI environment. Create a new LOCAL_PYTHON version of the dai-env and run-dai scripts, and modify those scripts so that H2O Driverless AI uses the new Python.

# cd ${DAI_INSTALL}
# export PYPATH=`./dai-env.sh python -c "import sys; print(':'.join(x for x in sys.path))"|grep ^:`
# perl -ne 'print; if (/^export LD_LIBRARY_PATH=/) { print "\
 ####\
 LOCAL_PYTHON=$ENV{'H2O_PYTHON_LOCAL'}
 export PYTHONPATH=\"\$\{LOCAL_PYTHON\}/lib/python3.6/site-packages:\$\{LOCAL_PYTHON\}/lib/python36.zip:\$\{LOCAL_PYTHON\}/lib/python3.6:\$\{LOCAL_PYTHON\}/lib/python3.6/lib-dynload:\\
$ENV{'PWD'}$ENV{'PYPATH'}\"\
 export PATH=\"\$\{LOCAL_PYTHON\}/bin:\$\{PATH\}\"\
 export LD_LIBRARY_PATH=\"\$\{LOCAL_PYTHON\}/lib:\$\{LD_LIBRARY_PATH\}\"\
 ####\
"}; ' dai-env.sh > dai-env_LOCAL_PYTHON.sh
# chmod +x dai-env_LOCAL_PYTHON.sh

# cp run-dai.sh run-dai_LOCAL_PYTHON.sh
# perl -pi -e 's/dai-env\.sh/dai-env_LOCAL_PYTHON.sh/g;' run-dai_LOCAL_PYTHON.sh
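The perl one-liner above is easy to mistype, so it may be worth checking the generated script before starting H2O Driverless AI. The sketch below looks for two of the inserted lines. check_local_python_script is a hypothetical helper name, and the check assumes that the generated file contains the LOCAL_PYTHON assignment and the PATH export exactly as written by the commands above.

```shell
# Hypothetical helper: succeed if the script defines LOCAL_PYTHON and
# prepends ${LOCAL_PYTHON}/bin to PATH, as the generated file should.
check_local_python_script() {
    script=$1
    if ! grep -q 'LOCAL_PYTHON=' "$script"; then
        echo "LOCAL_PYTHON is not set in $script" >&2
        return 1
    fi
    # -F treats the pattern as a fixed string, so ${...} is matched literally.
    if ! grep -qF '${LOCAL_PYTHON}/bin' "$script"; then
        echo "PATH is not updated in $script" >&2
        return 1
    fi
    echo "$script references the rebuilt Python"
}

# Example:
#   check_local_python_script dai-env_LOCAL_PYTHON.sh
#   ./dai-env_LOCAL_PYTHON.sh python -VVV
```

If the second example command still reports the original compiler, the new directories are probably not taking precedence in PATH.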

Step 6. Use the new build

After the build is complete, use the new run-dai_LOCAL_PYTHON.sh script to start H2O Driverless AI.

Summary

Rebuilding Python on your IBM Power hardware with the latest gcc from the Advance Toolchain for Linux on Power and using the optimization flags in the Python build environment results in a Python interpreter that has higher performance than the installed version.