Setting up an artificial intelligence (AI) environment on IBM PowerVM virtualized IBM Power Systems

Introduction

As artificial intelligence (AI) matures, every industry wants to adopt it. Enterprises want to use it to unlock hidden insights in their data and use those insights to make strategic choices. Many enterprises are continuously evaluating different use cases and experimenting with data using different AI frameworks. Having an infrastructure that can support different machine learning and deep learning (ML/DL) frameworks is one of the challenges enterprises face when experimenting with AI. In many cases, it is helpful to run AI close to where the data resides; this improves security and speeds the movement of data from rest to work. Now you can easily run and experiment with ML/DL models on the IBM® Power Systems™ servers that run IBM AIX® (for example, IBM PowerVM® virtualized IBM Power Systems servers such as the IBM Power® System E980, Power E950, and Power S924).


This tutorial describes how you can install IBM Watson® Machine Learning Community Edition (WML CE) on Power Systems servers and provides some examples to experiment with. It demonstrates how to perform non-accelerated inferencing against data on an IBM AIX® logical partition (LPAR) by using AI capabilities on a Linux LPAR [also called a virtual machine (VM)] inside the same IBM Power Systems server. This enables secure, high-speed, low-overhead data movement between the AI and enterprise processing environments.

Installing WML CE

WML CE version 1.6.1 is built on the former IBM PowerAI releases. It is a software toolkit that bundles machine learning and deep learning frameworks with their dependencies, and it is built for easy and rapid deployment of AI frameworks.

WML CE can be installed either from the Conda repository or by using Docker images. There are two versions of WML CE: one for GPU systems and one for CPU-only systems. In this tutorial, we use the CPU-only versions for IBM Power Systems servers.

Refer to the following operating system requirements for installing WML CE:

Red Hat Enterprise Linux (RHEL) 7.6 Little Endian (LE) for IBM POWER8® and IBM POWER9™ or Ubuntu 18.04.1 LTS for IBM Power.

To install WML CE on a PowerVM virtualized IBM Power Systems server, install one of the above-mentioned operating systems on an LPAR. After setting up your Power Linux partition, use one of the methods explained in the following sections to install WML CE.

Installing using Conda

Perform the following steps to install WML CE using Conda:

  1. Install the basic packages required for the Anaconda installation (this example is on RHEL and uses yum).
    # yum install wget nano bzip2

  2. Install Anaconda: To install WML CE using Conda, first install the Anaconda environment on your system. Perform the following steps to install Anaconda on the Linux LPAR.

    1. Download the Anaconda3 installation script.
      # wget https://repo.continuum.io/archive/Anaconda3-2019.03-Linux-ppc64le.sh

    2. Install Anaconda as root.
      # bash Anaconda3-2019.03-Linux-ppc64le.sh
      Accept the license agreement for Anaconda and specify the installation location. The default location is $HOME/anaconda3.

    3. After installing Anaconda, initialize the environment and apply the changes in the current terminal by sourcing the .bashrc script.
      # source ~/.bashrc

    4. The base environment in Anaconda3 is Python 3.7, but WML CE supports Python 3.6. So, downgrade the Anaconda environment to Python 3.6 by running the following command.
      # conda install python=3.6

  3. Install WML CE: To install WML CE (PowerAI) in the Anaconda environment, first configure the channel for WML CE and then install the powerai-cpu package.

    1. Configure the Conda channel for PowerAI installation.
      # conda config --prepend channels https://public.dhe.ibm.com/ibmdl/export/pub/software/server/ibm-ai/conda/

    2. Install the powerai-cpu package (the WML CE CPU-only package).
      # conda install powerai-cpu

During the WML CE installation, the license agreement is presented. Read the license agreement and accept the terms and conditions to complete the installation. If you decline the license agreement, the packages are not installed.

After you have read the license agreement once, future installations can be automated to accept it by setting the following environment variable before running the Conda installation command:

# export IBM_POWERAI_LICENSE_ACCEPT=yes

After this, WML CE is installed and ready to be used in your environment.
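A quick way to confirm the installation is to import the frameworks from the environment where the powerai-cpu package was installed. The following is a minimal sanity check, assuming the meta-package pulled in TensorFlow and PyTorch:

    # sanity_check.py: verify that the WML CE frameworks import correctly.
    import tensorflow as tf
    import torch

    print("TensorFlow version:", tf.__version__)
    print("PyTorch version:", torch.__version__)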

Note: If your system is in an air-gapped environment (that is, with no access to an external network), you can create a local Conda repository to serve your system. Refer to https://docs.anaconda.com/anaconda-repository/admin-guide/install/config/mirrors/mirror-anaconda-repository/.

Installing using Docker

The Docker images are based on Ubuntu 18.04 but can also run on RHEL systems. Perform the following steps to install WML CE using a Docker image.

  1. Make sure that Docker is installed on your system; if it is not, install it first.
    # yum install docker

  2. Start the Docker daemon.
    # service docker start

  3. Pull the PowerAI CPU Docker image with all the packages from the PowerAI Docker hub. Use the CPU-only image; otherwise, a GPU is needed to run some of the frameworks.
    # docker pull ibmcom/powerai:1.6.1-all-cpu-ubuntu18.04-py3-ppc64le

  4. Run the downloaded Docker image.
    # docker run -ti --env LICENSE=yes ibmcom/powerai:1.6.1-all-cpu-ubuntu18.04-py3-ppc64le bash

After completing these steps, you are inside the Docker container, which has most of the packages and AI frameworks installed and is ready for use. The Docker image provides a Conda environment. If any package is missing from the image, you can install it using the conda install <package name> command inside the container. For example, if the Python pandas package is not available, you can install it using the conda install pandas command.

Note: If there is no internet connection on the system where you want to install the Docker image, there are a couple of options, such as saving the image as an archive on a connected system (docker save) and loading it on the target system (docker load), or serving the image from a local registry.

Machine learning for AIX

The focus of this tutorial is to describe a scenario of accessing an AI model on a Linux LPAR from an AIX LPAR using Representational State Transfer (REST) APIs. This enables AIX users to run predictions on data present on the AIX LPAR. Model deployment is done on the Linux LPAR, and inferencing using REST APIs is done from the AIX LPAR. The Linux and AIX LPARs are inside the same Power Systems server.

Model building and deployment

After installing WML CE on the Linux LPAR of the Power Systems server, the LPAR is ready for building and deploying ML/DL models. After training, the model is generated and saved on the system. The model can also be built on an accelerated compute server (such as the Power AC922 system) and then brought to the Linux LPAR.

The model that is ready for deployment on the Linux LPAR can be served using frameworks such as TensorFlow Serving, Flask, and so on. The deployed model exports REST APIs that can be called from the AIX LPAR to access the model and run predictions.
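As an illustration of this pattern, a minimal Flask server along the following lines can load a saved model and expose a /predict REST endpoint. This is a sketch only; the model file name, JSON layout, and preprocessing are placeholders rather than code from this tutorial's repository.

    # serve_sketch.py: illustrative REST serving of a saved Keras model with Flask.
    import sys

    import numpy as np
    from flask import Flask, jsonify, request
    from tensorflow.keras.models import load_model

    app = Flask(__name__)
    model = load_model("model.h5")  # hypothetical saved model file

    @app.route("/predict", methods=["POST"])
    def predict():
        # Assumes the client sends a JSON body such as {"features": [0.1, 0.2, ...]}
        features = np.asarray(request.get_json()["features"], dtype="float32")
        prediction = model.predict(features.reshape(1, -1))  # score a single record
        return jsonify({"prediction": prediction.tolist()})

    if __name__ == "__main__":
        app.run(host=sys.argv[1], port=int(sys.argv[2]))  # for example: python serve_sketch.py 0.0.0.0 5555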

Model scoring/inferencing

The data can be retrieved from the database on the AIX LPAR and sent to the Linux LPAR for inferencing using the REST API. Existing applications running on AIX that are written in C/C++ or Java can easily incorporate REST API calls to run inferencing on the data.

Model building, deployment, and inferencing are depicted in Figure 1.

Figure 1. Inferencing on AIX data using the model on Linux in a PowerVM virtualized Power Systems server


Model training, deployment, and inferencing as explained above are demonstrated using the following examples. You can download the source code for these examples from https://github.com/IBM/aix-ai-sample-wmlce.

Example 1 – TensorFlow-based model

This example deals with a banking-related use case: writing a model to predict the merchants who might default with the bank. The model is deployed on a Linux LPAR that has WML CE installed and is served using REST APIs. The model can then be accessed from an AIX LPAR for inferencing using REST APIs. This is explained in detail in the following sections.

Use case

This use case is about retaining merchants who use the company's network for credit card processing. A client approved many low-value merchant accounts without much scrutiny, and many of those merchant accounts ended up on the defaulters list. Those merchant accounts focused on categories such as cars, furniture, electronics, and so on. The client thinks that they should have put more emphasis on their applicant screening process.

Model building and deployment (on Linux LPAR)

This section explains how the problem mentioned in the use case is solved using a machine learning algorithm by writing a model, deploying it, and exporting it using REST APIs. We chose a TensorFlow-based model and treat this as a classification problem because the merchants need to be categorized into two discrete categories, that is, default or non-default.

The data used for training can be in an RDBMS or in a CSV file. The example provided has the files listed in Table 1.

Table 1. List of client and server files
File name Purpose
train.py Python code to run training on the data (in cust_history.csv) and save the TensorFlow model
cust_history.csv Training data in CSV format exported from RDBMS
process_data.py Python helper code written to read the data from *.csv and perform data transformation
flask-linux-server.py Python code to load the saved model and export REST APIs to access the model
new_customers.csv New customer data (in CSV format) for which we need predictions
flask-aix-client.py Python code to be run on an AIX LPAR to invoke REST APIs for predicting new customers who can default.
insert_into_aix_db.sh Script to create a new DB2 database and insert the new_customers.csv records into it.

The model can be generated on any PowerLinux LE system with WML CE using the following two steps:

  1. Run the train.py script to perform training and generate the TensorFlow model.
    # python train.py
    This script saves the model in the H5 format in the cc_risk_analysis_model.h5 file, and the transformed data is saved as feature_transofmed_model.pkl.

  2. Run the flask-linux-server.py script to export the REST API using a Flask-based server on the Linux LPAR.
    # python flask-linux-server.py <your hostname> <portno>

This command starts a Flask-based server and exposes a REST API for the AIX LPAR to access the model.

Model scoring/inferencing (from AIX)

The REST API for inferencing can be called from the AIX LPAR or any other LPAR on which the data is stored. In this example, we assume that the data for which we need to run predictions is either in an IBM DB2® database or in a CSV file. The data is retrieved on the AIX LPAR and sent to the Flask server running on the Linux LPAR to obtain predictions.

To run this example, the AIX operating system version should be either 7.1 or 7.2, and the Python version should be 3.7. The inferencing API can also be called from AIX using the curl command, as shown later. You can install the curl and python3 packages on AIX from the AIX Toolbox for Linux Applications.

You can run the inferencing from the AIX LPAR using any of the following three methods:

  • A Python client script that connects to the DB2 database to fetch records and posts requests using the REST API to the Flask server running on the Linux LPAR for inferencing
  • A Python client script that retrieves data from a CSV file and posts requests using the REST API to the Flask server running on the Linux LPAR for inferencing
  • A curl command that calls the REST API for inferencing

Connect using an AIX Python client for inferencing with data in DB2 database

In this method, the IBM DB2 module needs to be installed so that the Python script can connect to the DB2 database.

  1. Install the IBM DB2 Python module.
    # python3 -m pip install ibm_db

  2. Insert data into the DB2 database.
    A script is provided to insert the data from the new_customers.csv file into the DB2 database. Log in as the db2inst1 user.
    # su - db2inst1
    Run the script to insert the data.
    # ./insert_into_aix_db.sh
    After running the script, the records present in the new_customers.csv file are inserted into a new database called LOANDB, in the NEW_CUSTOMERS table.

  3. Run inferencing.
    To run inferencing, run the following script, which reads data from the DB2 database and sends it to the Linux server to access the model and obtain predictions (a sketch of the DB2 read is shown after these steps).
    # python3 flask-aix-client.py -db2 <Linux LPAR Name> <Port>
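To illustrate the database half of this flow, the following minimal sketch reads the NEW_CUSTOMERS records using the ibm_db module. The connection string values are placeholders for your environment, and this is not the exact code of flask-aix-client.py; posting each row to the Flask server follows the same REST call pattern as the curl example shown later.

    # db2_fetch_sketch.py: illustrative read of the LOANDB records with ibm_db.
    import ibm_db

    # Placeholder connection string; adjust host, port, and credentials for your LPAR.
    conn = ibm_db.connect("DATABASE=LOANDB;HOSTNAME=localhost;PORT=50000;"
                          "PROTOCOL=TCPIP;UID=db2inst1;PWD=db2inst1;", "", "")

    stmt = ibm_db.exec_immediate(conn, "SELECT * FROM NEW_CUSTOMERS")
    row = ibm_db.fetch_assoc(stmt)  # a column-name -> value dict, or False when done
    while row:
        # Each row maps naturally to the {"headers": [...], "features": [...]} payload.
        print(list(row.keys()), list(row.values()))
        row = ibm_db.fetch_assoc(stmt)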

Connect using the AIX Python client for inferencing with data in the CSV file

The data for which we need to run predictions is in the new_customers.csv file. To run inferencing, run the following script, which reads the data from the new_customers.csv file and sends it to the Linux server to access the model and run predictions.

# python3 flask-aix-client.py -csv <Linux LPAR Name> <Port>

The above commands run predictions on all the data (either from DB2 or from the CSV file). However, if you don't want predictions for all the data but only for a specific merchant, you can invoke the script as shown below:

# python3 flask-aix-client.py <-db2|-csv> <Linux LPAR Name> <Port> <Merchant_Name>

For example, with data in the CSV format, to query whether the merchant Gold Acoustics will default, run the following command. This command retrieves the record of the merchant Gold Acoustics from new_customers.csv and sends it for inferencing.

# python3 flask-aix-client.py -csv myhostname.ibm.com 5555 "Gold Acoustics"

The above client commands output a prediction value of either 0 or 1. A prediction of 0 means that the merchant might not default, and a prediction of 1 means that the merchant might default.

Connect using curl command

You can also invoke the curl command to access the model through the REST API. Refer to the following curl command example to access the model for a sample prediction:

# curl -k -X POST http://myhostname.ibm.com:5555/predict -H 'Cache-Control: no-cache' -H 'Content-Type: application/json' -d '[{"headers":["ACCT_STATUS_K_USD", "CONTRACT_DURATION_MONTH", "HISTORY", "CREDIT_PROGRAM", "AMOUNT_K_USD", "ACCOUNT_TYPE", "ACCT_AGE", "STATE", "IS_URBAN", "IS_XBORDER", "SELF_REPORTED_ASMT", "CO_APPLICANT", "GUARANTOR", "PRESENT_RESIDENT","OWN_REAL_ESTATE", "PROP_UNKN", "ESTABLISHED_MONTH", "OTHER_INSTALL_PLAN", "RENT",  "OWN_RESIDENCE", "NUMBER_CREDITS",  "RFM_SCORE",  "BRANCHES", "TELEPHONE",  "SHIP_INTERNATIONAL"],  "features":["NONE", 12, "CRITICAL ACCOUNT", "EDUCATION", 2096, "up to 100 K USD", "4 to 7 YRS", "NJ", "NO", "YES", "NO", "NO", "NO", "above 4 YRS", "YES", "NO", 49, "NO", "NO", "YES", 1, 2, 2, "NO", "NO"]}]'

In the output of this client command, you can see a prediction value of either 0 or 1.
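For AIX clients written in Python, the same call can be made with only the standard library, which avoids installing extra packages on AIX. The following sketch mirrors the curl example above using urllib; the header and feature lists are truncated here, so fill in the full lists from the curl example.

    # predict_request.py: the /predict call from the curl example, via urllib.
    import json
    import urllib.request

    payload = [{
        "headers": ["ACCT_STATUS_K_USD", "CONTRACT_DURATION_MONTH", "HISTORY"],  # ...and the remaining columns
        "features": ["NONE", 12, "CRITICAL ACCOUNT"],                            # ...and the remaining values
    }]

    req = urllib.request.Request(
        "http://myhostname.ibm.com:5555/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())  # prediction value: 0 (might not default) or 1 (might default)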

Example 2 – Scikit-learn-based model

This is an example of fraud detection in the e-commerce industry by identifying counterfeit retail transactions. Fraud in e-commerce comes in several distinct flavors, and early detection of anomalies in an automated, real-time fashion is an important part of an e-commerce system.

Use case

This use case shows an example where customer behavior and purchase history details are used to compute a risk score for each retail transaction. The risk score is used to decide whether a transaction is safe to be committed.

The model is deployed on a Linux LPAR that has WML CE installed, and the model is served using REST APIs. It can be accessed for inferencing either from AIX before committing a transaction or from a point-of-sale system that updates DB2 with the retail transactions, as shown in Figure 2.

Figure 2. Detection of fraud retail transactions on AIX using the Scikit-learn model on Linux in a PowerVM virtualized Power Systems server


Database creation

The data resides in a DB2 database on the AIX system. Perform the following steps to create a database and insert sample retail transactions into it.

  1. Install the IBM DB2 Python module.
    # python3 -m pip install ibm_db

  2. Insert data into the DB2 database.
    To insert the data in the datagen directory into the DB2 database, log in as the db2inst1 user.
    # su - db2inst1

  3. Run the following script to insert the data.
    # ./insert_into_aix_db.sh

After running the above script, the data is inserted into a new database called retails, in the tables customers, manufacturers, merchants, products, orders, lineitem, merchant_promotion, and manufacturer_promotion under the testcredit schema.

Model building and deployment (on Linux LPAR)

This section explains how the problem mentioned in the use case (in the previous section) is solved using a machine learning algorithm by writing a model, deploying it, and exporting it using REST APIs. A Scikit-learn model is used for this use case, treating it as a logistic regression problem because each transaction needs to be given a risk score ranging from 0 to 1.
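Before walking through the provided scripts, the following minimal sketch illustrates the approach: fit a logistic regression on historical transactions and use the predicted probability as a 0-to-1 risk score. The label column name and preprocessing here are placeholder assumptions, not the exact contents of train_model.py.

    # risk_train_sketch.py: illustrative logistic regression risk scorer.
    import pickle

    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    data = pd.read_csv("transaction_20kln.csv")          # training data named in Table 2
    X = pd.get_dummies(data.drop(columns=["IS_FRAUD"]))  # "IS_FRAUD" label column is a placeholder
    y = data["IS_FRAUD"]

    model = LogisticRegression(max_iter=1000).fit(X, y)
    print(model.predict_proba(X)[:5, 1])                 # probability of fraud = risk score in [0, 1]

    # Persist the trained model; the provided scripts save risk_model.npy and risk_encoder.p.
    with open("risk_model.p", "wb") as f:
        pickle.dump(model, f)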

The data used for training should be in a CSV file. The example provided has the important files listed in Table 2.

Table 2. List of client and server files
File name Purpose
train_model.py Python code to run training on the data (in transaction_20kln.csv) and save the Scikit-learn model
transaction_20kln.csv Training data in the CSV format exported from RDBMS on AIX
riskpredictor.py Python code to load the saved model and export REST APIs to access the model for inferencing
riskserver.py Python code to fetch the customer details from the DB2 database on the AIX server (provided through the environment variable DB2_DSN) and present it to the predictor server (provided through the environment variable CTR_URI) for inferencing.
insert_into_aix_db.sh Script to create a new DB2 database and insert the data in the datagen directory into it.

The model can be generated on any PowerLinux LE system with WML CE using the following steps:

  1. Run the following commands to perform training and generate the Scikit-learn model.

    # cd model
    # python train_model.py

    The output of this script saves the model in the pickle serialization format in the risk_model.npy and risk_encoder.p files.

  2. Install the IBM DB2 Python module.
    # python3 -m pip install ibm_db

  3. Run the predictor server (referred to as riskpredictor) script to export the REST API through a Flask-based server on the Linux LPAR.
    # nohup python riskpredictor.py &

    This command starts the Flask-based riskpredictor server, which waits for requests to access the model and run predictions. It uses port number 5001.

  4. Run the main driver (called the zincserver) to export a REST API through a Flask-based server on the Linux LPAR. It uses port number 5000.

    # export DB2_DSN="DRIVER={IBM DB2 ODBC DRIVER};DATABASE=retails;HOSTNAME=<IP ADDR of AIX Sever>;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=db2inst1;"
    # export CTR_URI="http://<IP ADDR of LINUX Server>:5001/user"
    # nohup python riskserver.py &
    

When requested, the zincserver fetches the customer details from the DB2 database on the AIX server (provided through the DB2_DSN environment variable) and presents them to the riskpredictor server (provided through the CTR_URI environment variable) to get the risk score.
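A minimal sketch of this two-tier flow is shown below. It assumes a hypothetical order_id column and JSON layout; the real riskserver.py may differ, and only the endpoint (/fetch_ad on port 5000), the TransID field, and the two environment variables come from this tutorial.

    # riskserver_sketch.py: illustrative zincserver flow - receive a transaction ID,
    # fetch the transaction details from DB2 on AIX, forward them to riskpredictor.
    import json
    import os
    import urllib.request

    import ibm_db
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    dsn = os.environ["DB2_DSN"]      # DB2 connection string for the AIX database
    ctr_uri = os.environ["CTR_URI"]  # riskpredictor endpoint, for example http://<linux>:5001/user

    @app.route("/fetch_ad", methods=["POST"])
    def fetch_ad():
        trans_id = request.get_json()["TransID"]
        conn = ibm_db.connect(dsn, "", "")
        stmt = ibm_db.prepare(conn, "SELECT * FROM testcredit.orders WHERE order_id = ?")
        ibm_db.execute(stmt, (trans_id,))  # order_id is a hypothetical column name
        row = ibm_db.fetch_assoc(stmt)     # transaction details as a column -> value dict

        req = urllib.request.Request(ctr_uri, data=json.dumps(row, default=str).encode("utf-8"),
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return jsonify({"risk_score": json.loads(resp.read().decode())})

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)  # the zincserver listens on port 5000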

Model scoring/inferencing (from AIX or a transaction server)

The REST API for inferencing can be called either from the AIX LPAR where the data is stored or from any other LPAR where the transaction is being initiated. Scoring is initiated by providing the transaction ID to the zincserver running on the Linux LPAR, which then retrieves the details of the transaction from the DB2 database on the AIX system and presents them to the riskpredictor server to get the risk score.

Run inferencing

Invoke the curl command from the AIX or Linux server to access the zincserver on the Linux LPAR through the REST API. Refer to the following example curl command to access the model for a sample prediction:

# curl -H "Content-Type: application/json" -d '{"TransID" : "<transaction Id>"}' -X POST http://<IP Addr of Linux System>:5000/fetch_ad

The output of the above command is the risk score for the transaction.

Conclusion

IBM AIX users can take advantage of machine learning models for their data using the method explained in this tutorial. When a model is trained and deployed on a Linux LPAR with WML CE installed, the data on AIX can be sent to the Linux LPAR using REST APIs for inferencing. Existing AIX applications can easily incorporate REST APIs to run online scoring on their data. This can be either real-time scoring or batch scoring, depending on the application requirements. This tutorial helps AIX users run predictions on their data through a couple of examples using TensorFlow-based and Scikit-learn-based models.

Sanket Rathi
Kavana N Bhat
Phani Kumar Ayyagari