IBM BigIntegrate (InfoSphere Information Server on Hadoop) provides tools that you can use to transform and cleanse big data by using the resource management capabilities of Hadoop to run jobs on the Hadoop cluster. These installation instructions are specific to the BigIntegrate installation and provide a detailed path for successfully installing Version 11.7 or 11.5 of the product.

To run IBM BigIntegrate, which is also known as Information Server on Hadoop, you must configure your Hadoop environment, install InfoSphere Information Server on a Hadoop cluster, and configure your installation to work with Hadoop.

The following steps provide an overview of how the Hadoop infrastructure processes a job after BigIntegrate is installed and configured:
Step 1: The conductor process on the engine tier receives a job run request for an InfoSphere DataStage job. The conductor process manages the section leader and player processes that run on the InfoSphere Information Server engine.
Step 2: The conductor connects to the YARN client, which assigns an Application Master to the job from the available pool of Application Masters it maintains.
Step 3: The Application Master requests resources from the YARN resource manager. The job processes run in a YARN container, with each container running a section leader and players. The YARN container designates resource requirements such as CPU and memory. When the resources are allocated, the conductor sends the process commands for the section leader to the Application Master, which starts those commands on the allocated resources.

System requirements for Information Server
Linux requirements

  • Your system must meet one of the following operating system and hardware requirements:
    Red Hat Enterprise Linux Server release 7: minimum version 7, x86-64
    SUSE Linux Enterprise Server release 11: minimum version 11, x86-64
    Red Hat Enterprise Linux Server release 6: minimum version 6.6, x86-64 (supported only for Information Server Version 11.5)
  • Your system must have Java 7.1.3 or later installed.
  • You must have the following packages downloaded if you are using 64-bit Red Hat Enterprise Linux Server 7:
    glibc-2.17-55.el7.x86_64
    libXp-1.0.2-2.1.el7.x86_64
    libXau-1.0.8-2.1.el7.x86_64
    libXext-1.3.2-2.1.el7.x86_64
    libX11-1.6.0-2.1.el7.x86_64
    libxcb-1.9-5.el7.x86_64
    libXmu-1.1.1-5.1.el7.x86_64
    nss-softokn-freebl-3.15.4-2.el7.x86_64
  • You must have the following Linux tools installed: netstat, ed, ipcs, sed, grep, fgrep
  • Your system must have a supported version of Apache Hadoop installed: Hortonworks Data Platform Version 2 or Cloudera CDH 5.5 or later
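Before starting the installer, it can help to check the tool and file-descriptor requirements in one pass. The following is a minimal sketch, not an IBM-supplied utility; the tool list and the 10240 minimum come from this document (see also the Troubleshooting section):

```shell
#!/bin/sh
# Preflight sketch: report which required engine-tier tools are missing
# and whether the open-file-descriptor limit meets the 10240 minimum.

check_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -z "$missing" ]; then echo "OK"; else echo "missing:$missing"; fi
}

check_tools netstat ed ipcs sed grep fgrep

fd=$(ulimit -n)
if [ "$fd" -ge 10240 ]; then
  echo "nofile OK ($fd)"
else
  echo "nofile too low ($fd)"
fi
```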

Detailed system requirements for Information Server, Version 11.7
Detailed system requirements for Information Server, Version 11.5

Windows requirements
For the Information Server client on Windows, you must have Microsoft Windows Version 7 or later installed.

Downloading the installation images
Download the following three installation images from Passport Advantage:
For Version 11.7:

  • IBM InfoSphere Information Server V11.7 on Linux
  • IBM InfoSphere Information Server V11.7 Windows Client Multilingual
  • IBM InfoSphere Information Server V11.7 Bundle Spec File Multiplatform Multilingual

For Version 11.5:

  • IBM InfoSphere Information Server V11.5.0.2 on Linux
  • IBM InfoSphere Information Server V11.5.0.2 Windows Client Multilingual
  • IBM InfoSphere Information Server V11.5 Bundle Spec File Multiplatform Multilingual
Installing Information Server on a Linux computer
Install the Information Server repository, engine, and services tiers on a Linux computer. The instructions below are for Version 11.7, but also apply to Version 11.5.

  1. Download and unzip the installation and license files to the Linux machine where you will install the Information Server repository, engine, and services tiers. For example, for Version 11.7:
    • Installation Image: IS_V11.7_LINUX_X86_64_MULTI.tar.gz
    • License files: IS_V11.7_BUNDLE_SPEC_FILE_MULTI.zip
  2. Copy the image.properties file and license folder to the is-suite folder. Note: This file and folder were unzipped from the IS_V11.5_BUNDLE_SPEC_FILE_MULTI.zip file for version 11.5 or the IS_V11.7_BUNDLE_SPEC_FILE_MULTI.zip file for version 11.7.
  3. Move the is-suite folder to the /is-suite directory, then ensure that the dsadm user can access it by issuing the following command:
    • chmod -R 777 /is-suite
  4. Issue the following command to obtain the URL to start the installation wizard:
    # cd /is-suite
    # ./setup -verbose

    You will see the following message:
    ======> Enter the following URL to your web browser to begin the installation process: https://somehostname:8446/ISInstall
  5. Enter the URL that is generated into a browser and follow the instructions in the installation wizard. Note: These installation instructions install a basic client-server configuration which means that the repository, engine, and services tiers are all installed on a single Linux machine.
  6. Select all of the default settings in the wizard until you get to the Tier Selection panel. On this panel, select Repository, Services, and Engine.
  7. Continue to take the default settings in the wizard until you get to the Application Server Options panel. Select Install WebSphere Application Server Liberty Core:

    For information about installing other application server software, see Options for installing the application server software.
  8. On the WebSphere Application Server Liberty Core Installation panel, enter 9443 as the HTTPS port number. If that port is taken, set the HTTPS port to another port, for example, 9446. Do not change the key password.
  9. On the IBM Db2 Instance User panel, select Create a user as an instance owner if you do not have an existing user account.

    For information about using an existing Db2 database, see Options for installing the database software.
  10. On the Db2 Fenced User Information panel, select Create a new user as a fenced user if you do not have an existing user account.
  11. Continue to take the default settings in the wizard until you get to the Metadata Repository Configuration panel. Enter xmeta as the database owner, xmeta as the database name, db2inst1 as the database instance name, and /opt/IBM/InformationServer/Repos/xmeta for the database location.
  12. On the Staging Area Configuration panel, specify xmetasr as the database owner, enter a new password, specify xmeta for the database name, specify db2inst1 as the database instance name, and /opt/IBM/InformationServer/Repos/xmeta for the database location.
  13. Continue to take the default settings in the wizard until you get to the RPC Port Configuration and ITAG Configuration for Multiple Engine Instances on This Computer panel. Verify that Configure ITAG for this engine tier instance is NOT selected.
  14. On the IBM InfoSphere DataStage Administrator panel, if you do not have an existing DataStage Administrator, specify dsadm as the IBM InfoSphere DataStage administrator, enter a new password, specify dstage as the group name, and /home/dsadm as the Home directory.
  15. Continue to take the default settings until you get to the Operations Database Configuration panel. Specify dsodb as the database owner, specify a password, enter xmeta as the database name, enter db2inst1 as the database instance name, and specify /opt/IBM/InformationServer/Repos/xmeta as the database location.
  16. On the IBM InfoSphere Information Analyzer Repository Configuration panel, enter iauser as the database owner, specify a password, enter iadb as the database name, db2inst1 as the database instance, and /opt/IBM/InformationServer/Repos/iadb for the database location.
  17. On the Standardization Rules Designer Database Configuration panel, enter srduser as the database owner, specify a password, enter xmeta as the database name, db2inst1 as the database instance, and /opt/IBM/InformationServer/Repos/xmeta for the database location.
  18. On the Information Server Enterprise Search (ISES) panel, select the Skip the Information Server Enterprise Search Install checkbox. Note: Skip this step if you are installing Version 11.5.
  19. Continue to take the default settings until you get to the Preinstallation Summary for the Current Computer panel. Click Install.

  20. Click Finish after the installation completes.
Installing the Information Server client on a Windows computer
To install the Information Server client on a Windows computer:

  1. Download and unzip the installation files to the Windows machine where you will install the Information Server client.
  2. Copy the is-client directory to the c:\Users\Administrator\is-client directory.
  3. Run the setup.exe program to start the installation. A URL for the web-based installation program is launched. Note: If you changed the port number of Information Server from the default 9443 to another port number during the installation of the other tiers, verify that you enter the correct port number during this installation.
  4. Continue to take the default settings in the wizard until you get to the Metadata Interchange Agent Ports Configuration panel. Enter 19443 for the metadata interchange agent HTTPS port number, enter the server address for the services tier computer, 9446 for the port number, and the user name and password of the suite administrator.
  5. Accept the request to download the certificate after the Metadata Interchange Agents Ports Configuration screen. This request will appear only if you have WebSphere Application Server Liberty Core installed.
  6. Continue to take the default settings until you get to the Preinstallation Summary for the Current Computer panel. Click Install.
  7. Click Finish after the installation completes.
  8. Set up a C++ compiler in order to run InfoSphere DataStage jobs. For more information, see Setting up a C++ compiler.
  9. Create an InfoSphere DataStage project by using the Administrator Client or open the default project (dstage1). For more information, see Adding projects.
  10. Verify that your installation was successful by compiling and running a simple InfoSphere DataStage job that contains a transformer stage to ensure that the compiler has been set up correctly. For information about creating your first job, see Designing your first job.
Installing the Hadoop cluster
Install Ambari and the Hortonworks Data Platform 2.6 through Ambari by following the Apache Ambari Installation instructions.

  • Install Ambari
  • Install Hortonworks Data Platform 2.6 through Ambari

Adding the Information Server host as a Hadoop edge node
Use Apache Ambari to add the Information Server installation as an edge node to your Hadoop cluster.

  1. Log into your Hadoop cluster from the Ambari user interface. Browse to the Hosts tab and select Add New Hosts from the Actions menu.
  2. On the Install Options panel, specify the Information Server installation host in the target hosts field. Copy and paste the content of the Information Server installation host’s private key (~/.ssh/id_rsa) into the SSH Private Key field. Click Register and Confirm.
  3. Confirm that the correct host will be added to the Hadoop cluster and then click Next.
  4. Resolve any errors that are discovered during the host check. For example, Ambari might detect that Information Server and the Java SE Development Kit (JDK) are running on the host that is being added as the edge node. You can continue with warnings, but you must resolve any errors.
  5. On the Assign Slaves and Clients panel, select only the Client checkbox. Click Next.
  6. On the Configurations panel, keep all the defaults and click Next.
  7. Review the final configuration settings and click Deploy.
  8. Click Next when the button is enabled. The Next button is enabled after the client components are installed on the Hadoop edge node.
  9. Click Complete after the edge node is successfully added.
Setting up Information Server users with Hadoop user permissions
You must grant Information Server users Hadoop user permissions to run jobs on Hadoop. Also, you must grant Hadoop users Information Server permissions.

  1. Issue the following command on the Hadoop edge node (the node where Information Server is installed):
    usermod -a -G dstage yarn
  2. Issue the following commands on all HDFS data nodes:
    groupadd dstage
    useradd -g dstage dsadm
    usermod -a -G dstage yarn
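Because these commands must be repeated on every data node, a small dry-run helper can show what still needs to be done on a given node before you run anything as root. This is a sketch using this document's default names (dstage, dsadm, yarn), not part of the product:

```shell
#!/bin/sh
# Dry-run sketch: print the user/group setup commands that a node still
# needs, so they can be reviewed before running them as root.
plan_user_setup() {
  getent group dstage >/dev/null 2>&1 || echo "groupadd dstage"
  id dsadm >/dev/null 2>&1            || echo "useradd -g dstage dsadm"
  id -nG yarn 2>/dev/null | grep -qw dstage || echo "usermod -a -G dstage yarn"
}
plan_user_setup
```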
Setting up directories with permissions to run jobs
When the YARN user runs job processes on Hadoop, that user must be granted additional permissions to access directories and files. For example, job processes try to access the scratch disk that is mentioned in the PX configuration files, which might have been created and owned by the dsadm user. If the YARN user does not have the required permissions, the access attempt fails and the job fails. To avoid job failures, you must add the YARN user to the group that includes dsadm, which is usually named dstage, and grant the required permissions to that group.

  1. Set up the following directories on the Hadoop edge node (the node where Information Server is installed) and all HDFS data nodes by issuing the following commands:
    • To set up the scratch directory:
      mkdir -p scratch_dir
      chmod -R 777 scratch_dir or chmod -R 775 scratch_dir

      For example:
      mkdir -p /tmp/sd
      chmod -R 777 /tmp/sd or chmod -R 775 /tmp/sd

      Before the permissions were changed, the directory on the edge node looked like this:

      [root@hdisserver1 ~]# ls -lad /tmp/sd*
      drwxr-xr-x 2 dsadm dstage 6 Jan 26 13:08 /tmp/sd

    • To set up the resource directory:
      mkdir -p resource_dir
      chmod -R 777 resource_dir or chmod -R 775 resource_dir

      For example:
      mkdir -p /tmp/rd
      chmod -R 777 /tmp/rd or chmod -R 775 /tmp/rd

      For more information about the resource directory, see Configuration files for InfoSphere Information Server on Hadoop.

  2. Set up the Information Server installation directory on HDFS data nodes by issuing the following commands:
    mkdir -p /opt/IBM/InformationServer
    chmod -R 777 /opt/IBM/InformationServer
  3. Set up the following directories from any Hadoop cluster node by issuing the following commands:
    • To set up the resource disk on HDFS:
      hdfs dfs -mkdir -p resource_dir

      For example:
      hdfs dfs -mkdir -p /tmp/rd
      hdfs dfs -chmod -R 777 /tmp/rd

    • To set up the home directories on HDFS for the YARN and InfoSphere DataStage administrator (dsadm) users:

      sudo -u hdfs hdfs dfs -mkdir /user/dsadm
      sudo -u hdfs hdfs dfs -chown dsadm:dstage /user/dsadm
      sudo -u hdfs hdfs dfs -chmod -R 775 /user/dsadm

      sudo -u hdfs hdfs dfs -mkdir /user/yarn
      sudo -u hdfs hdfs dfs -chown yarn:dstage /user/yarn
      sudo -u hdfs hdfs dfs -chmod -R 775 /user/yarn
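The local-directory portion of step 1 can be scripted for repeatability across nodes. The following is a minimal sketch using the example paths from this document (/tmp/sd and /tmp/rd); the HDFS commands in step 3 still need to be run separately:

```shell
#!/bin/sh
# Sketch: create the local scratch and resource directories on a node
# and open their permissions so YARN-launched job processes can use them.
for d in /tmp/sd /tmp/rd; do
  mkdir -p "$d"
  chmod -R 777 "$d"   # 775 also works if the directory is owned by the dstage group
done
ls -lad /tmp/sd /tmp/rd
```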

Setting up environment variables and configuration parameters
When you install Information Server on Hadoop, environment variables are set to default values when the data nodes are set up. The default settings in the yarnconfig.cfg file work for most instances of Information Server running on Hadoop. However, you might want to modify specific environment variables for your particular instance of Hadoop, or at the InfoSphere DataStage job or project level.

    Before you set the environment variables, copy the default configuration file by issuing the following command:
    cp /opt/IBM/InformationServer/PXEngine/etc/yarn_conf/yarnconfig.cfg.default /opt/IBM/InformationServer/Server/DSEngine/yarnconfig.cfg
  1. Set the APT_YARN_CONFIG variable to the location of the yarnconfig.cfg file. For example,
    /opt/IBM/InformationServer/Server/DSEngine/yarnconfig.cfg. Set this variable at the job, project, or DataStage environment file (dsenv) level. For more information, see the Knowledge Center or the IBM InfoSphere Information Server on Hadoop Deployment and Configuration Guide.
  2. Set the following key environment variables:
    APT_YARN_MODE=true
    APT_YARN_CONTAINER_SIZE=64
    APT_YARN_AM_POOL_SIZE=2
    APT_YARN_MULTIPLE_USERS=false
    APT_YARN_BINARY_COPY_MODE=hdfs
  3. Edit the LD_LIBRARY_PATH variable in the DataStage environment file, /opt/IBM/InformationServer/Server/DSEngine/dsenv. Add the path of the Hadoop native library to the LD_LIBRARY_PATH environment variable:
    LD_LIBRARY_PATH=/pathToHadoop/hadoop/lib/native/:$LD_LIBRARY_PATH

    For example:
    LD_LIBRARY_PATH=`dirname $DSHOME`/biginsights/IHC/c++/Linux-amd64-64/lib:`dirname $DSHOME`/branded_odbc/lib:`dirname $DSHOME`/DSComponents/lib:`dirname $DSHOME`/DSComponents/bin:$DSHOME/lib:$DSHOME/uvdlls:`dirname $DSHOME`/PXEngine/lib:$ISHOME/jdk/jre/lib/amd64/j9vm:$ISHOME/jdk/jre/lib/amd64:$ASBHOME/lib/cpp:$ASBHOME/apps/proxy/cpp/linux-all-x86_64:/opt/IBM/DB2/lib64:$LD_LIBRARY_PATH
    LD_LIBRARY_PATH=/usr/hdp/2.6.3.0-235/hadoop/lib/native:$LD_LIBRARY_PATH
    export LD_LIBRARY_PATH

  4. Set the APT_CONFIG_FILE variable at the job, project, or DataStage environment file (dsenv) level. This variable points to a configuration file that is used by the InfoSphere DataStage parallel engine to determine the nodes and resources that will be used for the IBM InfoSphere DataStage job execution.

    For example, issue the following command:
    APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/px_config.apt
    # cat /opt/IBM/InformationServer/Server/Configurations/px_config_dynamic.apt
    {
    node "hdisserver1"
    {
    fastname "hdisserver1.fyre.ibm.com"
    pools ""
    resource disk "/tmp/rd" {pools ""}
    resource scratchdisk "/tmp/sd" {pools ""}
    }
    node "node1"
    {
    fastname "$host"
    pools ""
    resource disk "/tmp/rd" {pools ""}
    resource scratchdisk "/tmp/sd" {pools ""}
    instances 2
    }
    }
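If you choose the dsenv level for steps 1 through 4, the settings can be collected into one fragment appended to the dsenv file. The following is a sketch using this document's example values; the Hadoop native-library path and configuration-file path are the examples from the steps above and will differ on your cluster:

```shell
# Sketch of dsenv-level settings from steps 1-4 (example paths from this document).
export APT_YARN_CONFIG=/opt/IBM/InformationServer/Server/DSEngine/yarnconfig.cfg
export APT_YARN_MODE=true
export APT_YARN_CONTAINER_SIZE=64
export APT_YARN_AM_POOL_SIZE=2
export APT_YARN_MULTIPLE_USERS=false
export APT_YARN_BINARY_COPY_MODE=hdfs
export LD_LIBRARY_PATH=/usr/hdp/2.6.3.0-235/hadoop/lib/native:$LD_LIBRARY_PATH
export APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/px_config.apt
```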

Validate the Information Server on Hadoop installation
Run a simple InfoSphere DataStage job from the DataStage Designer client or from the command line to verify that the installation and configuration were successful.

  1. From the DataStage Designer, validate that Information Server on Hadoop is configured properly by running a simple job consisting of row generator, transformer, and peek operators.
  2. From the command line, on the Hadoop edge node where Information Server is installed, run the following commands as the root user to force PX YARN Client binary localization and to set up the OSH path:
    cd `cat /.dshome`
    cd ../..
    echo '" >> Version.xml
    export PATH=$APT_ORCHHOME/bin:$APT_ORCHHOME/osh_wrappers:$APT_ORCHHOME/etc:.:$PATH
    For more information, see Forcing binary localization on all nodes in a large Hadoop cluster
  3. Run a simple OSH script job as the DataStage administrator user (dsadm):
    su - dsadm
    cd `cat /.dshome`
    . ./dsenv
    cd $APT_ORCHHOME/etc/yarn_conf
    ./stop-pxyarn.sh
    ./start-pxyarn.sh
    osh "generator -records 10 -schema record(a:int32;b:date) | peek"
Troubleshooting
The following are the most frequent questions and issues that are encountered when installing Information Server on Hadoop.

  • The number of file descriptors is too low.
    Issue: You receive the following error as soon as you start the installation ([root@hdisserver1 is-suite]# ./setup -verbose):
    The number of open file descriptors is too low. Both installation and usage of Information Server require that the number of open file descriptors be 10240 or higher.

    Solution: Check the current limit:
    [root@hdisserver1 is-suite]# ulimit -n
    1024

    Add the following lines in /etc/security/limits.conf:
    * soft nofile 10240
    * hard nofile 10240

    Log out and then log in again, and verify the new limit:
    # ulimit -n
    10240

  • The package libXp-1.0.2-2.1.el7.x86_64 is missing.
    Issue: You receive the following error message during the Requirements Check during the Information Server installation:
    FAILED: CDIPR2112I: Ensure that the required package libXp-1.0.2-2.1.el7.x86_64 is installed.

    Solution: Install the missing package by issuing the following command:
    # yum install libXp-1.0.2-2.1.el7.x86_64

    Verify that the package has been installed:
    # rpm -qa | grep libXp-1.0.2-2.1.el7.x86_64
    libXp-1.0.2-2.1.el7.x86_64

  • There is an error regarding tools being missing.
    Issue: You receive the following error during the installation of Information Server on a bare metal host because the tool ed was not installed:
    The tools that are required by the IBM InfoSphere Information Server engine must be present. Required tools are ed, ipcs, sed, grep, fgrep.

    Solution: Install the ed tool by issuing the following command:
    # yum install ed

  • There is an installation error because netstat (network statistics) was not installed.
    Issue: You receive this error message when you install Information Server in a bare metal environment:
    Installation program encountered an unexpected error.
    com.ibm.is.install.exception.InstallException: CheckDataStageClientConnections.connectionExistsNow() failed trying to execute netstat.

    Solution: Install netstat by issuing the following command:
    # yum install net-tools

For information about other installation scenarios, see the documentation in the Knowledge Center or refer to the IBM InfoSphere Information Server on Hadoop Deployment and Configuration Guide.
