IBM Support

Hive Query support on IOP 4.x cluster for CMX data from BigInsights 3.x cluster - Hadoop Dev

Technical Blog Post


Abstract

Hive Query support on IOP 4.x cluster for CMX data from BigInsights 3.x cluster - Hadoop Dev

Body

Overview

CMX is a compression format developed by IBM for Hadoop workloads. It can be used, for example, to compress the data of a Hive table. BigInsights 3.x shipped an earlier version of CMX that used the package name com.ibm.biginsights.compress. In preparation for open-sourcing CMX, the package name was changed to org.apache.io.compress in IOP/BigInsights 4.x. Because of this change, CMX data written in BigInsights 3.x cannot be read in IOP/BigInsights 4.x without additional configuration. This post describes the required configuration changes.

Steps

      1. Copy the CMX libraries from the BigInsights 3.x cluster into the IOP/BigInsights 4.x cluster

      • Copy the file libcmxbiginsights.so from /opt/ibm/biginsights/IHC/lib/native/Linux-amd64-64 to /usr/iop/{iop_version}/hadoop/lib/native
      • Copy the file ibm-compression.jar from /opt/ibm/biginsights/IHC/lib/ to /usr/iop/{iop_version}/hadoop/lib
      2. In IOP/BigInsights 4.x, update the following Hadoop properties in the Ambari UI:

      • Select the HDFS component, click Configs, and enter the property name io.compression.codecs in the filter field. Add the value com.ibm.biginsights.compress.CmxCodec to the comma-separated list of compression codecs. Save the configuration change.

      • Select the MapReduce2 component, click Configs, and enter the property name mapreduce.application.classpath in the filter field. Add the value /usr/iop/{iop_version}/hadoop/lib/ibm-compression.jar to the colon-separated list of classpath entries. Replace the {iop_version} placeholder with the IOP version. For example, for IOP 4.2 the value is /usr/iop/4.2.0.0/hadoop/lib/ibm-compression.jar. Save the configuration change.

      3. Restart the following components:

      • HDFS
      • MapReduce2
      • YARN
      • Hive
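
The values used in steps 1 and 2 can be worked out ahead of time. The following Python sketch is only an illustration, not an IBM-provided tool: the helper names (copy_plan, append_entry, updated_codecs, updated_classpath) are hypothetical, and the sample property values in the usage section are made up. It builds the copy destinations for a given IOP version and appends the legacy codec and the jar to existing property values without creating duplicates. It does not copy files or talk to Ambari.

```python
import posixpath

# Values taken from the steps above.
LEGACY_CODEC = "com.ibm.biginsights.compress.CmxCodec"
BI3_LIB = "/opt/ibm/biginsights/IHC/lib"


def copy_plan(iop_version):
    """Return (source, destination) pairs for the two files in step 1."""
    hadoop_lib = posixpath.join("/usr/iop", iop_version, "hadoop", "lib")
    return [
        (posixpath.join(BI3_LIB, "native/Linux-amd64-64/libcmxbiginsights.so"),
         posixpath.join(hadoop_lib, "native", "libcmxbiginsights.so")),
        (posixpath.join(BI3_LIB, "ibm-compression.jar"),
         posixpath.join(hadoop_lib, "ibm-compression.jar")),
    ]


def append_entry(current, entry, sep):
    """Append entry to a sep-delimited value unless it is already listed."""
    parts = [p.strip() for p in current.split(sep) if p.strip()]
    if entry not in parts:
        parts.append(entry)
    return sep.join(parts)


def updated_codecs(current):
    """New value for io.compression.codecs (comma-separated)."""
    return append_entry(current, LEGACY_CODEC, ",")


def updated_classpath(current, iop_version):
    """New value for mapreduce.application.classpath (colon-separated)."""
    jar_destination = copy_plan(iop_version)[1][1]
    return append_entry(current, jar_destination, ":")


if __name__ == "__main__":
    # Sample inputs are illustrative; read the real values from Ambari.
    for source, destination in copy_plan("4.2.0.0"):
        print(source, "->", destination)
    print(updated_codecs("org.apache.hadoop.io.compress.GzipCodec"))
    print(updated_classpath("/example/mapreduce/*", "4.2.0.0"))
```

Paste the current value of each property from the Ambari filter field into the matching helper, then copy the result back into Ambari. Because append_entry skips entries that are already present, running it again on an updated value leaves the value unchanged.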


UID

ibm16260087