Hive query support on an IOP 4.x cluster for CMX data from a BigInsights 3.x cluster

Overview

CMX is a compression format developed by IBM for Hadoop workloads. It can be used, for example, to compress the data of a Hive table. BigInsights 3.x shipped an earlier version of CMX that used the package name com.ibm.biginsights.compress. In preparation for open-sourcing CMX, the package name was changed to org.apache.io.compress in IOP/BigInsights 4.x. Because of this change, CMX data written in BigInsights 3.x cannot be read in IOP/BigInsights 4.x without additional configuration. This post describes the required configuration changes.

Steps

1. Copy the CMX libraries from the BigInsights 3.x cluster into the IOP/BigInsights 4.x cluster

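Copy the file libcmxbiginsights.so from /opt/ibm/biginsights/IHC/lib/native/Linux-amd64-64 on the BigInsights 3.x cluster to /usr/iop/{iop_version}/hadoop/lib/native on the IOP/BigInsights 4.x cluster.
Copy the file ibm-compression.jar from /opt/ibm/biginsights/IHC/lib/ on the BigInsights 3.x cluster to /usr/iop/{iop_version}/hadoop/lib on the IOP/BigInsights 4.x cluster.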
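The two copy operations can be scripted. The following is a minimal Python sketch, assuming the BigInsights 3.x files have already been transferred to a staging directory on the IOP node with their original directory layout preserved. The function names cmx_copy_plan and copy_cmx_libraries, the staging_root parameter, and the version string 4.2.0.0 are illustrative and not part of any IBM tooling.

```python
import shutil
from pathlib import Path

# Source locations on the BigInsights 3.x cluster, as listed in the step above.
BI3_NATIVE_LIB = "/opt/ibm/biginsights/IHC/lib/native/Linux-amd64-64/libcmxbiginsights.so"
BI3_JAR = "/opt/ibm/biginsights/IHC/lib/ibm-compression.jar"


def cmx_copy_plan(iop_version):
    """Return (source file, destination directory) pairs for one IOP version."""
    hadoop_lib = f"/usr/iop/{iop_version}/hadoop/lib"
    return [
        (BI3_NATIVE_LIB, f"{hadoop_lib}/native"),
        (BI3_JAR, hadoop_lib),
    ]


def copy_cmx_libraries(iop_version, staging_root="/"):
    """Copy the staged 3.x files into the IOP directories on the local node.

    staging_root is a hypothetical directory under which the 3.x files
    were staged with their original paths preserved.
    """
    for source, dest_dir in cmx_copy_plan(iop_version):
        staged = Path(staging_root) / source.lstrip("/")
        shutil.copy2(staged, dest_dir)


if __name__ == "__main__":
    # Print the plan only; call copy_cmx_libraries() to perform the copy.
    for source, dest_dir in cmx_copy_plan("4.2.0.0"):
        print(f"{source} -> {dest_dir}")
```

Printing the plan first makes it easy to check the destination paths against the installed IOP version before any files are written.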

In the Ambari web UI, select the HDFS component, click Configs, and enter the property name io.compression.codecs in the filter field. Add the value com.ibm.biginsights.compress.CmxCodec to the comma-separated list of compression codecs. Save the configuration change.
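To illustrate what the edited property value should look like, here is a small Python sketch that appends the codec to a comma-separated list only if it is not already there. The helper name add_codec and the sample starting value are assumptions for illustration; the codec list on your cluster will differ.

```python
CMX_CODEC = "com.ibm.biginsights.compress.CmxCodec"


def add_codec(current_value, codec=CMX_CODEC):
    """Append codec to a comma-separated codec list unless already present."""
    # Split on commas and drop surrounding whitespace and empty items.
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    if codec not in codecs:
        codecs.append(codec)
    return ",".join(codecs)


if __name__ == "__main__":
    # Sample starting value only; copy the real one from the Configs page.
    existing = (
        "org.apache.hadoop.io.compress.GzipCodec,"
        "org.apache.hadoop.io.compress.DefaultCodec"
    )
    print(add_codec(existing))
```

Existing entries are kept in their original order and the CMX codec is added at the end, so running the helper twice on the same value does not create a duplicate entry.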

Select the MapReduce2 component, click Configs, and enter the property name mapreduce.application.classpath in the filter field. Add the value /usr/iop/{iop_version}/hadoop/lib/ibm-compression.jar to the colon-separated list of classpath entries. Replace the iop_version placeholder with your IOP version; for IOP 4.2, for example, the value is /usr/iop/4.2.0.0/hadoop/lib/ibm-compression.jar. Save the configuration change.
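The classpath edit can be sketched the same way. This Python snippet substitutes the IOP version into the jar path and appends it to a colon-separated classpath if it is missing. The helper names cmx_jar_path and add_classpath_entry and the sample classpath entries are hypothetical; use the actual value shown on your cluster.

```python
def cmx_jar_path(iop_version):
    """Build the jar path by substituting the IOP version placeholder."""
    return f"/usr/iop/{iop_version}/hadoop/lib/ibm-compression.jar"


def add_classpath_entry(current_value, iop_version):
    """Append the CMX jar to a colon-separated classpath unless already present."""
    entries = [e for e in current_value.split(":") if e]
    jar = cmx_jar_path(iop_version)
    if jar not in entries:
        entries.append(jar)
    return ":".join(entries)


if __name__ == "__main__":
    # Placeholder entries only; copy the real value from the Configs page.
    existing = "/example/mapreduce/*:/example/mapreduce/lib/*"
    print(add_classpath_entry(existing, "4.2.0.0"))
```

The version string must match the directory name under /usr/iop on the cluster, as in the 4.2.0.0 example above.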
