Overview
CMX is a compression format that IBM developed for Hadoop workloads. It can be used, for example, to compress the data of a Hive table. BigInsights 3.x shipped an earlier version of CMX that used the package name com.ibm.biginsights.compress. In preparation for open-sourcing CMX, the package name was changed to org.apache.io.compress in IOP/BigInsights 4.x. Because of this change, CMX data written in BigInsights 3.x cannot be read in IOP/BigInsights 4.x without additional configuration. This blog post describes the required configuration changes.
Steps
1. Copy the CMX libraries from the BigInsights 3.x cluster into the IOP/BigInsights 4.x cluster:
  - Copy the file libcmxbiginsights.so from /opt/ibm/biginsights/IHC/lib/native/Linux-amd64-64 to /usr/iop/{iop_version}/hadoop/lib/native
  - Copy the file ibm-compression.jar from /opt/ibm/biginsights/IHC/lib/ to /usr/iop/{iop_version}/hadoop/lib
2. In IOP/BigInsights 4.x, update the following Hadoop properties in the Ambari UI:
  - Select the HDFS component, click Configs, and enter the property name io.compression.codecs in the filter field. Add the value com.ibm.biginsights.compress.CmxCodec to the comma-separated list of compression codecs. Save the configuration change.
  - Select the MapReduce2 component, click Configs, and enter the property name mapreduce.application.classpath in the filter field. Add the value /usr/iop/{iop_version}/hadoop/lib/ibm-compression.jar to the colon-separated list of classpath entries, replacing the {iop_version} placeholder with your IOP version. For example, for IOP 4.2 the value is /usr/iop/4.2.0.0/hadoop/lib/ibm-compression.jar. Save the configuration change.
3. Restart the following components:
  - Hadoop (HDFS)
  - MapReduce2
  - YARN
  - Hive
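The file copies in step 1 can be sketched in Python as follows. This is a minimal sketch, not an IBM-provided tool: it assumes it runs on a host where the BigInsights 3.x paths are readable (for example, after transferring the two files from the 3.x cluster), and the function names `cmx_copy_plan` and `copy_cmx_files` are hypothetical. By default it only prints what it would copy.

```python
import shutil
from pathlib import Path

# Source files on the BigInsights 3.x installation (paths from step 1).
BI3_NATIVE_LIB = Path(
    "/opt/ibm/biginsights/IHC/lib/native/Linux-amd64-64/libcmxbiginsights.so"
)
BI3_JAR = Path("/opt/ibm/biginsights/IHC/lib/ibm-compression.jar")


def cmx_copy_plan(iop_version: str) -> list[tuple[Path, Path]]:
    """Return (source file, destination directory) pairs for the CMX files."""
    hadoop_lib = Path("/usr/iop") / iop_version / "hadoop" / "lib"
    return [
        (BI3_NATIVE_LIB, hadoop_lib / "native"),  # native CMX library
        (BI3_JAR, hadoop_lib),                    # Java codec classes
    ]


def copy_cmx_files(iop_version: str, dry_run: bool = True) -> None:
    """Copy the CMX files into the IOP layout; only print the plan if dry_run."""
    for src, dest_dir in cmx_copy_plan(iop_version):
        if dry_run:
            print(f"would copy {src} -> {dest_dir}/")
        else:
            dest_dir.mkdir(parents=True, exist_ok=True)
            # copy2 also preserves file metadata such as modification time.
            shutil.copy2(src, dest_dir / src.name)
```

For IOP 4.2, `copy_cmx_files("4.2.0.0")` prints the two planned copies; pass `dry_run=False` to perform them.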
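The two property edits in step 2 are plain string changes: one appends an entry to a comma-separated list, and the other appends an entry to a colon-separated list. The sketch below shows those edits in Python, written so that applying them twice does not add a duplicate entry. It does not talk to Ambari; the function names `add_codec` and `add_classpath_entry` are hypothetical helpers for illustration only.

```python
CMX_CODEC = "com.ibm.biginsights.compress.CmxCodec"


def add_codec(codecs: str, codec: str = CMX_CODEC) -> str:
    """Append codec to a comma-separated io.compression.codecs value, once."""
    entries = [c.strip() for c in codecs.split(",") if c.strip()]
    if codec not in entries:
        entries.append(codec)
    return ",".join(entries)


def add_classpath_entry(classpath: str, iop_version: str) -> str:
    """Append ibm-compression.jar to a colon-separated classpath value, once."""
    jar = f"/usr/iop/{iop_version}/hadoop/lib/ibm-compression.jar"
    entries = [e.strip() for e in classpath.split(":") if e.strip()]
    if jar not in entries:
        entries.append(jar)
    return ":".join(entries)
```

For example, `add_codec("org.apache.hadoop.io.compress.GzipCodec")` keeps the existing Gzip codec and appends the CMX codec after it, which matches the manual edit in the Ambari filter field.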
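The restarts in step 3 are normally done from the Ambari UI. If you prefer scripting them, Ambari's REST API stops a service by setting its state to INSTALLED and starts it by setting the state to STARTED. The sketch below only builds those request URLs and bodies and does not send them. It assumes that "Hadoop" in step 3 refers to the HDFS service, that services are stopped in reverse dependency order and started in forward order, and that `restart_requests` is a hypothetical helper name.

```python
# Ambari service names; the stop/start ordering is an assumption.
SERVICES = ["HDFS", "YARN", "MAPREDUCE2", "HIVE"]


def restart_requests(ambari_url: str, cluster: str) -> list[tuple[str, dict]]:
    """Build (url, body) pairs that stop and then start each service."""

    def request(service: str, state: str, verb: str) -> tuple[str, dict]:
        url = f"{ambari_url}/api/v1/clusters/{cluster}/services/{service}"
        body = {
            "RequestInfo": {"context": f"{verb} {service} (CMX configuration)"},
            "Body": {"ServiceInfo": {"state": state}},
        }
        return url, body

    # In Ambari, state INSTALLED means stopped and STARTED means running.
    stops = [request(s, "INSTALLED", "Stop") for s in reversed(SERVICES)]
    starts = [request(s, "STARTED", "Start") for s in SERVICES]
    return stops + starts
```

Each body is sent as an HTTP PUT with the header `X-Requested-By: ambari`. Ambari processes these requests asynchronously, so a script should wait for each one to finish before sending the next.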