Hadoop cloud administrators can face issues when trying to install new features or libraries system-wide without superuser privileges and full access to the root file system that the cluster runs on. This often requires contacting support just to distribute a simple library, turning a task that should take minutes into one that takes hours or days. The Custom Extensions feature, introduced in IOP 4.2.5, allows cloud admins to easily manage these libraries, and to go back to a clean state if necessary with simple configuration changes. The feature has been implemented in Hadoop, Hive, and HBASE, but as we will see later, other services can leverage it as well by modifying a single property.

What is the Custom Extensions feature?

The main concept of custom extensions is to create a central place where a user can place libraries; files placed in this location will be distributed to the nodes that require them. The central location is a directory in HDFS that can easily be accessed from any node, and from which the libraries are downloaded to a specific location in the local file system (except in the case of HBASE). The HDFS directory is owned by the cloud admin user, providing some level of control over who can or cannot load extensions into the system.

Roles

The two main roles targeted in this blog are:

  • Cloud administrator user: User who has access to the cluster configurations through Ambari.
  • Cluster user: User who has access to the different services provided by the cluster (read HDFS, submit jobs, etc.)

How to enable and configure Custom Extensions

An HDFS directory will be set up the first time you enable Custom Extensions, and it will remain in HDFS afterward. This is also the location where the user places custom extension jars which, for Hadoop and Hive, are replicated to the individual nodes.

Hadoop

In the Ambari UI: HDFS > Config > Advanced > Custom core-site > Add property…:

  1. Add the properties:
    • hadoop.custom-extensions.enabled=<True, False> – This property enables or disables the Custom Extensions feature for Hadoop services.
    • (Optional) hadoop.custom-extensions.owner=<cluster-admin-user> – The user who can add or remove custom libraries, usually the cloud admin user. This user will own the directory in HDFS where the libraries are stored. If not specified, it defaults to the hdfs user.
    • (Optional) hadoop.custom-extensions.services=<comma, separated, list> – By default the custom extensions are loaded only on the nodes where a YARN component is installed (this always happens when the feature is enabled). You can add additional service names here if you also want the custom extensions loaded on the nodes of those services.
  2. Save changes and restart the affected services: restart the HDFS service first, then the rest.

After the restart, the HDFS directory /iop/ext/4.2.5.0/hadoop should exist and be owned by the user specified in hadoop.custom-extensions.owner, or by the hdfs user if it was not specified. In the local file system, the directory /usr/iop/current/ext should exist and be owned by root on all the nodes where a YARN component, or any component from the services specified in hadoop.custom-extensions.services, is installed. An additional directory, /usr/iop/current/ext/hadoop, will be created when jars are added to hdfs://iop/ext/4.2.5.0/hadoop. If the /usr/iop/current/ext/hadoop directory does not exist, it means that no jars have been added or the Custom Extensions feature has been disabled for Hadoop.
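You can verify this with hdfs dfs -ls /iop/ext/4.2.5.0/hadoop. Alternatively, here is a rough sketch of the same check using the standard Hadoop FileSystem API; the class name is made up for this example, and the hard-coded path is the default described above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CheckExtensionsDir {
        public static void main(String[] args) throws Exception {
            // Picks up core-site.xml / hdfs-site.xml from the client's classpath.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hadoop extensions directory from this article (use .../hive or
            // .../hbase to check the other services once they are enabled).
            Path extDir = new Path("/iop/ext/4.2.5.0/hadoop");

            if (fs.exists(extDir)) {
                FileStatus status = fs.getFileStatus(extDir);
                System.out.println(extDir + " exists, owner=" + status.getOwner()
                        + ", permissions=" + status.getPermission());
            } else {
                System.out.println(extDir + " not found -- is the feature enabled and HDFS restarted?");
            }
        }
    }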

Hive

The setup for Hive is very similar to the one described for Hadoop above, except that we replace the hadoop. prefix with hive. in the property names, and Hive does not have a .custom-extensions.services property. Here is how you enable it:

In the Ambari UI: HIVE > Config > Advanced > Custom hive-site > Add property…:

  1. Add the properties:
    • hive.custom-extensions.enabled=<True, False> – This property enables or disables the Custom Extensions feature for the Hive components.
    • (Optional) hive.custom-extensions.owner=<cluster-admin-user> – The user who can add or remove custom libraries, usually the cloud admin user. This user will own the directory in HDFS where the libraries are stored. If not specified, it defaults to the hdfs user. It can be the same as or different from hadoop.custom-extensions.owner if the Custom Extensions feature has been enabled for Hadoop.
  2. Save changes and restart the affected services.

The other difference, as you might expect, is that the HDFS directory for Hive Custom Extensions is located at /iop/ext/4.2.5.0/hive, and the nodes that have the Hive client or HiveServer2 installed will get a local /usr/iop/current/ext directory with the jars placed in /usr/iop/current/ext/hive. Again, the HDFS directory will be owned by the user specified in hive.custom-extensions.owner, and the local directory will be owned by root.

HBASE

HBASE has an additional property that we need to modify to make it work correctly with Custom Extensions. Here is how you do it:

In the Ambari UI: HBASE > Config > Advanced > Custom hbase-site > Add property…:

  1. Add the properties:
    • hbase.custom-extensions.enabled=<True, False> – This property enables or disables the Custom Extensions feature for the HBASE components.
    • hbase.dynamic.jars.dir=/iop/ext/4.2.5.0/hbase – This property lets HBASE components know where the Custom Extensions are stored in HDFS.
    • (Optional) hbase.custom-extensions.owner=<cluster-admin-user> – The user who can add or remove custom libraries, usually the cloud admin user. This user will own the directory in HDFS where the libraries are stored. If not specified, it defaults to the hdfs user. It can be the same as or different from hadoop.custom-extensions.owner if the Custom Extensions feature has been enabled for Hadoop.
  2. Save changes and restart the affected services.

If you haven’t figured it out yet, the HBASE Custom Extensions are stored in hdfs://iop/ext/4.2.5.0/hbase.
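For illustration, the skeleton below shows the kind of class you might package into a jar and place in that directory: a hypothetical row-prefix filter (the class name, package, and serialization scheme are invented for this sketch and are not part of IOP). Once the jar is there, HBASE components can resolve the class through hbase.dynamic.jars.dir, as described above.

    package com.example.hbase.filters;  // hypothetical package, not part of IOP

    import java.io.IOException;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.filter.FilterBase;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical custom filter: keeps only rows whose key starts with a given prefix.
    // Compile it, package it in a jar, and place the jar in /iop/ext/4.2.5.0/hbase.
    public class RowPrefixFilter extends FilterBase {

        private final byte[] prefix;

        public RowPrefixFilter(byte[] prefix) {
            this.prefix = prefix;
        }

        @Override
        public ReturnCode filterKeyValue(Cell cell) throws IOException {
            byte[] row = CellUtil.cloneRow(cell);
            return Bytes.startsWith(row, prefix) ? ReturnCode.INCLUDE : ReturnCode.NEXT_ROW;
        }

        // Simple serialization so region servers can reconstruct the filter;
        // real filters typically use protobuf here.
        @Override
        public byte[] toByteArray() throws IOException {
            return prefix;
        }

        public static RowPrefixFilter parseFrom(byte[] bytes) {
            return new RowPrefixFilter(bytes);
        }
    }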

How to deploy the Custom Extensions jars?

Deploying the jars is a simple process:

  • Hadoop

    1. Place your jar(s) in hdfs://iop/ext/4.2.5.0/hadoop (see the sketch after this list for one way to do this programmatically)
    2. Restart the YARN service and the other services specified in hadoop.custom-extensions.services.

    The jars should now be deployed to /usr/iop/current/ext/hadoop on all the nodes containing a component of the services from the previous step. That directory is added to HADOOP_CLASSPATH and to the yarn.application.classpath and mapreduce.application.classpath properties.

  • Hive

    1. Place your jar(s) in hdfs://iop/ext/4.2.5.0/hive
    2. Restart the HIVE service.

    The jars should now be replicated to /usr/iop/current/ext/hive on all the nodes that have the Hive client and/or HiveServer2 installed. This directory is referenced by HIVE_AUX_JAR_PATH, so restarting the service allows the jars to be loaded by HiveServer2 and used by the Hive client.

  • HBASE

    For HBASE it is as simple as placing the jars in hdfs://iop/ext/4.2.5.0/hbase. HBase custom filters and Phoenix UDFs can make use of this custom extension location and will pick up and use the jars automatically.
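Placing the jars can be done with hdfs dfs -put, or programmatically, as in the rough sketch below. The local jar path and the class name are placeholders, and the copy should be run as the configured custom-extensions owner, since that user owns the HDFS directory.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class DeployExtensionJar {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
            FileSystem fs = FileSystem.get(conf);

            // Local jar to deploy (hypothetical path) and the Hadoop extensions
            // directory described in this article; use .../hive or .../hbase for
            // the other services.
            Path localJar = new Path("/tmp/customcompression.jar");
            Path extDir = new Path("/iop/ext/4.2.5.0/hadoop");

            fs.copyFromLocalFile(localJar, extDir);
            System.out.println("Deployed " + localJar + " to " + extDir);
        }
    }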

How do Custom Extensions work?

When you enable the Custom Extensions feature and restart the required services, the files hosted in HDFS are downloaded during start-up to the nodes containing a component of that service. For example, if we enable the Hadoop Custom Extensions, YARN is used as the master service by default, so every node that contains a NodeManager, ResourceManager, or App Timeline Server will get a copy of the jar files in its local file system. The same goes for Hive. HBASE works a little differently by using the jar files directly from HDFS. Hadoop also offers the option to load the custom extensions for additional services through the hadoop.custom-extensions.services property (a comma-separated list), in case we need these libraries available on nodes where no YARN component is installed. The figure below shows an example of enabling Custom Extensions for Hadoop and adding BigSQL as an additional service.

[Figure: Custom Extensions deployment]
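To make the mechanism concrete, the sketch below shows roughly what the start-up step amounts to for Hadoop: clear the local extensions directory and pull down whatever is currently published in the HDFS extensions directory. This is only an illustration of the behaviour described above (the actual logic lives inside the IOP service scripts), and the paths are the defaults used in this article.

    import java.io.File;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class SyncExtensionsOnStartup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            Path hdfsExtDir = new Path("/iop/ext/4.2.5.0/hadoop");      // HDFS extensions dir
            File localExtDir = new File("/usr/iop/current/ext/hadoop"); // local copy on each node

            // Start from a clean state: the local directory is cleared on every start.
            FileUtil.fullyDelete(localExtDir);
            localExtDir.mkdirs();

            // Download every jar currently published in HDFS to the local directory.
            if (fs.exists(hdfsExtDir)) {
                for (FileStatus status : fs.listStatus(hdfsExtDir)) {
                    fs.copyToLocalFile(status.getPath(),
                            new Path(localExtDir.getAbsolutePath(), status.getPath().getName()));
                }
            }
        }
    }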

What to do when something goes wrong with a Custom Extension?

Disable the Custom Extensions feature. It is as simple as setting <service>.custom-extensions.enabled=False (deleting the property also works) and, for HBASE, pointing hbase.dynamic.jars.dir back to its default location. After the restart, the local folder containing the jars is cleared, which prevents any custom library from being loaded onto the classpath. Remove the problematic library from HDFS and re-enable the Custom Extensions feature. Every time a Custom Extensions service starts, the custom extensions directory in the local file system is cleared, allowing the cluster to start from a clean state if necessary.

Example using Custom Extensions

After enabling the extensions and restarting the components, we place customcompression.jar (this is just a wrapper of the DefaultCodec with a different extension [.customz]; it has no production value) in hdfs://iop/ext/4.2.5.0/hadoop.
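For reference, such a wrapper can be as small as the sketch below. This is an assumed reconstruction based on the description above (DefaultCodec with a .customz extension), not the actual contents of customcompression.jar.

    package com.ibm.hadoop.io.compress;

    import org.apache.hadoop.io.compress.DefaultCodec;

    // Minimal wrapper codec: identical to DefaultCodec except for the file extension,
    // so compressed outputs end in .customz instead of .deflate.
    public class CustomCodec extends DefaultCodec {
        @Override
        public String getDefaultExtension() {
            return ".customz";
        }
    }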
Modify the properties to use the new codec:

  1. In Advanced core-site, append com.ibm.hadoop.io.compress.CustomCodec to the io.compression.codecs property
  2. Enable compression in Advanced mapred-site by ensuring that the values of mapreduce.map.output.compress and mapreduce.output.fileoutputformat.compress are true
  3. Change/add in Custom mapred-site the properties mapreduce.map.output.compress.codec and mapreduce.output.fileoutputformat.compress.codec to use the codec com.ibm.hadoop.io.compress.CustomCodec
  4. Restart all the affected services. This should include the YARN service so that the extension is distributed.
  5. Run a MapReduce job (e.g. the TeraSort suite)
  6. In the JobHistory UI, in the syslog section of the task attempt logs, you should see one of these two messages loading the .customz compressor/decompressor:
    1. On the Map task attempt: INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.customz]
    2. On the Reduce task attempt: INFO [fetcher#2] org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor [.customz]

Now all the jobs run on the cluster will use the custom compression by default. We could also just add the library without setting it as the default codec, and users would still have the option of using the codec when submitting jobs.
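For example, if the codec were not set as the cluster default, a job could still opt in at submission time by setting the same properties on its own configuration. A minimal sketch, with placeholder job setup and paths:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SubmitWithCustomCodec {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // Opt in to the custom codec for this job only.
            conf.setBoolean("mapreduce.map.output.compress", true);
            conf.set("mapreduce.map.output.compress.codec",
                    "com.ibm.hadoop.io.compress.CustomCodec");
            conf.setBoolean("mapreduce.output.fileoutputformat.compress", true);
            conf.set("mapreduce.output.fileoutputformat.compress.codec",
                    "com.ibm.hadoop.io.compress.CustomCodec");

            Job job = Job.getInstance(conf, "job-with-custom-codec");
            // Set the mapper, reducer, and key/value classes as usual for your job here.
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }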
