Overview

This document discusses the customization requests that you might encounter while installing IOP and BigInsights in a new on-premises environment. These requests come up often because security and infrastructure teams want to make sure that IOP complies with their enterprise policies and can be integrated easily.

  1. Customized user IDs for Hadoop service users, such as changing the ‘hdfs’ user to ‘custhdfs’.
  2. Installing the product in an organization with another Hadoop distribution, such as Cloudera or Hortonworks, where you need to use the same Hadoop service accounts.
  3. The root folder ‘/’ does not have 100GB of free space; instead, separate mounts are created for the installation locations.
  4. Using a single RDBMS to manage Ambari metadata, Hive metadata, and Oozie metadata.

Your scenario may not involve all of these customizations, so this document discusses the workaround or resolution for each one individually. We will also discuss the implications of each customization on the environment and other components. After reading this document, you will be well prepared to tackle these customizations properly.

Customizations

1. Customized user accounts for Hadoop service users. For example: ‘hdfs’ user to ‘custhdfs’

To install the Hadoop distribution with customized user accounts, it is important to be cognizant of all the users, groups, and user:group mappings required. This information can be found here.
You can customize user names and groups during installation on the Misc tab, as shown in the screenshot below.
[Screenshot: customized user names and groups on the Misc tab]
If the environment is LDAP enabled, make sure that all the customized user accounts exist with the correct user:group mapping at the OS level on all servers before you install the software.
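
A minimal pre-flight sketch of such a check, assuming hypothetical host names (node1 to node3) and customized accounts (custhdfs, custyarn):

[root@hadoopmachine1~]# for h in node1 node2 node3; do
>   echo "== $h =="
>   ssh $h 'id custhdfs; id custyarn'   # every account must resolve with the same user:group mapping on every host
> done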

Implications

  • If you are using customized user names and groups, make sure that they are no longer than 8 characters. Longer names cause an issue while installing BigSQL, because the BigSQL instance owner user:group cannot be more than 8 characters.
  • During BigSQL installation, you are encouraged to run a pre-checker script. This script has a few user accounts hardcoded, such as sqoop and hcat. You will need to edit the pre-checker on each node to change those users to your customized users (see the sketch after this list).
  • While starting/stopping Knox from Ambari, the permissions on the Knox folder at the OS level may change back to the default user:group, which can prevent the service from starting. Make sure that during installation you check “Skip group modification during install”.
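
A minimal sketch of that pre-checker edit, assuming the script is a plain shell file at an illustrative path and that sqoop/hcat were customized to custsqoop/custhcat:

[root@hadoopmachine1~]# sed -i.bak -e 's/\bsqoop\b/custsqoop/g' \
>   -e 's/\bhcat\b/custhcat/g' /tmp/bigsql-precheck.sh   # -i.bak keeps a backup of the original script
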
    2. Install the product in an organization with another Hadoop distribution, such as Cloudera or Hortonworks, where you need to use the same Hadoop service accounts

    If the requirement is to install IOP and BigInsights in an organization that has another LDAP-integrated Hadoop environment, the challenge you will face is that the service user accounts may already exist but their mapping to the corresponding groups can be different. It is also possible that there are a few extra service users beyond those required for IOP. A specific comparison for an environment with an existing Cloudera distribution is shown in the screenshot:
    [Screenshot: hadoop_primary_group – service users’ group mappings compared with an existing Cloudera distribution]
    The workaround for such a situation is to add the group required by IOP as a secondary group to the existing user accounts. For example:

    [root@hadoopmachine1~]# id hdfs
    uid=1205(hdfs) gid=20(games) groups=20(games),1201(hadoop)
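
    A minimal sketch of how to reach that mapping, assuming the ‘hadoop’ group already exists at the OS level:

    [root@hadoopmachine1~]# usermod -a -G hadoop hdfs   # append 'hadoop' as a secondary group; the primary group is untouched
    [root@hadoopmachine1~]# id hdfs                     # verify the new user:group mapping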

    After adding the IOP-specific groups to the users at the OS level, the installation should go smoothly.

    3. Root folder ‘/’ does not have 100GB of free space; instead, separate mounts are created for the installation locations

    If the infrastructure team raises concerns about allocating 100GB to the root folder ‘/’, the workaround is to ask them to create separate mounts for all the folders required for product installation. The primary folders where the product installs are:

    /usr/iop – IOP components
    /usr/ibmpacks – BigInsights
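
    Before starting the install, a plain df on each node can confirm that these mounts exist and have enough space (free space will vary by environment):

    [root@hadoopmachine1~]# df -h / /usr/iop /usr/ibmpacks   # shows the filesystem and free space backing each location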

    Implications

  • During BigSQL installation, a multi-GB RPM is downloaded to the ‘/’ folder and deleted after the installation completes. However, if separate mounts were created for product installation, the ‘/’ root folder might only have 2GB-5GB of space including the OS. This will cause BigSQL installation to fail, since the RPM cannot be downloaded and unpacked in the root folder. So even if separate mounts were created for product installation, make sure the root folder still has 15GB of free space to get past this issue.
  • During installation, Ambari’s smart scripts find all the separate mounts that can be used for a distributed installation. For example, YARN logs or Kafka logs might get placed under the ‘/usr/iop’ mount. Make sure you review the locations where the product will install the different components (see the sketch after this list).
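
    One way to review where those log directories landed, assuming Ambari’s bundled configs.sh helper (shipped under /var/lib/ambari-server/resources/scripts in Ambari of that era; the cluster name ‘mycluster’ is illustrative, and config type names may differ by version):

    [root@hadoopmachine1~]# cd /var/lib/ambari-server/resources/scripts
    [root@hadoopmachine1 scripts]# ./configs.sh get localhost mycluster yarn-site | grep log-dirs      # where YARN writes container logs
    [root@hadoopmachine1 scripts]# ./configs.sh get localhost mycluster kafka-broker | grep log.dirs   # where Kafka stores its data
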
    4. Use a single RDBMS to manage Ambari metadata, Hive metadata, and Oozie metadata

    If the requirement is to set up and use a single RDBMS instance for the entire Hadoop environment, it should be addressed before installing the product. Step-by-step guides for setting up a PostgreSQL or MySQL instance are available at the links below. They lay out, with scripts and screenshots, how to enable a single instance of either RDBMS to be used by the entire cluster for metadata management.
    Use PostgreSQL instance
    Use MySQL instance
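
    As a flavor of what those guides cover, here is a minimal sketch for PostgreSQL: one instance holding a separate database and owner per component. The names and passwords are illustrative, and the linked guides give the authoritative steps, including the extra grants and JDBC driver setup each component needs:

    [root@hadoopmachine1~]# su - postgres -c psql <<'SQL'
    CREATE DATABASE ambari;                          -- one database per component, all in one instance
    CREATE USER ambari WITH PASSWORD 'ambaripass';
    GRANT ALL PRIVILEGES ON DATABASE ambari TO ambari;
    CREATE DATABASE hive;
    CREATE USER hive WITH PASSWORD 'hivepass';
    GRANT ALL PRIVILEGES ON DATABASE hive TO hive;
    CREATE DATABASE oozie;
    CREATE USER oozie WITH PASSWORD 'ooziepass';
    GRANT ALL PRIVILEGES ON DATABASE oozie TO oozie;
    SQL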

    Final Recommendation

    While I have tested these approaches successfully and made sure the cluster stays healthy and performs as expected with such customizations, it is still encouraged to install IOP/BigInsights with the recommended settings. A Hadoop environment is highly distributed and works properly only when multiple components operate in proper orchestration. So don’t customize the Hadoop environment without an expert.

    I hope this article helps you and makes your life easier. Please leave your feedback.
