Apache Hadoop 3.1.1 is supported with HDFS Transparency 3.1.0-2. When you use Apache Hadoop, the configuration files of HDFS Transparency are located under /var/mmfs/hadoop/etc/hadoop. By default, the logs of HDFS Transparency are located under /var/log/transparency/.
If you want to configure Apache Hadoop 3.1.1 with HDFS Transparency 3.1.0-2, execute the following steps:
1 Set ulimit nofile to 64K on all the nodes.
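One common way to make the 64K open-file limit persistent across reboots is through /etc/security/limits.conf (a sketch; the wildcard entries are assumptions and can be narrowed to the Hadoop service users):

```
*  soft  nofile  65536
*  hard  nofile  65536
```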
2 Set up the ntp server to synchronize the time on all nodes.
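For example, on systems that use ntpd (an assumption; chronyd is a common alternative), the service can be enabled and verified as follows:

```shell
# Enable and start the NTP daemon, then verify peer synchronization
systemctl enable --now ntpd
ntpq -p
```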
3 Configure password-less root SSH access from the NameNodes to all DataNodes.
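A sketch of one common approach, run on each NameNode (the DataNode hostnames dn1.example.com and dn2.example.com are hypothetical):

```shell
# Generate a key pair once per NameNode, then distribute the public key
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for node in dn1.example.com dn2.example.com; do
  ssh-copy-id root@${node}
done
```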
4 Install HDFS Transparency 3.1.0-2 on all HDFS Transparency nodes, and then ssh to TransparencyNode1.
5 Update the /var/mmfs/hadoop/etc/hadoop/core-site.xml with your NameNode hostname.
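For example, the fs.defaultFS property points clients at the NameNode (the hostname nn1.example.com and port 8020 below are assumptions; substitute your own):

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nn1.example.com:8020</value>
</property>
```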
6 Update the /var/mmfs/hadoop/etc/hadoop/hdfs-site.xml according to your configuration.
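As one hypothetical example: because the underlying file system provides data protection, dfs.replication is often reduced from the HDFS default (adjust to your own configuration):

```xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```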
7 Update the /var/mmfs/hadoop/etc/hadoop/gpfs-site.xml for the gpfs.mnt.dir, gpfs.data.dir and gpfs.storage.type configurations.
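A hypothetical gpfs-site.xml fragment; the mount point, data directory, and storage type values below are assumptions and must match your deployment:

```xml
<property>
  <name>gpfs.mnt.dir</name>
  <value>/gpfs/fs1</value>
</property>
<property>
  <name>gpfs.data.dir</name>
  <value>data</value>
</property>
<property>
  <name>gpfs.storage.type</name>
  <value>shared</value>
</property>
```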
8 Update the /var/mmfs/hadoop/etc/hadoop/hadoop-env.sh and set export JAVA_HOME to the JDK installation path on your nodes.
9 Update the /var/mmfs/hadoop/etc/hadoop/workers file to add the DataNodes, one DataNode hostname per line.
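For illustration, the following sketch builds a workers file with hypothetical DataNode hostnames (written to /tmp here; the real file is /var/mmfs/hadoop/etc/hadoop/workers):

```shell
# One DataNode hostname per line, no extra whitespace
cat > /tmp/workers <<'EOF'
dn1.example.com
dn2.example.com
dn3.example.com
EOF
wc -l < /tmp/workers   # line count equals the number of DataNodes
```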
10 Synchronize all these changes into other DataNodes by executing the following command:
/usr/lpp/mmfs/bin/mmhadoopctl connector syncconf /var/mmfs/hadoop/etc/hadoop
11 Start HDFS Transparency by executing mmhadoopctl:
/usr/lpp/mmfs/bin/mmhadoopctl connector start
12 Check the service status of HDFS Transparency by executing mmhadoopctl:
/usr/lpp/mmfs/bin/mmhadoopctl connector getstate
Note: If HDFS Transparency is not up on some nodes, log in to those nodes and check the logs under /var/log/transparency for errors.
13 If you want to configure YARN, execute the following steps:
Download Apache Hadoop 3.1.1 from the Apache Hadoop website.
Log in to HadoopNode1 and unzip the package to /opt/hadoop-3.1.1.
Copy hadoop-env.sh, hdfs-site.xml, core-site.xml, and workers from /var/mmfs/hadoop/etc/hadoop on the HDFS Transparency node to HadoopNode1:/opt/hadoop-3.1.1/etc/hadoop/.
Copy /usr/lpp/mmfs/hadoop/template/mapred-site.xml.template and /usr/lpp/mmfs/hadoop/template/yarn-site.xml.template from HDFS Transparency node into HadoopNode1:/opt/hadoop-3.1.1/etc/hadoop as mapred-site.xml and yarn-site.xml.
Update /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml with the correct path location for yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env configurations.
For example, set the value to HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1 for each of these properties, where /opt/hadoop-3.1.1 is the actual Hadoop installation directory.
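A mapred-site.xml fragment illustrating these three properties (this assumes Hadoop is installed under /opt/hadoop-3.1.1):

```xml
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/opt/hadoop-3.1.1</value>
</property>
```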
Update /opt/hadoop-3.1.1/etc/hadoop/yarn-site.xml. In particular, configure the correct hostname for yarn.resourcemanager.hostname.
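For example (the Resource Manager hostname rm1.example.com below is hypothetical):

```xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>rm1.example.com</value>
</property>
```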
Synchronize /opt/hadoop-3.1.1 from HadoopNode1 to all other Hadoop nodes and keep the same location for all hosts.
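One way to synchronize the installation is with rsync (a sketch; the node names are assumptions, and the same /opt/hadoop-3.1.1 path is kept on every host):

```shell
# Push the Hadoop tree from HadoopNode1 to each remaining node
for node in hadoop2.example.com hadoop3.example.com; do
  rsync -a /opt/hadoop-3.1.1/ root@${node}:/opt/hadoop-3.1.1/
done
```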
On the Resource Manager node, run the following command to start the Yarn service:
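The start command is not shown above; a typical Hadoop 3.x invocation (an assumption) is the following. Hadoop 3.x refuses to start services as root unless the service users are declared, hence the two environment variables:

```shell
# Start the Resource Manager and Node Managers as root
YARN_RESOURCEMANAGER_USER=root YARN_NODEMANAGER_USER=root \
  /opt/hadoop-3.1.1/sbin/start-yarn.sh
```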
Note: By default, the logs for the YARN services are under /opt/hadoop-3.1.1/logs. If you plan to start the YARN services as a different user, change the user root in the start command to your target user name.
Run the following commands to put a file into HDFS and submit a word count job:
# /opt/hadoop-3.1.1/bin/hdfs dfs -put /etc/passwd /passwd
# /opt/hadoop-3.1.1/bin/hadoop jar /opt/hadoop-3.1.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /passwd /results