The IBM Spectrum Scale HDFS Transparency implementation integrates the NameNode and DataNode services and responds to Hadoop requests as if it were native HDFS, while storing the data in the IBM Spectrum Scale file system.

Starting with HDFS Transparency version 3.1.1 and IBM Spectrum Scale version 5.0.4.2, HDFS Transparency is integrated with both the IBM Spectrum Scale installation toolkit and Cluster Export Services (CES).

Integration advantages

  • Ability to quickly configure an IBM Spectrum Scale HDFS Transparency cluster that connects to existing centralized storage in shared mode (where the HDFS Transparency NameNodes and DataNodes are part of the same GPFS cluster as the centralized storage) using the IBM Spectrum Scale installation toolkit.
  • HDFS Transparency 3.1.1 uses IBM Spectrum Scale Cluster Export Services (CES) to manage NameNode state and configuration. All protocols use the mmces commands.
  • Separation of the compute and storage models for easy deployment and maintenance.
  • An HDFS Transparency cluster can also be used for Hadoop storage tiering. See Hadoop Storage Tiering mode without native HDFS federation for more information.


Figure 1. CES HDFS configuration layout with a single HDFS cluster in a single GPFS cluster in shared mode

Limitations

  • Only Red Hat Enterprise Linux is supported.
  • CES HDFS is not supported for Cloudera® distributions.
  • CES HDFS supports open source Apache Hadoop.
  • The mmhadoopctl command is used for HDFS Transparency 3.1.0 and earlier.
  • The mmhdfs command is used for CES HDFS management with HDFS Transparency 3.1.1.
  • SAN-based shared storage is supported. ESS is currently not supported through the installation toolkit.

See the Support Matrix and the Limitations and Recommendations sections of the IBM Spectrum Scale Big Data and Analytics support documentation in the IBM Knowledge Center.

Sample configuration used for this example

Centralized cluster node (Admin node): c902f05x10.gpfs.net

NameNodes: c902f08x01.gpfs.net, c902f08x03.gpfs.net
DataNodes: c902f08x04.gpfs.net, c902f08x13.gpfs.net, c902f08x14.gpfs.net, c902f08x15.gpfs.net

Installer node IP address: 172.16.1.125
CES Public IPs: 172.16.2.80,172.16.2.84
Note: For more information on CES public IP assignment and setup, see the CES IP aliasing to network adapters on protocol nodes section in the IBM Spectrum Scale documentation in the IBM Knowledge Center.

Installation steps
This example gives sample instructions on how to deploy a CES HDFS cluster onto an existing centralized storage cluster that is already up and running.

  1. Ensure the HDFS Transparency prerequisite setup steps are done. For example, on all HDFS Transparency nodes:
    • Install the required base packages
      yum -y install kernel-devel cpp gcc gcc-c++ binutils make net-tools java-1.8.0-openjdk* bind-utils
    • Set Java path
      vi /root/.bashrc

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b03-1.el7.x86_64
      export PATH=$PATH:$JAVA_HOME/bin

    • Edit the limits in /etc/security/limits.conf
      vi /etc/security/limits.conf
      * soft nofile 65536
      * hard nofile 65536
      * soft nproc 65536
      * hard nproc 65536
    • Ensure the CES Shared Root file system is created and available
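    The prerequisite settings above can be sanity-checked with a short script before moving on. This is a sketch, not part of the toolkit: the `check` helper is hypothetical, and the 65536 limits match the values configured above.

    ```shell
    #!/bin/sh
    # Quick prerequisite check for an HDFS Transparency node (hypothetical
    # helper script; limits match the values configured above).

    check() {  # check <label> <actual> <minimum>
        if [ "$2" -ge "$3" ] 2>/dev/null; then
            echo "OK: $1 = $2 (>= $3)"
        else
            echo "FAIL: $1 = $2 (need >= $3)"
        fi
    }

    check "open files (nofile)" "$(ulimit -n)" 65536
    check "max processes (nproc)" "$(ulimit -u 2>/dev/null)" 65536

    # JAVA_HOME should point at the JDK configured in /root/.bashrc
    if [ -n "$JAVA_HOME" ]; then
        echo "OK: JAVA_HOME=$JAVA_HOME"
    else
        echo "FAIL: JAVA_HOME is not set"
    fi
    ```

    Run it on every NameNode and DataNode, and fix any FAIL line before installing.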
  2. On the installer node, download and extract the IBM Spectrum Scale installer bin and accept the license.
    ./Spectrum_Scale_Advanced-5.0.4.2-x86_64-Linux-install
  3. On the installer node, cd to the directory where the IBM Spectrum Scale installer resides. By default, it is extracted to the following directory:
    cd /usr/lpp/mmfs/5.0.4.2/installer
  4. On the installer node, run the following commands to create the CES HDFS cluster to use the centralized storage:
    # Configure the installer node using its IP address
    ./spectrumscale setup -s 172.16.1.125

    # Discover and populate the existing cluster configuration
    ./spectrumscale config populate --node c902f05x10.gpfs.net


    Installer will keep backup of existing clusterdefinition.txt file in /usr/lpp/mmfs/5.0.4.2/installer/configuration path and populate a new one. Do you want to continue [Y/n]: y

    Do you want to provide IP addresses for NTP [Y/n]: n

    Note: This is because NTP setup is not supported for adding nodes to an existing cluster.

    # Add HDFS Transparency NameNodes as protocol nodes (-p)
    ./spectrumscale node add c902f08x01.gpfs.net -p
    ./spectrumscale node add c902f08x03.gpfs.net -p

    # Add HDFS Transparency DataNodes
    ./spectrumscale node add c902f08x04.gpfs.net
    ./spectrumscale node add c902f08x13.gpfs.net
    ./spectrumscale node add c902f08x14.gpfs.net
    ./spectrumscale node add c902f08x15.gpfs.net

    # Install
    ./spectrumscale install --precheck
    ./spectrumscale install

    # Verify install
    /usr/lpp/mmfs/bin/mmlscluster
    /usr/lpp/mmfs/bin/mmgetstate -a

    # Enable CES HDFS protocol
    ./spectrumscale enable hdfs

    # Configure the CES public IPs. At least two IPs must be specified.
    Note: CES IPs must be unused IPs, must belong to a subnet reachable through an existing adapter and route on each HDFS Transparency NameNode, and must have forward and reverse DNS lookups in place.
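    The DNS requirement can be checked with a small helper before running the command. This is a sketch: `check_ces_ip` is not a toolkit command, and the IPs are the ones from this example.

    ```shell
    #!/bin/sh
    # Verify that each planned CES IP has matching reverse and forward DNS
    # entries (hypothetical helper; the installer runs its own checks too).

    check_ces_ip() {  # check_ces_ip <ip>
        name=$(getent hosts "$1" | awk '{print $2}')
        if [ -z "$name" ]; then
            echo "FAIL: no reverse DNS entry for $1"
            return 1
        fi
        fwd=$(getent hosts "$name" | awk '{print $1}')
        if [ "$fwd" = "$1" ]; then
            echo "OK: $1 <-> $name"
        else
            echo "FAIL: $name resolves to $fwd, not $1"
            return 1
        fi
    }

    for ip in 172.16.2.80 172.16.2.84; do
        check_ces_ip "$ip" || echo "Fix DNS for $ip before continuing."
    done
    ```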

    ./spectrumscale config protocols -e 172.16.2.80,172.16.2.84

    # Configure the CES Shared root filesystem
    ./spectrumscale config protocols -f gpfs -m /ibm/gpfs/cessharedroot

    # Create the new HDFS Transparency cluster with a unique cluster name and data directory name. This example uses cluster name "cescluster1" and data directory "gpfscluster1". Note: The cluster name does not support special characters, and spaces are not allowed after the commas in the node lists.

    ./spectrumscale config hdfs new -n cescluster1 -nn c902f08x01.gpfs.net,c902f08x03.gpfs.net -dn c902f08x04.gpfs.net,c902f08x13.gpfs.net,c902f08x14.gpfs.net,c902f08x15.gpfs.net -f gpfs -d gpfscluster1

    # Check the HDFS configuration
    ./spectrumscale config hdfs list

    # Deploy HDFS
    ./spectrumscale deploy --precheck
    ./spectrumscale deploy

    # Check that the -k ACL value for the file system is set to "all" when using the HDFS protocol. If not, set it with the mmchfs -k all command.
    /usr/lpp/mmfs/bin/mmlsfs all
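    The -k value can also be extracted from the mmlsfs output programmatically. A sketch, where `acl_mode` is a hypothetical parsing helper and "gpfs" is the device name used in this example:

    ```shell
    #!/bin/sh
    # Extract the ACL semantics (-k) value from mmlsfs output
    # (hypothetical helper; "gpfs" is this example's device name).

    acl_mode() {  # acl_mode <mmlsfs-output>
        printf '%s\n' "$1" | awk '$1 == "-k" {print $2}'
    }

    # On a cluster node you would run, for example:
    #   mode=$(acl_mode "$(/usr/lpp/mmfs/bin/mmlsfs gpfs -k)")
    #   [ "$mode" = "all" ] || /usr/lpp/mmfs/bin/mmchfs gpfs -k all
    ```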

    # Verify the cluster

    • Check that you can write to the GPFS mount point via POSIX
      touch /ibm/gpfs/gpfscluster1/testposix
      then edit testposix to write some test data into the file
    • Check HDFS Transparency status for the NameNodes and DataNodes (must be executed on either a DataNode or a NameNode)
      /usr/lpp/mmfs/hadoop/sbin/mmhdfs hdfs status
    • Check HDFS Transparency NameNode status
      /usr/lpp/mmfs/bin/mmhealth node show HDFS_Namenode -v -N cesNodes
    • Check the CES IP addresses and group names
      /usr/lpp/mmfs/bin/mmces address list --full-list
      Note: CES adds the "hdfs" prefix to the group name
    • Run basic hadoop commands
      /usr/lpp/mmfs/hadoop/bin/hdfs dfs -mkdir -p /user/root
      /usr/lpp/mmfs/hadoop/bin/hdfs dfs -ls /user
      /usr/lpp/mmfs/hadoop/bin/hdfs dfs -cp /testposix /user/root
      /usr/lpp/mmfs/hadoop/bin/hdfs dfs -cat /user/root/testposix
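    These checks work because HDFS Transparency maps HDFS paths onto the GPFS data directory: the touch above created /ibm/gpfs/gpfscluster1/testposix, which appears as /testposix over HDFS. A tiny helper sketching that mapping, assuming this example's mount point and data directory:

    ```shell
    #!/bin/sh
    # Map an HDFS path to its POSIX location under the GPFS data directory.
    # Assumes this example's layout (mount point /ibm/gpfs, data directory
    # gpfscluster1); verify the mapping on your own cluster.

    hdfs_to_posix() {  # hdfs_to_posix </hdfs/path>
        echo "/ibm/gpfs/gpfscluster1$1"
    }

    hdfs_to_posix /user/root/testposix
    ```

    On a cluster node the round trip can then be confirmed, for example by comparing the output of /usr/lpp/mmfs/hadoop/bin/hdfs dfs -cat /user/root/testposix with cat "$(hdfs_to_posix /user/root/testposix)".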
