Overview

The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client’s local file system. The NFS Gateway currently supports the following usage patterns:

  • Users can browse the HDFS file system through their local file system on NFSv3 client compatible operating systems.
  • Users can download files from the HDFS file system onto their local file system.
  • Users can upload files from their local file system directly to the HDFS file system.
  • Users can stream data directly to HDFS through the mount point. File append is supported but random write is not supported.
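
The four patterns above map onto ordinary shell commands once HDFS is mounted. The following is a minimal sketch, not part of the original setup: the helper names are hypothetical, and MOUNT_POINT is assumed to be the /data/hdfs directory used in the installation steps.

```shell
# Assumed mount point; override MOUNT_POINT if HDFS is mounted elsewhere.
MOUNT_POINT="${MOUNT_POINT:-/data/hdfs}"

# Browse: list an HDFS directory with ordinary tools.
browse() { ls -l "$MOUNT_POINT/$1"; }

# Download: copy a file from HDFS to the local file system.
download() { cp "$MOUNT_POINT/$1" "$2"; }

# Upload: copy a local file into HDFS.
upload() { cp "$1" "$MOUNT_POINT/$2"; }

# Stream: append a line to the end of an HDFS file.
# Appends are supported; rewriting the middle of a file is not.
append_line() { printf '%s\n' "$1" >> "$MOUNT_POINT/$2"; }
```

For example, `upload report.txt tmp/report.txt` followed by `append_line "done" tmp/report.txt` would copy a local file into HDFS and then append one line to it (both file names are placeholders).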

The NFS Gateway machine requires the same dependencies as an HDFS client: the Hadoop JAR files in PATH and the HADOOP_CONF directory. The NFS Gateway can run on the same host as a DataNode, the NameNode, or any HDFS client.

On the command line, the speed difference is clearly noticeable:

Command     Hadoop fs – Time (user)    NFS Mount – Time (user)
mkdir -p    0m2.911s                   0m0.001s
ls          0m3.071s                   0m0.002s
cp          0m3.478s                   0m0.001s
* Image 1 below shows various comparisons.
Both sets of commands were run against the same HDFS cluster. Accessing HDFS through an NFS mount also lets users run additional Linux commands (grep, find, echo, cat, …) against it.
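
As an illustration of running standard Linux tools against the mount, here is a small sketch that combines find and grep. The function name, the *.txt filter, and the default /data/hdfs directory are assumptions chosen for the example.

```shell
# Hypothetical helper: print the .txt files under an HDFS directory
# (reached through the NFS mount) that contain a given pattern.
grep_hdfs() {
  local pattern="$1" dir="${2:-/data/hdfs}"
  find "$dir" -type f -name '*.txt' -exec grep -l -- "$pattern" {} +
}
```

Calling `grep_hdfs ERROR /data/hdfs/logs` would list the matching files under a (placeholder) logs directory.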

Installation

1. Install necessary packages via yum

yum install nfs-utils -y

2. If mounting HDFS as user root, add the following properties via Ambari under HDFS > Configs

Custom core-site
hadoop.proxyuser.root.groups *
hadoop.proxyuser.root.hosts *
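
Once the configuration has been saved and pushed to the host, the values can be read back with `hdfs getconf -confKey`. This is only a sketch: the function name is hypothetical, and it assumes the hdfs command is on the PATH and reads the same core-site.xml that Ambari manages.

```shell
# Hypothetical helper: print the proxyuser settings for a user (default: root)
# as seen by the local Hadoop client configuration.
check_proxyuser() {
  local user="${1:-root}" suffix key
  for suffix in groups hosts; do
    key="hadoop.proxyuser.${user}.${suffix}"
    printf '%s=%s\n' "$key" "$(hdfs getconf -confKey "$key")"
  done
}
```

Running `check_proxyuser root` should print both properties with the value set in Ambari.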

3. Stop the OS-provided nfs and rpcbind services

service nfs stop
service rpcbind stop

4. Start the Hadoop-provided portmap and nfs3 services (add these lines to /etc/rc.local so that the gateway starts on system startup)

/usr/iop/4.1.0.0/hadoop/sbin/hadoop-daemon.sh --script /usr/iop/4.1.0.0/hadoop/bin/hdfs start portmap

/usr/iop/4.1.0.0/hadoop/sbin/hadoop-daemon.sh --script /usr/iop/4.1.0.0/hadoop/bin/hdfs start nfs3
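
Before mounting, it can help to confirm that the gateway registered its RPC services. The sketch below uses rpcinfo and showmount, which are assumed to be installed along with rpcbind and nfs-utils. The function name is hypothetical, and the host argument stands for the gateway's address.

```shell
# Hypothetical check: confirm that portmapper, mountd and nfs are registered
# on the gateway host, then print its export list.
verify_gateway() {
  local host="${1:-localhost}" out svc
  out="$(rpcinfo -p "$host")" || return 1
  for svc in portmapper mountd nfs; do
    if ! printf '%s\n' "$out" | grep -qw "$svc"; then
      echo "missing RPC service: $svc" >&2
      return 1
    fi
  done
  showmount -e "$host"
}
```

If all three services are listed, `verify_gateway <gateway-ip>` ends by printing the exports offered by the gateway; otherwise it names the first missing service and returns a non-zero status.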

5. Create a local directory to use as the mount point, and mount HDFS – replace 0.0.0.0 with the IP address shown in step 4

mkdir -p /data/hdfs
mount -t nfs -o vers=3,proto=tcp,nolock 0.0.0.0:/ /data/hdfs/
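
If the mount command ends up in a startup script, it may be worth guarding it so that it does not run against a directory that is already mounted. This is a minimal sketch under that assumption: the function name is hypothetical, and it relies on the mountpoint utility from util-linux.

```shell
# Hypothetical wrapper: create the mount point and mount HDFS over NFSv3,
# skipping the mount if the directory is already a mount point.
mount_hdfs() {
  local host="$1" dir="${2:-/data/hdfs}"
  mkdir -p "$dir"
  if mountpoint -q "$dir"; then
    echo "$dir is already mounted"
    return 0
  fi
  mount -t nfs -o vers=3,proto=tcp,nolock "$host:/" "$dir"
}
```

Usage would be `mount_hdfs <gateway-ip> /data/hdfs`, run as root, with the same options as the mount command above.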

6. Enjoy native Linux commands, as well as faster access to HDFS =)

[Image 1: nfsmountedHDFS_IOP41]

1 comment on "Mounting HDFS onto local file system in IOP 4.1 (NFS Gateway)"

  1. Dmitry Zakharov June 14, 2016

    I uploaded 40GB of data (about 400 files ~100MB each).
    Initially I put files with hadoop command.
    hadoop fs -put *.txt … took 10 minutes.
    Then I mounted hdfs at /mnt/hdfs with NFS Gateway and copied files using cp command.
    cp *.txt /mnt/hdfs took 16 minutes.

    As one can see, the NFS Gateway upload is slower.

    Small file uploads and hadoop fs -ls are slower because each time one of these commands is issued, a JVM has to be started. That makes them slower than native Linux binaries such as ‘cp’ and ‘ls’, but this holds only for small amounts of data.
