The accelerating growth of big data demands more efficient and robust storage file systems. Migrating data across multiple storage media is expensive, so there is a pressing need to run analytics in place and to present a single unified namespace across all types of storage. The IBM Spectrum Scale file system and IBM Elastic Storage Server (ESS) address these requirements. IBM Spectrum Scale is officially certified as a storage offering for the Hortonworks HDP Hadoop distribution.

IBM Spectrum Scale offers several advantages over HDFS, the default storage for Hortonworks HDP clusters:

  • Reduces the data center footprint. No data copying and migration required for running Hadoop analytics.
  • Provides unified access to data through different protocols such as file, block, and object.
  • Management of geographically dispersed data including disaster recovery
  • In-place data analytics. Spectrum Scale is POSIX compatible, which supports various applications and workloads. With Spectrum Scale HDFS Transparency Connector, you can analyze file and object data in-place with no data transfer or data movement.
  • Flexible deployment modes. You can run IBM Spectrum Scale on commercial storage-rich servers, or choose IBM Elastic Storage Server (ESS) for a higher-performance, large-capacity storage system for your Hadoop workload. You can also deploy Spectrum Scale on a traditional SAN storage system for HDP.
  • Enterprise-class data management features, available through POSIX-compliant APIs or the command line
  • Unified file and object support (NFS, SMB, Object)
  • FIPS and NIST compliant data encryption
  • Cold data compression
  • Disaster Recovery
  • Snapshot support for point-in-time data captures
  • Policy-based information lifecycle management capabilities to manage PBs of data
  • Mature enterprise-level data backup and archive solutions (including tape)
  • Remote cluster mount support
  • Mixed file system support
  • Seamless, secure tiering to cloud object stores
  • IBM Spectrum Scale provides unified access through different protocols and a single namespace across all kinds of storage media.

    HDFS Protocol for Spectrum Scale
    Spectrum Scale provides a mechanism for accessing and ingesting data over the HDFS RPC protocol, using the standard HDFS file system utilities and RPC daemons such as NameNodes and DataNodes. This is achieved with the HDFS Transparency Connector, which redirects all I/O traffic from native HDFS to the Spectrum Scale file system. Any big data application can therefore run on top of the Spectrum Scale file system without changes to its application logic.

    Comparison of Spectrum Scale HDFS Transparency Connector Architecture with Native HDFS RPC.

    Spectrum Scale File System Configurations on Big Data Clusters

    Hortonworks HDP uses Apache Ambari to create, configure, and manage big data clusters. The Ambari server also provides an easy-to-use GUI wizard for adding a new service to an existing cluster. The Ambari integration module links the Spectrum Scale service to the Ambari server, so that the Add Service Wizard in the Ambari GUI can be used to create the Spectrum Scale cluster.

    Spectrum Scale supports four types of configuration in the cluster:

    1. Shared Nothing Architecture (File Placement Optimizer, FPO)
    The Spectrum Scale FPO configuration uses the local disks attached to each node in the cluster. In FPO-enabled Spectrum Scale deployments, physical disks and Network Shared Disks (NSDs) can have a one-to-one mapping. In this case, each node in the cluster is an NSD server that gives the rest of the cluster access to its disks. The NSD configuration file specifies the topology of each node in the cluster.
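To make the one-to-one disk-to-NSD mapping concrete, here is a minimal sketch that writes an FPO-style NSD stanza file. Everything specific in it is an assumption for illustration: the node names (`node1`, `node2`), device paths, NSD names, pool parameters, and the helper name `write_fpo_stanza` do not come from the cluster described in this article. The `failureGroup` triplet is the FPO topology vector (rack, position, node).

```shell
# Hypothetical sketch: generate an NSD stanza file for an FPO cluster.
# All node names, devices, NSD names, and pool settings are placeholders.
write_fpo_stanza() {
    # $1 = path of the stanza file to write
    cat > "$1" <<'EOF'
%pool:
  pool=datapool
  blockSize=2M
  layoutMap=cluster
  allowWriteAffinity=yes
  writeAffinityDepth=1
  blockGroupFactor=128

%nsd:
  nsd=node1_meta
  device=/dev/sdb
  servers=node1
  usage=metadataOnly
  failureGroup=1,0,1
  pool=system

%nsd:
  nsd=node1_data
  device=/dev/sdc
  servers=node1
  usage=dataOnly
  failureGroup=1,0,1
  pool=datapool

%nsd:
  nsd=node2_data
  device=/dev/sdb
  servers=node2
  usage=dataOnly
  failureGroup=1,0,2
  pool=datapool
EOF
}
```

Each `%nsd` stanza names exactly one local disk and lists only its own node under `servers`, which is what makes every node the NSD server for its own disks. On a real cluster, a file like this is the kind of input that `mmcrnsd -F` accepts.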

    Shared Nothing Configuration using Local Disks.

    2. Shared Storage Configuration (Elastic Storage Server, ESS)
    This configuration allows a local FPO-enabled Spectrum Scale cluster to be added as part of an ESS file system.

    Shared Storage Configuration

    3. Remote Mount Configuration
    This configuration allows the ESS Spectrum Scale File System to be mounted on the local cluster.

    Remote Cluster Mount Support

    4. Mixed Configuration Support
    This configuration combines a local FPO file system with a file system remote-mounted from an external cluster, which can be either an ESS file system or a different Spectrum Scale cluster. Big data applications can then use the remote-mounted file system as well.

    Deploying Spectrum Scale File System in Hortonworks using Ambari Blueprints
    Ambari can deploy a cluster from the exported configuration of an existing cluster. Using the exported configuration of the whole cluster, you can deploy all the services, including Spectrum Scale, in a single pass without major manual intervention.
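As a sketch of the export step, the helper below only builds the Ambari REST URL that returns a cluster's configuration as a blueprint (`?format=blueprint`). The function name, host, port, and cluster name are illustrative assumptions; the `curl` call is shown as a comment because it needs a live Ambari server.

```shell
# Build the Ambari REST URL that exports a cluster's configuration as a blueprint.
# The function name is hypothetical; host, port, and cluster name are supplied by the caller.
blueprint_export_url() {
    # $1 = Ambari host, $2 = Ambari port, $3 = cluster name
    printf 'http://%s:%s/api/v1/clusters/%s?format=blueprint\n' "$1" "$2" "$3"
}

# Example (not run here; credentials and cluster name are placeholders):
#   curl -u admin:PASSWORD -H 'X-Requested-By: ambari' \
#        "$(blueprint_export_url 127.0.0.1 8080 mycluster)" -o blueprint.json
```

The saved JSON can then serve as the starting point for a new deployment, after host-specific values have been reviewed.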

    Spectrum Scale Ambari Management pack
    The Spectrum Scale Ambari management pack registers the Spectrum Scale file system with an existing Hortonworks Hadoop cluster, so that the Ambari server GUI wizard can be used to install the Spectrum Scale service on the existing HDP cluster.

    The management pack is distributed as a tar package:

    SpectrumScaleMPack-2.4.2.1-noarch.tar.gz

    The tar package also contains the installer, uninstaller, and MPack upgrade scripts.

    $ tar -xvzf SpectrumScaleMPack-2.4.2.1-noarch.tar.gz
    ./SpectrumScaleExtension-MPack-2.4.2.1.tar.gz
    ./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin
    ./SpectrumScaleMPackInstaller.py
    ./SpectrumScaleMPackUninstaller.py
    ./SpectrumScale_UpgradeIntegrationPackage
    ./sum.txt

    Hortonworks HDP Stack before adding Spectrum Scale MPack

    The MPack binary installer adds the Spectrum Scale service to the existing cluster and creates the extension links required to link it to the current stack in the HDP cluster.

    [root@c902f09x09 ~]# ./SpectrumScaleIntegrationPackageInstaller-2.4.2.1.bin

    Apache License Agreement ...........................
    ....................................................
    ....................................................
    Do you agree to the above license terms? [yes or no] yes
    Installing...
    INFO: ***Starting the Mpack Installer***

    Enter Ambari Server Port Number. If it is not entered, the installer will take default port 8080 :
    INFO: Taking default port 8080 as Ambari Server Port Number.
    Enter Ambari Server IP Address. Default=127.0.0.1 :
    INFO: Ambari Server IP Address not provided. Taking default Amabri Server IP Address as "127.0.0.1".
    Enter Ambari Server Username, default=admin :
    INFO: Taking default username "admin" as Ambari Server Username.
    Enter Ambari Server Password :
    INFO: Verifying Ambari Server Address, Username and Password.
    INFO: Verification Successful.
    INFO: Adding Spectrum Scale MPack : ambari-server install-mpack --mpack=SpectrumScaleExtension-MPack-2.4.2.1.tar.gz -v
    INFO: Spectrum Scale MPack Successfully Added. Continuing with Ambari Server Restart...
    INFO: Performing Ambari Server Restart.
    INFO: Ambari Server Restart Completed Successfully.
    INFO: Running command - curl -u admin:******* -H 'X-Requested-By: ambari' -X POST -d '{"ExtensionLink": {"stack_name":"HDP", "stack_version": "2.6", "extension_name": "SpectrumScaleExtension", "extension_version": "2.4.2.1"}}' http://127.0.0.1:8080/api/v1/links/
    INFO: Extension Link Created Successfully.
    INFO: Starting Spectrum Scale Changes.
    INFO: Spectrum Scale Changes Successfully Completed.
    INFO: Performing Ambari Server Restart.
    INFO: Ambari Server Restart Completed Successfully.
    INFO: Backing up original HDFS files to hdfs-original-files-backup
    INFO: Running command cp -f -r -p -u /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/ hdfs-original-files-backup
    Done.
    [root@c902f09x09 ~]#
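The extension-link request that appears in the installer log can be inspected more easily when its JSON body is built separately. The sketch below mirrors the payload from the transcript; the function name `extension_link_payload` is hypothetical, and the installer normally performs this step itself, so issuing the request by hand is only something to consider for troubleshooting.

```shell
# Build the JSON body for an Ambari extension-link request, mirroring the
# payload shown in the installer log. The function name is hypothetical.
extension_link_payload() {
    # $1 = stack name, $2 = stack version, $3 = extension name, $4 = extension version
    printf '{"ExtensionLink": {"stack_name": "%s", "stack_version": "%s", "extension_name": "%s", "extension_version": "%s"}}\n' \
        "$1" "$2" "$3" "$4"
}

# Example (not run here; needs a live Ambari server and real credentials):
#   curl -u admin:PASSWORD -H 'X-Requested-By: ambari' -X POST \
#        -d "$(extension_link_payload HDP 2.6 SpectrumScaleExtension 2.4.2.1)" \
#        http://127.0.0.1:8080/api/v1/links/
```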

    This allows the Spectrum Scale service to be listed in the Add Service Wizard of the Ambari server.

    Spectrum Scale Service Listed after Mpack Installation.

    The Add Service Wizard helps in configuring the Spectrum Scale Service.

    Assignment of Spectrum Scale Nodes on the cluster.
    Customize Service Panel
    Spectrum Scale Parameters Configuration
    Spectrum Scale Service Installation on the cluster nodes.
    Service Addition completed.
    Spectrum Scale Filesystem Added as a service in the existing HDP cluster.

    The status of the HDFS Transparency daemons can also be verified:

    # /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
    Node1 : namenode running as process 8280.
    Node2 : datanode running as process 15192.
    Node3 : datanode running as process 31595.
    Node4 : datanode running as process 25271.
    Node5 : datanode running as process 10777.
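For scripted health checks, the `getstate` output can be reduced to a count of running daemons per role. This is a minimal sketch: the function name `count_running` is hypothetical, and it assumes the line format shown above, where each running daemon is reported as "<role> running as process <pid>".

```shell
# Count daemons of one role reported as running in
# `mmhadoopctl connector getstate` output (read from stdin).
count_running() {
    # $1 = role to count: namenode or datanode
    awk -v role="$1" 'index($0, role " running as process") { n++ } END { print n + 0 }'
}

# Example (not run here; requires HDFS Transparency to be installed):
#   /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate | count_running datanode
```

For the output shown above, the helper reports one NameNode and four DataNodes.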

    The Spectrum Scale cluster and file system can also be verified:

    # /usr/lpp/mmfs/bin/mmlscluster

    GPFS cluster information
    ========================
    GPFS cluster name: Djbigpfs.gpfs.net
    GPFS cluster id: 12888843454012907741
    GPFS UID domain: Djbigpfs.gpfs.net
    Remote shell command: /usr/bin/ssh
    Remote file copy command: /usr/bin/scp
    Repository type: CCR

    Node  Daemon node name     IP address   Admin node name      Designation
    --------------------------------------------------------------------------
    1     c902f09x09.gpfs.net  172.16.1.51  c902f09x09.gpfs.net  quorum
    2     c902f09x12.gpfs.net  172.16.1.57  c902f09x12.gpfs.net  quorum
    3     c902f09x10.gpfs.net  172.16.1.53  c902f09x10.gpfs.net
    4     c902f09x11.gpfs.net  172.16.1.55  c902f09x11.gpfs.net  quorum

    Spectrum Scale Service Panel in Integrated state

    # /usr/lpp/mmfs/bin/mmlsfs all

    File system attributes for /dev/bigpfs:
    =======================================
    flag                value                    description
    ------------------- ------------------------ -----------------------------------
    -f                  8192                     Minimum fragment size in bytes (system pool)
                        65536                    Minimum fragment size in bytes (other pools)
    -i                  4096                     Inode size in bytes
    -I                  32768                    Indirect block size in bytes
    -m                  3                        Default number of metadata replicas
    -M                  3                        Maximum number of metadata replicas
    -r                  3                        Default number of data replicas
    -R                  3                        Maximum number of data replicas
    -j                  scatter                  Block allocation type
    -D                  nfs4                     File locking semantics in effect
    -k                  all                      ACL semantics in effect
    -n                  32                       Estimated number of nodes that will mount file system
    -B                  262144                   Block size (system pool)
                        2097152                  Block size (other pools)
    -Q                  none                     Quotas accounting enabled
                        none                     Quotas enforced
                        none                     Default quotas enabled
    --perfileset-quota  No                       Per-fileset quota enforcement
    --filesetdf         No                       Fileset df enabled?
    -V                  17.00 (4.2.3.0)          File system version
    --create-time       Fri Nov 17 08:20:27 2017 File system creation time
    -z                  No                       Is DMAPI enabled?
    -L                  16252928                 Logfile size
    -E                  No                       Exact mtime mount option
    -S                  relatime                 Suppress atime mount option
    -K                  whenpossible             Strict replica allocation option
    --fastea            Yes                      Fast external attributes enabled?
    --encryption        No                       Encryption enabled?
    --inode-limit       3948224                  Maximum number of inodes
    --log-replicas      0                        Number of log replicas
    --is4KAligned       Yes                      is4KAligned?
    --rapid-repair      Yes                      rapidRepair enabled?
    --write-cache-threshold 0                    HAWC Threshold (max 65536)
    --subblocks-per-full-block 32                Number of subblocks per full block
    -P                  system;datapool          Disk storage pools in file system
    -d                  gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
    -A                  yes                      Automatic mount option
    -o                  none                     Additional mount options
    -T                  /bigpfs                  Default mount point
    --mount-priority    0                        Mount priority
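When only one or two attributes matter, for example to confirm the replication factor, the listing above can be filtered with a small helper. This is a minimal sketch: the function name `mmlsfs_value` is hypothetical, and it prints only the second column of the first row whose flag matches, so multi-word values such as the `-V` version string are cut to their first word and continuation rows without a flag are not reachable.

```shell
# Print the value column for one flag from `mmlsfs` output read on stdin.
mmlsfs_value() {
    # $1 = flag exactly as printed in the first column, e.g. -r or --inode-limit
    awk -v flag="$1" '$1 == flag { print $2; exit }'
}

# Example (not run here; requires Spectrum Scale and the bigpfs file system):
#   /usr/lpp/mmfs/bin/mmlsfs bigpfs | mmlsfs_value -r
```

The match is case-sensitive, so `-r` (default data replicas) and `-R` (maximum data replicas) are distinguished.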

    HDFS Service Panel in Spectrum Scale Integrated State

    Remote Mount and Multifilesystem support
    A Spectrum Scale cluster can have multiple local file systems as well as remote-mounted file systems. The HDFS Transparency Connector supports multiple remote-mounted file systems as well.

    For example, if you have a remote file system mounted on your cluster:

    # mmremotefs show all
    Local Name     Remote Name  Cluster name     Mount Point     Mount Options  Automount  Drive  Priority
    djremotegpfs1  bigpfs       bigpfs.gpfs.net  /djremotegpfs1  rw             no         -      0
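For context, a remote mount like the one listed above is usually defined on the accessing cluster with `mmremotecluster add` and `mmremotefs add`, after the two clusters have exchanged keys with `mmauth`. The sketch below only prints those commands rather than running them. The helper name, the contact node names, and the key file path are assumptions; the cluster, device, and mount point names are taken from the listing.

```shell
# Print (do not run) the commands that define and mount a remote file system
# on the accessing cluster. The key exchange via mmauth is a prerequisite
# and is not shown here.
print_remote_mount_cmds() {
    # $1 = remote cluster name   $2 = contact nodes (comma-separated)
    # $3 = remote cluster public key file
    # $4 = local device name     $5 = remote device name   $6 = mount point
    printf 'mmremotecluster add %s -n %s -k %s\n' "$1" "$2" "$3"
    printf 'mmremotefs add %s -f %s -C %s -T %s\n' "$4" "$5" "$1" "$6"
    printf 'mmmount %s -a\n' "$4"
}

# Contact nodes and key file path below are placeholders.
print_remote_mount_cmds bigpfs.gpfs.net essnode1,essnode2 /tmp/bigpfs_id_rsa.pub \
    djremotegpfs1 bigpfs /djremotegpfs1
```

Printing first makes it easy to review the exact arguments before running anything against a production cluster.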

    When the cluster has multiple file systems, for example one local file system and one remote-mounted file system, they can be listed as follows:

    # mmlsfs all

    File system attributes for /dev/bigpfs:
    =======================================
    flag                value                    description
    ------------------- ------------------------ -----------------------------------
    -f                  8192                     Minimum fragment size in bytes (system pool)
                        65536                    Minimum fragment size in bytes (other pools)
    -i                  4096                     Inode size in bytes
    -I                  32768                    Indirect block size in bytes
    -m                  3                        Default number of metadata replicas
    -M                  3                        Maximum number of metadata replicas
    -r                  3                        Default number of data replicas
    -R                  3                        Maximum number of data replicas
    -j                  scatter                  Block allocation type
    -D                  nfs4                     File locking semantics in effect
    -k                  all                      ACL semantics in effect
    -n                  32                       Estimated number of nodes that will mount file system
    -B                  262144                   Block size (system pool)
                        2097152                  Block size (other pools)
    -Q                  none                     Quotas accounting enabled
                        none                     Quotas enforced
                        none                     Default quotas enabled
    --perfileset-quota  No                       Per-fileset quota enforcement
    --filesetdf         No                       Fileset df enabled?
    -V                  17.00 (4.2.3.0)          File system version
    --create-time       Fri Nov 17 08:20:27 2017 File system creation time
    -z                  No                       Is DMAPI enabled?
    -L                  16252928                 Logfile size
    -E                  No                       Exact mtime mount option
    -S                  relatime                 Suppress atime mount option
    -K                  whenpossible             Strict replica allocation option
    --fastea            Yes                      Fast external attributes enabled?
    --encryption        No                       Encryption enabled?
    --inode-limit       3948224                  Maximum number of inodes
    --log-replicas      0                        Number of log replicas
    --is4KAligned       Yes                      is4KAligned?
    --rapid-repair      Yes                      rapidRepair enabled?
    --write-cache-threshold 0                    HAWC Threshold (max 65536)
    --subblocks-per-full-block 32                Number of subblocks per full block
    -P                  system;datapool          Disk storage pools in file system
    -d                  gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
    -A                  yes                      Automatic mount option
    -o                  none                     Additional mount options
    -T                  /bigpfs                  Default mount point
    --mount-priority    0                        Mount priority

    File system attributes for bigpfs.gpfs.net:/dev/bigpfs:
    =======================================================
    flag                value                    description
    ------------------- ------------------------ -----------------------------------
    -f                  8192                     Minimum fragment size in bytes (system pool)
                        65536                    Minimum fragment size in bytes (other pools)
    -i                  4096                     Inode size in bytes
    -I                  32768                    Indirect block size in bytes
    -m                  3                        Default number of metadata replicas
    -M                  3                        Maximum number of metadata replicas
    -r                  3                        Default number of data replicas
    -R                  3                        Maximum number of data replicas
    -j                  scatter                  Block allocation type
    -D                  nfs4                     File locking semantics in effect
    -k                  all                      ACL semantics in effect
    -n                  32                       Estimated number of nodes that will mount file system
    -B                  262144                   Block size (system pool)
                        2097152                  Block size (other pools)
    -Q                  none                     Quotas accounting enabled
                        none                     Quotas enforced
                        none                     Default quotas enabled
    --perfileset-quota  No                       Per-fileset quota enforcement
    --filesetdf         No                       Fileset df enabled?
    -V                  17.00 (4.2.3.0)          File system version
    --create-time       Mon Nov 27 05:53:34 2017 File system creation time
    -z                  No                       Is DMAPI enabled?
    -L                  16252928                 Logfile size
    -E                  No                       Exact mtime mount option
    -S                  relatime                 Suppress atime mount option
    -K                  whenpossible             Strict replica allocation option
    --fastea            Yes                      Fast external attributes enabled?
    --encryption        No                       Encryption enabled?
    --inode-limit       3883776                  Maximum number of inodes
    --log-replicas      0                        Number of log replicas
    --is4KAligned       Yes                      is4KAligned?
    --rapid-repair      Yes                      rapidRepair enabled?
    --write-cache-threshold 0                    HAWC Threshold (max 65536)
    --subblocks-per-full-block 32                Number of subblocks per full block
    -P                  system;datapool          Disk storage pools in file system
    -d                  gpfs1nsd;gpfs2nsd;gpfs3nsd;gpfs4nsd;gpfs5nsd;gpfs6nsd;gpfs7nsd;gpfs8nsd;gpfs9nsd;gpfs10nsd;gpfs11nsd;gpfs12nsd;gpfs13nsd;gpfs14nsd;gpfs15nsd;gpfs16nsd Disks in file system
    -A                  no                       Automatic mount option
    -o                  none                     Additional mount options
    -T                  /djremotegpfs1           Default mount point
    --mount-priority    0                        Mount priority

    The Spectrum Scale service configuration can be changed to enable this kind of multi-file-system setup.

    Spectrum Scale configuration changes for multi-file-system support.

    The HDFS Transparency daemons support multi-file-system configurations that combine local and remote file systems.

    # /usr/lpp/mmfs/hadoop/sbin/mmhadoopctl connector getstate
    Node1 : namenode running as process 8280.
    Node2 : datanode running as process 15192.
    Node3 : datanode running as process 31595.
    Node4 : datanode running as process 25271.
    Node5 : datanode running as process 10777.

    The HDFS Transparency Connector exposes the second mount point as a virtual subdirectory under the mount point of the first (base) file system, so that big data applications can use whichever file system suits their workload and storage type.

    # hadoop fs -ls /
    Found 11 items
    drwxrwxrwx - yarn hadoop 0 2017-12-04 11:07 /app-logs
    drwxr-xr-x - hdfs root 0 2017-12-04 11:13 /apps
    drwxr-xr-x - yarn hadoop 0 2017-12-04 11:07 /ats
    drwxr-xr-x - hdfs hadoop 0 2017-11-27 07:10 /djremotegpfs1
    drwxr-xr-x - hdfs root 0 2017-12-04 11:07 /hdp
    drwxr-xr-x - mapred root 0 2017-12-04 11:07 /mapred
    drwxrwxrwx - mapred hadoop 0 2017-12-04 11:07 /mr-history
    drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark-history
    drwxrwxrwx - spark hadoop 0 2017-12-04 11:18 /spark2-history
    drwxrwxrwx - hdfs root 0 2017-12-04 11:10 /tmp
    drwxr-xr-x - hdfs root 0 2017-12-04 11:11 /user

    DFSIO and Teragen/Terasort

    The standard DFSIO read and write benchmarks can be run on Hortonworks HDP Hadoop clusters that have Spectrum Scale in the integrated state.

    DFSIO Write Throughput
    # yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_write.deepak.txt
    17/12/04 12:09:01 INFO fs.TestDFSIO: TestDFSIO.1.8
    17/12/04 12:09:01 INFO fs.TestDFSIO: nrFiles = 10
    17/12/04 12:09:01 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
    17/12/04 12:09:01 INFO fs.TestDFSIO: bufferSize = 1000000
    17/12/04 12:09:01 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
    17/12/04 12:09:01 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
    17/12/04 12:09:03 INFO fs.TestDFSIO: created control files for: 10 files
    17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 12:09:03 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 12:09:03 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 12:09:04 INFO mapred.FileInputFormat: Total input paths to process : 10
    17/12/04 12:09:04 INFO mapreduce.JobSubmitter: number of splits:10
    17/12/04 12:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0004
    17/12/04 12:09:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0004
    17/12/04 12:09:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0004/
    17/12/04 12:09:04 INFO mapreduce.Job: Running job: job_1512403681949_0004
    17/12/04 12:09:09 INFO mapreduce.Job: Job job_1512403681949_0004 running in uber mode : false
    17/12/04 12:09:09 INFO mapreduce.Job: map 0% reduce 0%
    17/12/04 12:09:20 INFO mapreduce.Job: map 20% reduce 0%
    17/12/04 12:09:21 INFO mapreduce.Job: map 67% reduce 0%
    17/12/04 12:09:23 INFO mapreduce.Job: map 70% reduce 0%
    17/12/04 12:09:24 INFO mapreduce.Job: map 77% reduce 0%
    17/12/04 12:09:27 INFO mapreduce.Job: map 83% reduce 0%
    17/12/04 12:09:29 INFO mapreduce.Job: map 87% reduce 0%
    17/12/04 12:09:31 INFO mapreduce.Job: map 90% reduce 0%
    17/12/04 12:09:33 INFO mapreduce.Job: map 90% reduce 23%
    17/12/04 12:09:36 INFO mapreduce.Job: map 93% reduce 23%
    17/12/04 12:09:38 INFO mapreduce.Job: map 97% reduce 23%
    17/12/04 12:09:39 INFO mapreduce.Job: map 97% reduce 30%
    17/12/04 12:09:41 INFO mapreduce.Job: map 100% reduce 30%
    17/12/04 12:09:42 INFO mapreduce.Job: map 100% reduce 100%
    17/12/04 12:09:43 INFO mapreduce.Job: Job job_1512403681949_0004 completed successfully
    17/12/04 12:09:43 INFO mapreduce.Job: Counters: 49
    File System Counters
    FILE: Number of bytes read=856
    FILE: Number of bytes written=1648330
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2450
    HDFS: Number of bytes written=2097152079
    HDFS: Number of read operations=43
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=12
    Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=10
    Total time spent by all maps in occupied slots (ms)=1957054
    Total time spent by all reduces in occupied slots (ms)=380820
    Total time spent by all map tasks (ms)=177914
    Total time spent by all reduce tasks (ms)=17310
    Total vcore-milliseconds taken by all map tasks=177914
    Total vcore-milliseconds taken by all reduce tasks=17310
    Total megabyte-milliseconds taken by all map tasks=2004023296
    Total megabyte-milliseconds taken by all reduce tasks=389959680
    Map-Reduce Framework
    Map input records=10
    Map output records=50
    Map output bytes=750
    Map output materialized bytes=910
    Input split bytes=1330
    Combine input records=0
    Combine output records=0
    Reduce input groups=5
    Reduce shuffle bytes=910
    Reduce input records=50
    Reduce output records=5
    Spilled Records=100
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=3866
    CPU time spent (ms)=63920
    Physical memory (bytes) snapshot=25849901056
    Virtual memory (bytes) snapshot=139142180864
    Total committed heap usage (bytes)=28442099712
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=1120
    File Output Format Counters
    Bytes Written=79
    17/12/04 12:09:43 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
    17/12/04 12:09:43 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:09:43 EST 2017
    17/12/04 12:09:43 INFO fs.TestDFSIO: Number of files: 10
    17/12/04 12:09:43 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
    17/12/04 12:09:43 INFO fs.TestDFSIO: Throughput mb/sec: 13.672222146265433
    17/12/04 12:09:43 INFO fs.TestDFSIO: Average IO rate mb/sec: 16.02884292602539
    17/12/04 12:09:43 INFO fs.TestDFSIO: IO rate std deviation: 6.16775906029513
    17/12/04 12:09:43 INFO fs.TestDFSIO: Test exec time sec: 40.033
    17/12/04 12:09:43 INFO fs.TestDFSIO:

    DFSIO Read Throughput
    # yarn jar /usr/hdp/2.6.2.0-205/hadoop-mapreduce/hadoop-mapreduce-client-jobclient-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 200MB 2>&1 | tee /tmp/TestDFSIO_read.deepak.txt
    17/12/04 12:14:15 INFO fs.TestDFSIO: TestDFSIO.1.8
    17/12/04 12:14:15 INFO fs.TestDFSIO: nrFiles = 10
    17/12/04 12:14:15 INFO fs.TestDFSIO: nrBytes (MB) = 200.0
    17/12/04 12:14:15 INFO fs.TestDFSIO: bufferSize = 1000000
    17/12/04 12:14:15 INFO fs.TestDFSIO: baseDir = /benchmarks/TestDFSIO
    17/12/04 12:14:15 INFO fs.TestDFSIO: creating control file: 209715200 bytes, 10 files
    17/12/04 12:14:16 INFO fs.TestDFSIO: created control files for: 10 files
    17/12/04 12:14:16 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 12:14:16 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 12:14:17 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 12:14:17 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 12:14:17 INFO mapred.FileInputFormat: Total input paths to process : 10
    17/12/04 12:14:17 INFO mapreduce.JobSubmitter: number of splits:10
    17/12/04 12:14:17 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0005
    17/12/04 12:14:18 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0005
    17/12/04 12:14:18 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0005/
    17/12/04 12:14:18 INFO mapreduce.Job: Running job: job_1512403681949_0005
    17/12/04 12:14:23 INFO mapreduce.Job: Job job_1512403681949_0005 running in uber mode : false
    17/12/04 12:14:23 INFO mapreduce.Job: map 0% reduce 0%
    17/12/04 12:14:28 INFO mapreduce.Job: map 10% reduce 0%
    17/12/04 12:14:29 INFO mapreduce.Job: map 60% reduce 0%
    17/12/04 12:14:30 INFO mapreduce.Job: map 70% reduce 0%
    17/12/04 12:14:32 INFO mapreduce.Job: map 100% reduce 0%
    17/12/04 12:14:33 INFO mapreduce.Job: map 100% reduce 100%
    17/12/04 12:14:34 INFO mapreduce.Job: Job job_1512403681949_0005 completed successfully
    17/12/04 12:14:34 INFO mapreduce.Job: Counters: 50
    File System Counters
    FILE: Number of bytes read=862
    FILE: Number of bytes written=1648320
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=2097154450
    HDFS: Number of bytes written=81
    HDFS: Number of read operations=53
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
    Job Counters
    Launched map tasks=10
    Launched reduce tasks=1
    Data-local map tasks=8
    Rack-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=535414
    Total time spent by all reduces in occupied slots (ms)=54450
    Total time spent by all map tasks (ms)=48674
    Total time spent by all reduce tasks (ms)=2475
    Total vcore-milliseconds taken by all map tasks=48674
    Total vcore-milliseconds taken by all reduce tasks=2475
    Total megabyte-milliseconds taken by all map tasks=548263936
    Total megabyte-milliseconds taken by all reduce tasks=55756800
    Map-Reduce Framework
    Map input records=10
    Map output records=50
    Map output bytes=756
    Map output materialized bytes=916
    Input split bytes=1330
    Combine input records=0
    Combine output records=0
    Reduce input groups=5
    Reduce shuffle bytes=916
    Reduce input records=50
    Reduce output records=5
    Spilled Records=100
    Shuffled Maps =10
    Failed Shuffles=0
    Merged Map outputs=10
    GC time elapsed (ms)=676
    CPU time spent (ms)=20380
    Physical memory (bytes) snapshot=24607191040
    Virtual memory (bytes) snapshot=139105824768
    Total committed heap usage (bytes)=24529338368
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=1120
    File Output Format Counters
    Bytes Written=81
    17/12/04 12:14:34 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
    17/12/04 12:14:34 INFO fs.TestDFSIO: Date & time: Mon Dec 04 12:14:34 EST 2017
    17/12/04 12:14:34 INFO fs.TestDFSIO: Number of files: 10
    17/12/04 12:14:34 INFO fs.TestDFSIO: Total MBytes processed: 2000.0
    17/12/04 12:14:34 INFO fs.TestDFSIO: Throughput mb/sec: 120.43114349370747
    17/12/04 12:14:34 INFO fs.TestDFSIO: Average IO rate mb/sec: 268.7469482421875
    17/12/04 12:14:34 INFO fs.TestDFSIO: IO rate std deviation: 177.04005019489514
    17/12/04 12:14:34 INFO fs.TestDFSIO: Test exec time sec: 18.119
    17/12/04 12:14:34 INFO fs.TestDFSIO:
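When reading these summaries, note that TestDFSIO computes "Throughput mb/sec" as the total data divided by the sum of the individual map tasks' I/O times, so it is a per-task figure rather than an aggregate cluster rate. The sketch below derives the summed task I/O time and a wall-clock rate (total MB divided by "Test exec time sec") from a captured log. The function name `dfsio_summary` and the wording of its output line are assumptions made for this example.

```shell
# Summarize a TestDFSIO log read from stdin.
# Prints the reported per-task throughput, the summed task I/O time implied
# by it (total MB / throughput), and total MB divided by the test's
# wall-clock execution time.
dfsio_summary() {
    awk '
        /Total MBytes processed:/ { total = $NF }
        /Throughput mb\/sec:/     { tput  = $NF }
        /Test exec time sec:/     { secs  = $NF }
        END {
            if (tput > 0 && secs > 0)
                printf "throughput %.2f MB/s, summed task I/O time %.1f s, wall-clock rate %.2f MB/s\n",
                       tput, total / tput, total / secs
        }'
}

# Example (not run here; uses the file captured by tee in the write test):
#   dfsio_summary < /tmp/TestDFSIO_write.deepak.txt
```

For the write run above this gives roughly 146 s of summed task I/O time and about 50 MB/s of wall-clock rate; for the read run, roughly 17 s and about 110 MB/s. The wall-clock figure includes job startup and the reduce phase, so it understates raw storage bandwidth on short runs like these.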

    Teragen benchmarking for generating sample data
    # hadoop jar hadoop-mapreduce-examples.jar teragen 100000000 /user/djdeepak5/terasort-input
    17/12/04 14:59:38 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 14:59:38 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 14:59:54 INFO terasort.TeraSort: Generating 100000000 using 2
    17/12/04 14:59:56 INFO mapreduce.JobSubmitter: number of splits:2
    17/12/04 15:00:00 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0015
    17/12/04 15:00:04 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0015
    17/12/04 15:00:04 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0015/
    17/12/04 15:00:04 INFO mapreduce.Job: Running job: job_1512403681949_0015
    17/12/04 15:00:13 INFO mapreduce.Job: Job job_1512403681949_0015 running in uber mode : false
    17/12/04 15:00:13 INFO mapreduce.Job: map 0% reduce 0%
    17/12/04 15:00:24 INFO mapreduce.Job: map 1% reduce 0%
    17/12/04 15:00:30 INFO mapreduce.Job: map 3% reduce 0%
    17/12/04 15:00:36 INFO mapreduce.Job: map 4% reduce 0%
    17/12/04 15:00:39 INFO mapreduce.Job: map 5% reduce 0%
    17/12/04 15:00:42 INFO mapreduce.Job: map 8% reduce 0%
    17/12/04 15:00:48 INFO mapreduce.Job: map 9% reduce 0%
    17/12/04 15:00:51 INFO mapreduce.Job: map 11% reduce 0%
    17/12/04 15:00:54 INFO mapreduce.Job: map 12% reduce 0%
    17/12/04 15:01:03 INFO mapreduce.Job: map 15% reduce 0%
    17/12/04 15:01:10 INFO mapreduce.Job: map 16% reduce 0%
    17/12/04 15:01:12 INFO mapreduce.Job: map 18% reduce 0%
    17/12/04 15:01:16 INFO mapreduce.Job: map 19% reduce 0%
    17/12/04 15:01:18 INFO mapreduce.Job: map 20% reduce 0%
    17/12/04 15:01:19 INFO mapreduce.Job: map 22% reduce 0%
    17/12/04 15:01:24 INFO mapreduce.Job: map 23% reduce 0%
    17/12/04 15:01:28 INFO mapreduce.Job: map 24% reduce 0%
    17/12/04 15:01:30 INFO mapreduce.Job: map 26% reduce 0%
    17/12/04 15:01:31 INFO mapreduce.Job: map 27% reduce 0%
    17/12/04 15:01:40 INFO mapreduce.Job: map 28% reduce 0%
    17/12/04 15:01:42 INFO mapreduce.Job: map 30% reduce 0%
    17/12/04 15:01:45 INFO mapreduce.Job: map 31% reduce 0%
    17/12/04 15:01:46 INFO mapreduce.Job: map 32% reduce 0%
    17/12/04 15:01:57 INFO mapreduce.Job: map 34% reduce 0%
    17/12/04 15:01:58 INFO mapreduce.Job: map 35% reduce 0%
    17/12/04 15:02:01 INFO mapreduce.Job: map 36% reduce 0%
    17/12/04 15:02:03 INFO mapreduce.Job: map 37% reduce 0%
    17/12/04 15:02:04 INFO mapreduce.Job: map 38% reduce 0%
    17/12/04 15:02:09 INFO mapreduce.Job: map 39% reduce 0%
    17/12/04 15:02:10 INFO mapreduce.Job: map 40% reduce 0%
    17/12/04 15:02:16 INFO mapreduce.Job: map 41% reduce 0%
    17/12/04 15:02:18 INFO mapreduce.Job: map 43% reduce 0%
    17/12/04 15:02:21 INFO mapreduce.Job: map 45% reduce 0%
    17/12/04 15:02:24 INFO mapreduce.Job: map 46% reduce 0%
    17/12/04 15:02:28 INFO mapreduce.Job: map 47% reduce 0%
    17/12/04 15:02:31 INFO mapreduce.Job: map 48% reduce 0%
    17/12/04 15:02:37 INFO mapreduce.Job: map 50% reduce 0%
    17/12/04 15:02:40 INFO mapreduce.Job: map 51% reduce 0%
    17/12/04 15:02:46 INFO mapreduce.Job: map 54% reduce 0%
    17/12/04 15:02:52 INFO mapreduce.Job: map 55% reduce 0%
    17/12/04 15:02:55 INFO mapreduce.Job: map 58% reduce 0%
    17/12/04 15:03:01 INFO mapreduce.Job: map 59% reduce 0%
    17/12/04 15:03:07 INFO mapreduce.Job: map 61% reduce 0%
    17/12/04 15:03:13 INFO mapreduce.Job: map 62% reduce 0%
    17/12/04 15:03:16 INFO mapreduce.Job: map 63% reduce 0%
    17/12/04 15:03:19 INFO mapreduce.Job: map 65% reduce 0%
    17/12/04 15:03:22 INFO mapreduce.Job: map 66% reduce 0%
    17/12/04 15:03:28 INFO mapreduce.Job: map 67% reduce 0%
    17/12/04 15:03:31 INFO mapreduce.Job: map 69% reduce 0%
    17/12/04 15:03:34 INFO mapreduce.Job: map 70% reduce 0%
    17/12/04 15:03:37 INFO mapreduce.Job: map 71% reduce 0%
    17/12/04 15:03:40 INFO mapreduce.Job: map 73% reduce 0%
    17/12/04 15:03:46 INFO mapreduce.Job: map 75% reduce 0%
    17/12/04 15:03:49 INFO mapreduce.Job: map 77% reduce 0%
    17/12/04 15:03:55 INFO mapreduce.Job: map 79% reduce 0%
    17/12/04 15:03:58 INFO mapreduce.Job: map 81% reduce 0%
    17/12/04 15:04:04 INFO mapreduce.Job: map 82% reduce 0%
    17/12/04 15:04:07 INFO mapreduce.Job: map 83% reduce 0%
    17/12/04 15:04:10 INFO mapreduce.Job: map 85% reduce 0%
    17/12/04 15:04:13 INFO mapreduce.Job: map 87% reduce 0%
    17/12/04 15:04:19 INFO mapreduce.Job: map 89% reduce 0%
    17/12/04 15:04:22 INFO mapreduce.Job: map 90% reduce 0%
    17/12/04 15:04:28 INFO mapreduce.Job: map 93% reduce 0%
    17/12/04 15:04:34 INFO mapreduce.Job: map 94% reduce 0%
    17/12/04 15:04:38 INFO mapreduce.Job: map 95% reduce 0%
    17/12/04 15:04:41 INFO mapreduce.Job: map 96% reduce 0%
    17/12/04 15:04:44 INFO mapreduce.Job: map 97% reduce 0%
    17/12/04 15:04:53 INFO mapreduce.Job: map 98% reduce 0%
    17/12/04 15:04:59 INFO mapreduce.Job: map 100% reduce 0%
    17/12/04 15:05:18 INFO mapreduce.Job: Job job_1512403681949_0015 completed successfully
    17/12/04 15:05:18 INFO mapreduce.Job: Counters: 31
    File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=298084
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=170
    HDFS: Number of bytes written=10000000000
    HDFS: Number of read operations=8
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=4
    Job Counters
    Launched map tasks=2
    Other local map tasks=2
    Total time spent by all maps in occupied slots (ms)=6141058
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=558278
    Total vcore-milliseconds taken by all map tasks=558278
    Total megabyte-milliseconds taken by all map tasks=6288443392
    Map-Reduce Framework
    Map input records=100000000
    Map output records=100000000
    Input split bytes=170
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=716
    CPU time spent (ms)=121900
    Physical memory (bytes) snapshot=586973184
    Virtual memory (bytes) snapshot=23502049280
    Total committed heap usage (bytes)=402128896
    org.apache.hadoop.examples.terasort.TeraGen$Counters
    CHECKSUM=214760662691937609
    File Input Format Counters
    Bytes Read=0
    File Output Format Counters
    Bytes Written=10000000000
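The TeraGen counters above can be cross-checked with a little arithmetic. TeraGen writes fixed 100-byte rows, so the record count multiplied by the row size should equal the `Bytes Written` counter. The following is a minimal Python sketch of that check. The numbers are copied from the log above; the function and variable names are illustrative and are not part of any Hadoop tool.

```python
# Values copied from the TeraGen counters in the log above.
ROW_BYTES = 100                   # TeraGen rows are 100 bytes (10-byte key + 90-byte value)
map_output_records = 100_000_000  # "Map output records"
bytes_written = 10_000_000_000    # "HDFS: Number of bytes written"
total_map_task_ms = 558_278       # "Total time spent by all map tasks (ms)"


def expected_bytes(records, row_bytes=ROW_BYTES):
    """Bytes TeraGen should write for the given number of rows."""
    return records * row_bytes


def mb_per_task_second(num_bytes, task_ms):
    """Write rate per second of summed map-task time (not wall-clock time)."""
    return (num_bytes / 1e6) / (task_ms / 1000)


# The counters are consistent if rows * row size matches the bytes written.
assert expected_bytes(map_output_records) == bytes_written
print(f"{mb_per_task_second(bytes_written, total_map_task_ms):.1f} MB per map-task second")
```

The rate is computed against the summed map-task time reported by the counters, so it describes per-task write speed rather than end-to-end cluster throughput.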

TeraSort on the generated data

    # hadoop jar hadoop-mapreduce-examples.jar terasort /user/djdeepak5/terasort-input /user/djdeepak5/terasort-output
    17/12/04 15:08:41 INFO terasort.TeraSort: starting
    17/12/04 15:08:42 INFO input.FileInputFormat: Total input paths to process : 2
    Spent 182ms computing base-splits.
    Spent 3ms computing TeraScheduler splits.
    Computing input splits took 185ms
    Sampling 10 splits of 76
    Making 1 from 100000 sampled records
    Computing parititions took 9525ms
    Spent 9713ms computing partitions.
    17/12/04 15:08:51 INFO client.RMProxy: Connecting to ResourceManager at c902f09x10.gpfs.net/172.16.1.53:8050
    17/12/04 15:08:51 INFO client.AHSProxy: Connecting to Application History server at c902f09x10.gpfs.net/172.16.1.53:10200
    17/12/04 15:09:05 INFO mapreduce.JobSubmitter: number of splits:76
    17/12/04 15:09:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1512403681949_0016
    17/12/04 15:09:14 INFO impl.YarnClientImpl: Submitted application application_1512403681949_0016
    17/12/04 15:09:14 INFO mapreduce.Job: The url to track the job: http://c902f09x10.gpfs.net:8088/proxy/application_1512403681949_0016/
    17/12/04 15:09:14 INFO mapreduce.Job: Running job: job_1512403681949_0016
    17/12/04 15:09:21 INFO mapreduce.Job: Job job_1512403681949_0016 running in uber mode : false
    17/12/04 15:09:21 INFO mapreduce.Job: map 0% reduce 0%
    17/12/04 15:09:30 INFO mapreduce.Job: map 1% reduce 0%
    17/12/04 15:09:33 INFO mapreduce.Job: map 4% reduce 0%
    17/12/04 15:09:35 INFO mapreduce.Job: map 8% reduce 0%
    17/12/04 15:09:37 INFO mapreduce.Job: map 10% reduce 0%
    17/12/04 15:09:38 INFO mapreduce.Job: map 11% reduce 0%
    17/12/04 15:09:42 INFO mapreduce.Job: map 12% reduce 0%
    17/12/04 15:09:43 INFO mapreduce.Job: map 15% reduce 0%
    17/12/04 15:09:44 INFO mapreduce.Job: map 18% reduce 0%
    17/12/04 15:09:47 INFO mapreduce.Job: map 19% reduce 0%
    17/12/04 15:09:50 INFO mapreduce.Job: map 20% reduce 0%
    17/12/04 15:09:51 INFO mapreduce.Job: map 26% reduce 0%
    17/12/04 15:09:52 INFO mapreduce.Job: map 28% reduce 0%
    17/12/04 15:09:55 INFO mapreduce.Job: map 29% reduce 0%
    17/12/04 15:09:58 INFO mapreduce.Job: map 33% reduce 0%
    17/12/04 15:09:59 INFO mapreduce.Job: map 36% reduce 0%
    17/12/04 15:10:00 INFO mapreduce.Job: map 37% reduce 0%
    17/12/04 15:10:03 INFO mapreduce.Job: map 38% reduce 4%
    17/12/04 15:10:05 INFO mapreduce.Job: map 39% reduce 4%
    17/12/04 15:10:06 INFO mapreduce.Job: map 41% reduce 4%
    17/12/04 15:10:07 INFO mapreduce.Job: map 43% reduce 4%
    17/12/04 15:10:08 INFO mapreduce.Job: map 46% reduce 4%
    17/12/04 15:10:09 INFO mapreduce.Job: map 46% reduce 5%
    17/12/04 15:10:12 INFO mapreduce.Job: map 47% reduce 6%
    17/12/04 15:10:15 INFO mapreduce.Job: map 48% reduce 7%
    17/12/04 15:10:16 INFO mapreduce.Job: map 50% reduce 7%
    17/12/04 15:10:17 INFO mapreduce.Job: map 51% reduce 7%
    17/12/04 15:10:18 INFO mapreduce.Job: map 53% reduce 8%
    17/12/04 15:10:19 INFO mapreduce.Job: map 55% reduce 8%
    17/12/04 15:10:21 INFO mapreduce.Job: map 57% reduce 8%
    17/12/04 15:10:22 INFO mapreduce.Job: map 57% reduce 9%
    17/12/04 15:10:23 INFO mapreduce.Job: map 58% reduce 9%
    17/12/04 15:10:25 INFO mapreduce.Job: map 59% reduce 10%
    17/12/04 15:10:26 INFO mapreduce.Job: map 61% reduce 10%
    17/12/04 15:10:28 INFO mapreduce.Job: map 61% reduce 11%
    17/12/04 15:10:29 INFO mapreduce.Job: map 66% reduce 11%
    17/12/04 15:10:31 INFO mapreduce.Job: map 67% reduce 12%
    17/12/04 15:10:33 INFO mapreduce.Job: map 68% reduce 12%
    17/12/04 15:10:34 INFO mapreduce.Job: map 69% reduce 12%
    17/12/04 15:10:35 INFO mapreduce.Job: map 70% reduce 12%
    17/12/04 15:10:37 INFO mapreduce.Job: map 72% reduce 14%
    17/12/04 15:10:39 INFO mapreduce.Job: map 75% reduce 14%
    17/12/04 15:10:40 INFO mapreduce.Job: map 76% reduce 14%
    17/12/04 15:10:42 INFO mapreduce.Job: map 77% reduce 14%
    17/12/04 15:10:45 INFO mapreduce.Job: map 78% reduce 14%
    17/12/04 15:10:46 INFO mapreduce.Job: map 78% reduce 15%
    17/12/04 15:10:47 INFO mapreduce.Job: map 82% reduce 15%
    17/12/04 15:10:48 INFO mapreduce.Job: map 83% reduce 15%
    17/12/04 15:10:49 INFO mapreduce.Job: map 86% reduce 17%
    17/12/04 15:10:52 INFO mapreduce.Job: map 86% reduce 18%
    17/12/04 15:10:53 INFO mapreduce.Job: map 87% reduce 18%
    17/12/04 15:10:55 INFO mapreduce.Job: map 89% reduce 18%
    17/12/04 15:10:56 INFO mapreduce.Job: map 91% reduce 18%
    17/12/04 15:10:58 INFO mapreduce.Job: map 91% reduce 19%
    17/12/04 15:10:59 INFO mapreduce.Job: map 93% reduce 19%
    17/12/04 15:11:00 INFO mapreduce.Job: map 96% reduce 19%
    17/12/04 15:11:01 INFO mapreduce.Job: map 96% reduce 20%
    17/12/04 15:11:03 INFO mapreduce.Job: map 97% reduce 20%
    17/12/04 15:11:04 INFO mapreduce.Job: map 100% reduce 22%
    17/12/04 15:11:10 INFO mapreduce.Job: map 100% reduce 23%
    17/12/04 15:11:13 INFO mapreduce.Job: map 100% reduce 24%
    17/12/04 15:11:16 INFO mapreduce.Job: map 100% reduce 25%
    17/12/04 15:11:22 INFO mapreduce.Job: map 100% reduce 26%
    17/12/04 15:11:25 INFO mapreduce.Job: map 100% reduce 27%
    17/12/04 15:11:31 INFO mapreduce.Job: map 100% reduce 28%
    17/12/04 15:11:34 INFO mapreduce.Job: map 100% reduce 29%
    17/12/04 15:11:37 INFO mapreduce.Job: map 100% reduce 30%
    17/12/04 15:11:43 INFO mapreduce.Job: map 100% reduce 31%
    17/12/04 15:11:49 INFO mapreduce.Job: map 100% reduce 32%
    17/12/04 15:11:55 INFO mapreduce.Job: map 100% reduce 33%
    17/12/04 15:12:35 INFO mapreduce.Job: map 100% reduce 38%
    17/12/04 15:12:38 INFO mapreduce.Job: map 100% reduce 48%
    17/12/04 15:12:41 INFO mapreduce.Job: map 100% reduce 57%
    17/12/04 15:12:44 INFO mapreduce.Job: map 100% reduce 66%
    17/12/04 15:12:47 INFO mapreduce.Job: map 100% reduce 67%
    17/12/04 15:13:08 INFO mapreduce.Job: map 100% reduce 68%
    17/12/04 15:13:17 INFO mapreduce.Job: map 100% reduce 69%
    17/12/04 15:13:23 INFO mapreduce.Job: map 100% reduce 70%
    17/12/04 15:13:29 INFO mapreduce.Job: map 100% reduce 71%
    17/12/04 15:13:32 INFO mapreduce.Job: map 100% reduce 72%
    17/12/04 15:13:38 INFO mapreduce.Job: map 100% reduce 73%
    17/12/04 15:13:50 INFO mapreduce.Job: map 100% reduce 74%
    17/12/04 15:13:56 INFO mapreduce.Job: map 100% reduce 75%
    17/12/04 15:14:06 INFO mapreduce.Job: map 100% reduce 76%
    17/12/04 15:14:09 INFO mapreduce.Job: map 100% reduce 77%
    17/12/04 15:14:21 INFO mapreduce.Job: map 100% reduce 78%
    17/12/04 15:14:27 INFO mapreduce.Job: map 100% reduce 79%
    17/12/04 15:14:33 INFO mapreduce.Job: map 100% reduce 80%
    17/12/04 15:14:39 INFO mapreduce.Job: map 100% reduce 81%
    17/12/04 15:14:42 INFO mapreduce.Job: map 100% reduce 82%
    17/12/04 15:14:51 INFO mapreduce.Job: map 100% reduce 83%
    17/12/04 15:15:00 INFO mapreduce.Job: map 100% reduce 84%
    17/12/04 15:15:03 INFO mapreduce.Job: map 100% reduce 85%
    17/12/04 15:15:15 INFO mapreduce.Job: map 100% reduce 86%
    17/12/04 15:15:18 INFO mapreduce.Job: map 100% reduce 87%
    17/12/04 15:15:24 INFO mapreduce.Job: map 100% reduce 88%
    17/12/04 15:15:30 INFO mapreduce.Job: map 100% reduce 89%
    17/12/04 15:15:39 INFO mapreduce.Job: map 100% reduce 90%
    17/12/04 15:15:48 INFO mapreduce.Job: map 100% reduce 91%
    17/12/04 15:15:57 INFO mapreduce.Job: map 100% reduce 92%
    17/12/04 15:16:00 INFO mapreduce.Job: map 100% reduce 93%
    17/12/04 15:16:09 INFO mapreduce.Job: map 100% reduce 94%
    17/12/04 15:16:15 INFO mapreduce.Job: map 100% reduce 95%
    17/12/04 15:16:21 INFO mapreduce.Job: map 100% reduce 96%
    17/12/04 15:16:31 INFO mapreduce.Job: map 100% reduce 97%
    17/12/04 15:16:40 INFO mapreduce.Job: map 100% reduce 98%
    17/12/04 15:16:46 INFO mapreduce.Job: map 100% reduce 99%
    17/12/04 15:16:49 INFO mapreduce.Job: map 100% reduce 100%
    17/12/04 15:16:57 INFO mapreduce.Job: Job job_1512403681949_0016 completed successfully
    17/12/04 15:16:57 INFO mapreduce.Job: Counters: 50
    File System Counters
    FILE: Number of bytes read=10400000012
    FILE: Number of bytes written=20811586438
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=10000010564
    HDFS: Number of bytes written=10000000000
    HDFS: Number of read operations=231
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
    Job Counters
    Launched map tasks=76
    Launched reduce tasks=1
    Data-local map tasks=75
    Rack-local map tasks=1
    Total time spent by all maps in occupied slots (ms)=7861986
    Total time spent by all reduces in occupied slots (ms)=9260416
    Total time spent by all map tasks (ms)=714726
    Total time spent by all reduce tasks (ms)=420928
    Total vcore-milliseconds taken by all map tasks=714726
    Total vcore-milliseconds taken by all reduce tasks=420928
    Total megabyte-milliseconds taken by all map tasks=8050673664
    Total megabyte-milliseconds taken by all reduce tasks=9482665984
    Map-Reduce Framework
    Map input records=100000000
    Map output records=100000000
    Map output bytes=10200000000
    Map output materialized bytes=10400000456
    Input split bytes=10564
    Combine input records=0
    Combine output records=0
    Reduce input groups=100000000
    Reduce shuffle bytes=10400000456
    Reduce input records=100000000
    Reduce output records=100000000
    Spilled Records=200000000
    Shuffled Maps =76
    Failed Shuffles=0
    Merged Map outputs=76
    GC time elapsed (ms)=30652
    CPU time spent (ms)=882600
    Physical memory (bytes) snapshot=199282565120
    Virtual memory (bytes) snapshot=914234179584
    Total committed heap usage (bytes)=219659370496
    Shuffle Errors
    BAD_ID=0
    CONNECTION=0
    IO_ERROR=0
    WRONG_LENGTH=0
    WRONG_MAP=0
    WRONG_REDUCE=0
    File Input Format Counters
    Bytes Read=10000000000
    File Output Format Counters
    Bytes Written=10000000000
    17/12/04 15:16:57 INFO terasort.TeraSort: done
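To turn the progress lines above into phase timings, the timestamps can be parsed directly from the client log. The sketch below is one hedged way to do it in Python. It assumes the `yy/MM/dd HH:mm:ss` timestamp layout seen in this log, and the helper names (`parse_progress`, `phase_seconds`) are hypothetical. The three sample lines are copied from the TeraSort run above.

```python
import re
from datetime import datetime

# Matches lines such as:
# 17/12/04 15:09:21 INFO mapreduce.Job: map 0% reduce 0%
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) INFO mapreduce\.Job:\s+"
    r"map (\d+)% reduce (\d+)%"
)


def parse_progress(lines):
    """Return (timestamp, map_pct, reduce_pct) tuples for progress lines."""
    progress = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%y/%m/%d %H:%M:%S")
            progress.append((ts, int(match.group(2)), int(match.group(3))))
    return progress


def phase_seconds(progress):
    """Seconds from the first progress line until map and reduce reach 100%."""
    start = progress[0][0]
    map_done = next(ts for ts, m, _ in progress if m == 100)
    reduce_done = next(ts for ts, _, r in progress if r == 100)
    return (map_done - start).total_seconds(), (reduce_done - start).total_seconds()


# Three lines copied from the TeraSort log above.
sample = [
    "17/12/04 15:09:21 INFO mapreduce.Job: map 0% reduce 0%",
    "17/12/04 15:11:04 INFO mapreduce.Job: map 100% reduce 22%",
    "17/12/04 15:16:49 INFO mapreduce.Job: map 100% reduce 100%",
]
map_s, total_s = phase_seconds(parse_progress(sample))
print(f"map phase: {map_s:.0f}s, whole job: {total_s:.0f}s")
```

For this run the map phase takes 103 seconds and the job reaches 100% reduce after 448 seconds, so most of the wall-clock time is spent in the single reduce task.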

Conclusion

IBM Spectrum Scale provides an enterprise-level alternative to the HDFS file system used in big data clusters. HDFS Transparency supports multiple file systems, which makes it possible to use file systems created on different storage media and in different locations, reducing the need for frequent data migrations. Remote cluster mount support also enables in-place analytics on data in a remotely mounted file system.
Multi-protocol support in IBM Spectrum Scale makes it easier to ingest data and operate on it.
With Hortonworks HDP support for IBM Spectrum Scale, existing users can run data analytics on their existing file system data without migrating it to HDFS.


