“Tachyon is a memory-centric distributed storage system enabling reliable data sharing at memory-speed across cluster frameworks.”

 

This blog will show you how to:
  1. Install Tachyon v0.8.0 on IOP 4.1
  2. Mount the Tachyon FS as a local Linux mount
  3. Store Spark RDDs in Tachyon for persistence improvements

 

Pre-Requisites:
– Ambari 2.1+ Cluster, with IOP 4.1 Installed
– Git

 

1. Add Tachyon to the currently deployed stack

IOP 4.1

# Clone the service definition onto the node running Ambari Server
git clone https://github.com/chuyqa/tachyon-ambari-service /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/TACHYON

ambari-server restart
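
If the new service does not show up in the Ambari UI after the restart, one thing worth checking is whether the clone landed in the right place. The sketch below assumes the repository follows the usual Ambari custom-service layout, with a metainfo.xml at its top level. The function name check_service_def is hypothetical, and the directory to check is whatever target path was passed to git clone.

```shell
# Hypothetical helper: confirm that a directory looks like an Ambari service definition.
# Assumption: the service definition carries a metainfo.xml at its top level.
check_service_def() {
  dir="$1"
  if [ -f "$dir/metainfo.xml" ]; then
    echo "OK: found $dir/metainfo.xml"
  else
    echo "MISSING: no metainfo.xml under $dir" >&2
    return 1
  fi
}

# Usage: pass the same target directory that was given to git clone, e.g.
# check_service_def /path/to/stacks/BigInsights/4.1/services/TACHYON
```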
 

 

Install Tachyon via Ambari
  • Add the service via the Ambari UI

[Screenshot: Add service to cluster]

  • Select Tachyon 0.8.0

[Screenshot: Select Tachyon to add]

  • Select a master server node and worker nodes. The amount set in tachyon.worker.memory is reserved on the master and on each worker node for use by the Tachyon FS

  • Update the required tachyon-config:

tachyon.master.address = Hostname of the selected Tachyon master

tachyon.underfs.address = hdfs://namenodeHostName:8020

tachyon.worker.memory = Specify how much memory to allocate to each Tachyon master/worker

Once the service is started, you can access the Tachyon Master web UI at http://tachyon.master.hostname:19999
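
A quick way to confirm that the master came up is to request the web UI from the command line. This is a minimal sketch, assuming curl is installed. The function names are hypothetical, and the host name passed in is a placeholder for your Tachyon master.

```shell
# Hypothetical helpers for checking the Tachyon Master web UI (port 19999, as above).
tachyon_ui_url() {
  # $1 = host name of the Tachyon master
  echo "http://$1:19999"
}

check_tachyon_ui() {
  # --fail makes curl return non-zero on HTTP errors; --max-time avoids hanging.
  if curl --silent --fail --max-time 5 --output /dev/null "$(tachyon_ui_url "$1")"; then
    echo "Tachyon Master UI is reachable on $1"
  else
    echo "Tachyon Master UI is not reachable on $1" >&2
    return 1
  fi
}

# Usage (replace the placeholder with the master host you selected):
# check_tachyon_ui tachyon.master.hostname
```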

2. Mount Tachyon onto the local file system


# Create a directory on the Tachyon namespace using tfs
/usr/iop/current/tachyon/bin/tachyon tfs mkdir /remoteDemoDir

# Create a local directory to use as the mount point
mkdir -p /data/localTachyonMount

# Mount Tachyon
/usr/iop/current/tachyon/bin/tachyon tfs mount /remoteDemoDir /data/localTachyonMount
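
To check that the directory was created, the Tachyon namespace can be listed with tfs ls. The wrapper below is only a sketch: TACHYON_BIN and list_tachyon_dir are names introduced here for illustration, with the default pointing at the install path used in this post.

```shell
# Assumption: Tachyon is installed under the path used in this post; override TACHYON_BIN if not.
TACHYON_BIN="${TACHYON_BIN:-/usr/iop/current/tachyon/bin/tachyon}"

# Hypothetical wrapper: list a path in the Tachyon namespace (defaults to the root).
list_tachyon_dir() {
  "$TACHYON_BIN" tfs ls "${1:-/}"
}

# Usage: /remoteDemoDir should appear in the listing of the root
# list_tachyon_dir /
```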

 

At this stage, this memory-centric storage can be used by a variety of applications (for example, install MySQL here for MemSQL-like performance, or use it with Flume).

 

3. Spark RDD Persistent Storage “OFF_HEAP” (step 2 is optional)

Let’s use Spark for a quick demo in which Tachyon stores our RDDs in a serialized format. Enabling “OFF_HEAP” has the following advantages:

  • Allowing executors to be smaller and share the same pool of memory
  • Reducing Garbage Collection times
  • In-memory cache is not lost if a given executor crashes
    [See RDD Persistence]

# Create a directory on the Tachyon namespace and upload a test file
/usr/iop/current/tachyon/bin/tachyon tfs mkdir /sparkDemo
/usr/iop/current/tachyon/bin/tachyon tfs copyFromLocal /usr/iop/current/zookeeper-server/doc/LICENSE.txt /sparkDemo/

Now let’s open a Spark shell to try some Scala code and see a simple use case of OFF_HEAP (marked experimental in the Spark documentation).


/usr/iop/current/spark-client/bin/spark-shell

scala> sc.hadoopConfiguration.set("fs.tachyon.impl", "tachyon.hadoop.TFS")
scala> val file = sc.textFile("tachyon://localhost:19998/sparkDemo/LICENSE.txt")
scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
scala> counts.persist(org.apache.spark.storage.StorageLevel.OFF_HEAP)
scala> counts.take(20)
scala> counts.take(20)

 

Summary
Here we saw how to quickly add Tachyon to a running IOP 4.1 cluster, along with two quick samples: mounting Tachyon locally and using OFF_HEAP persistence in Spark.
