The HBase-Spark module is a new feature in BigInsights 4.2.5. It is a library that lets Spark access HBase tables as an external data source or sink. The module has four main features:
(1) Basic Spark RDD support for HBase, including get, put, and delete operations on HBase within a Spark DAG.
(2) Full access to HBase in Spark Streaming applications.
(3) Ability to bulk load data into HBase with Spark.
(4) Ability to act as a data source for Spark SQL/DataFrame.

In this article, I will show how to use the hbase-spark module in a Java or Scala client program. To follow along, you should have basic knowledge of HBase, Spark, Java, and Scala. I will create a Maven project from scratch. The whole project is attached here: HBase-Spark

1. Create a Maven project
The most important part of this step is the pom.xml: we need to declare all required jars as dependencies and add plugins to compile both the Scala and the Java sources.
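For example, the dependency and plugin sections of pom.xml might look like the fragment below. The artifact versions shown are illustrative assumptions; match them to the jars shipped with your BigInsights release, and mark cluster-provided jars as provided so they are not bundled into your jar.

```xml
<!-- Versions below are examples only; align them with your cluster's jars. -->
<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-spark</artifactId>
    <version>1.2.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
<build>
  <plugins>
    <!-- scala-maven-plugin compiles mixed Scala/Java source trees -->
    <plugin>
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.2.2</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```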

2. Develop Java and Scala source code
We will use the public API of the hbase-spark module to write examples that cover BulkPut, Streaming BulkPut, and Spark SQL/DataFrame. For more information on the public API, refer to the open source code: https://github.com/apache/hbase/tree/master/hbase-spark.
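As an illustration of the RDD support, here is a minimal Scala sketch of a BulkPut example built on the module's HBaseContext.bulkPut API. The class name, table name, column family, and sample rows are placeholders; the table must already exist in HBase, and the program needs a live cluster to run against.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

object HBaseBulkPutExample {
  def main(args: Array[String]): Unit = {
    // tableName and columnFamily come in as spark-submit arguments
    val Array(tableName, columnFamily) = args

    val sc = new SparkContext(new SparkConf().setAppName("HBaseBulkPutExample"))
    val hbaseConf = HBaseConfiguration.create()
    // Load cluster settings; this path is typical for IOP/BigInsights installs
    hbaseConf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"))
    val hbaseContext = new HBaseContext(sc, hbaseConf)

    // Each element: (row key, column qualifier, value) -- sample data only
    val rdd = sc.parallelize(Seq(
      ("row1", "col1", "value1"),
      ("row2", "col1", "value2")))

    // bulkPut converts every RDD element into a Put inside the executors
    hbaseContext.bulkPut[(String, String, String)](
      rdd,
      TableName.valueOf(tableName),
      (t: (String, String, String)) => {
        val put = new Put(Bytes.toBytes(t._1))
        put.addColumn(Bytes.toBytes(columnFamily),
          Bytes.toBytes(t._2), Bytes.toBytes(t._3))
        put
      })

    sc.stop()
  }
}
```

The streaming case is analogous: HBaseContext.streamBulkPut takes a DStream instead of an RDD, together with the same element-to-Put function.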

3. Package source code into jars
Use Maven to package all related classes into a jar, then copy the jar to the cluster where the client program will run.
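The packaging step boils down to two commands; the jar name, host name, and target path in the scp line are placeholders for your environment.

```shell
# Build the project; the scala-maven-plugin compiles the Scala/Java sources
# before the default jar plugin packages them.
mvn clean package

# Copy the resulting jar to a node in the cluster (example host and path).
scp target/hbase-spark-example.jar user@edge-node:~/
```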

4. Run examples
On the cluster, use the spark-submit command to run the Java/Scala examples in the jar. Before running the examples, we need to do some configuration work:
(1) Include the related HBase jars. You can either list all of them with --jars (note the double '-') in the spark-submit command, or add them to conf/spark-defaults.conf as below.

spark.driver.extraClassPath /usr/iop/current/hbase-client/lib/hbase-common.jar:/usr/iop/current/hbase-client/lib/hbase-client.jar:/usr/iop/current/hbase-client/lib/hbase-server.jar:/usr/iop/current/hbase-client/lib/hbase-protocol.jar:/usr/iop/current/hbase-client/lib/guava-12.0.1.jar:/usr/iop/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/iop/current/hbase-client/lib/zookeeper.jar:/usr/iop/current/hbase-client/lib/protobuf-java-2.5.0.jar:/usr/iop/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/iop/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/iop/current/hbase-client/lib/metrics-core-2.2.0.jar:/usr/iop/current/hbase-client/lib/hbase-spark.jar

spark.executor.extraClassPath /usr/iop/current/hbase-client/lib/hbase-common.jar:/usr/iop/current/hbase-client/lib/hbase-client.jar:/usr/iop/current/hbase-client/lib/hbase-server.jar:/usr/iop/current/hbase-client/lib/hbase-protocol.jar:/usr/iop/current/hbase-client/lib/guava-12.0.1.jar:/usr/iop/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/iop/current/hbase-client/lib/zookeeper.jar:/usr/iop/current/hbase-client/lib/protobuf-java-2.5.0.jar:/usr/iop/current/hbase-client/lib/hbase-hadoop2-compat.jar:/usr/iop/current/hbase-client/lib/hbase-hadoop-compat.jar:/usr/iop/current/hbase-client/lib/metrics-core-2.2.0.jar:/usr/iop/current/hbase-client/lib/hbase-spark.jar

(2) Your program needs to read the HBase configuration. You can either copy hbase-site.xml from hbase/conf to spark/conf, or load it explicitly in your program with the statement below.
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
See the sample code for details.

(3) Use a spark-submit command like the following to run an example:
/usr/iop/current/spark2-client/bin/spark-submit --master local[2] --class com.ibm.hbase.JavaHBaseBulkPutExample ~/hbase-spark-example.jar tableName columnFamilyName
