Overview

The IBM BigInsights for Apache Hadoop for Bluemix service provides cloud-based access to open source Hadoop technology, with extra features such as Big SQL and Text Analytics. This article shows you how to integrate your Streams applications with the HBase service of BigInsights for Apache Hadoop for Bluemix. For simplicity, we’ll refer to the BigInsights for Apache Hadoop for Bluemix service as BigInsights on Bluemix for the remainder of this article.

To access HBase in BigInsights on Bluemix from a Streaming Analytics application, you need to use the Streams HBase for Bluemix toolkit instead of the standard streamsx.hbase toolkit. The HBase for Bluemix toolkit uses the Apache Knox REST API to communicate with the BigInsights on Bluemix HBase service. The toolkit provides SPL functions to create, read from, write to, and delete HBase tables.

This article will show you how to use this toolkit to interact with HBase services running on Bluemix.

Skill Level

You should have a basic understanding of Streams and HBase to follow this article.

Prerequisites

To develop a Streams application that integrates with HBase services on Bluemix, you’ll need the following:

  1. An instance of the BigInsights on Bluemix service. If you don’t already have one, you can add a new service from the IBM Bluemix catalog.
  2. The hostname of the HBase service you’ll be accessing, along with your username and password. To find these for a BigInsights on Bluemix cluster, see these instructions.
  3. The HBase for Bluemix toolkit, which you can download from the IBMStreams repository on GitHub.

Setup Instructions

Follow these instructions to set up the development environment:

  • Download and extract the version 0.1 release of the HBase for Bluemix toolkit from GitHub. This release contains the HBase for Bluemix toolkit and its dependencies:
    • com.ibm.streamsx.hbase.bluemix
    • com.ibm.streamsx.inet
    • com.ibm.streamsx.bytes
  • Add all three toolkits to the toolkit path for building the application.

Using the toolkit

If you have used the standard HBase toolkit before, you know that it contains a set of operators for interacting with HBase. The HBase for Bluemix toolkit is different: it provides a set of SPL functions that access HBase via the REST APIs. To access HBase, you therefore call these functions in the logic clause of an operator. The simplest way to do this is with the Custom operator from the SPL Standard Toolkit. For example:

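 // Custom has no input port here, so it acts as a source operator.
 stream<RowType row> ReadHBase = Custom() {
     logic onProcess: {
         // Arguments: Knox base URL, user name, password, table name, row key.
         RowType row = hbaseGet("baseURL", "userName", "password", "tableName", "row_to_get");
         // Submit the fetched record as a tuple on the output port.
         submit({row = row}, ReadHBase);
     }
 }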

In the example above, we use the Custom operator as a source operator. The Custom operator calls the com.ibm.streamsx.hbasebluemix::hbaseGet function to fetch data from HBase. Each record fetched is submitted as an output tuple to the output port.

This example demonstrates a common pattern for using the HBase for Bluemix toolkit functions. Each of the functions in the toolkit requires the following three connection parameters:

  • baseURL – the URL used to connect to the Knox gateway. It must be specified in the form https://<hostname>:<port>/gateway/default/hbase.
  • userName – the user name used to connect to the BigInsights on Bluemix service
  • password – the password used to connect to the BigInsights on Bluemix service

To find out how to retrieve this information from the BigInsights on Bluemix service, refer to this documentation.
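Before passing these values to a Streams job, you may want to sanity-check them. The Python sketch below builds a base URL in the form given above and checks that a URL matches it. It also builds an HTTP Basic Authorization header, on the assumption that the gateway accepts Basic authentication (a common Knox setup, but not something this article confirms for your cluster). The function names, host name, and port 8443 are placeholders chosen for illustration.

```python
import base64
from urllib.parse import urlsplit


def knox_base_url(hostname, port):
    """Build a base URL in the https://<hostname>:<port>/gateway/default/hbase form."""
    return f"https://{hostname}:{port}/gateway/default/hbase"


def looks_like_base_url(url):
    """Return True if url has the scheme, host, port, and path the article describes."""
    parts = urlsplit(url)
    try:
        port = parts.port  # raises ValueError if the port is not numeric
    except ValueError:
        return False
    return (
        parts.scheme == "https"
        and parts.hostname is not None
        and port is not None
        and parts.path.rstrip("/") == "/gateway/default/hbase"
    )


def basic_auth_header(user_name, password):
    """Build an HTTP Basic Authorization header (assumes the gateway accepts it)."""
    raw = f"{user_name}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


# Placeholder values; substitute the ones for your own cluster.
url = knox_base_url("myhost.example.com", 8443)
print(url)
print(looks_like_base_url(url))
```

Checking the URL shape up front catches the most common mistakes, such as a missing port or an http scheme, before the job is submitted.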

The following table summarizes the functions provided by the HBase for Bluemix toolkit, as well as their HBase shell counterparts:

[Table image: HBase for Bluemix toolkit functions and their HBase shell counterparts]

Note that the functions do not provide all the features and options available in the shell. See the documentation for details.
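To relate the shell commands to the REST layer that the toolkit uses, the Python sketch below maps four familiar HBase shell operations to the HTTP method and path that the standard HBase REST interface uses for them. This is an assumption based on the general HBase REST conventions. It does not list the toolkit's own function names or show how the toolkit is implemented, and the names REST_OPERATIONS and rest_call are hypothetical.

```python
from urllib.parse import quote

# HBase shell operation -> (HTTP method, path template) in the HBase REST interface.
# "drop" stands for the shell's disable-then-drop sequence.
REST_OPERATIONS = {
    "create": ("PUT", "/{table}/schema"),
    "put": ("PUT", "/{table}/{row}"),
    "get": ("GET", "/{table}/{row}"),
    "drop": ("DELETE", "/{table}/schema"),
}


def rest_call(base_url, shell_command, table, row=""):
    """Return the (method, URL) pair for a shell-style operation."""
    method, template = REST_OPERATIONS[shell_command]
    # Percent-encode table and row so special characters stay inside one path segment.
    path = template.format(table=quote(table, safe=""), row=quote(row, safe=""))
    return method, base_url.rstrip("/") + path


base = "https://myhost.example.com:8443/gateway/default/hbase"  # placeholder
print(rest_call(base, "get", "tableName", "row_to_get"))
print(rest_call(base, "drop", "tableName"))
```

The sketch only computes the method and URL; it sends nothing over the network, so it is safe to run without a cluster.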

Trying it out

The HBase for Bluemix toolkit contains three samples for you to try out the toolkit:

  • HBaseMakeTableAndPut – This sample demonstrates how you can create a table in HBase and put some data into the table.  This sample should be run first.
  • HBaseGet – This sample demonstrates how you can get data from an HBase table.
  • HBaseDeleteTable – This sample demonstrates how you can delete a table from HBase.

To run these samples, you need to:

  1. Build the samples:
    1. The samples can be built at the command line:
      1. Go to
        <HBase For Bluemix>/samples/HBaseBluemixSample/
      2. Type make
    2. Alternatively, import the sample project into Streams Studio using the Import Project wizard.
  2. Find the application bundle (.sab) file for the application. For example, after compiling at the command line, the bundles for the three samples are located in
    <HBase For Bluemix>/samples/HBaseBluemixSample/output/Distributed/
  3. Submit the application bundle to Bluemix via the Streams Console.
    • From the dashboard of the Streaming Analytics service in Bluemix, click “Launch” to launch the Streams Console.
    • Select “Submit Job”.
    • Browse to the location of the built application bundle as discussed earlier and click “Next”.
    • Enter the URL of the HBase server, the username, and the password as submission-time values, and click “OK”.

The application graph for the submitted job should appear in the Streams Graph.

To verify that the application ran correctly, check that all the operators in the graph turn green. You can also check the output from the samples in the Console Log.

For example, after running the HBaseGet sample, you should see the retrieved rows in the Log Viewer:

[Screenshot: Log Viewer showing the rows retrieved by the HBaseGet sample]

Conclusion

This article has been an introductory overview of the HBase for Bluemix toolkit.

Special thanks to Ivan Lopes for providing and validating this information!
