by Bo Meng, Rich Hagarty | Updated March 28, 2019 - Published November 16, 2017
Archived date: 2019-06-04
Apache HBase is an open source NoSQL distributed database that runs on top of the Hadoop Distributed File System (HDFS). It is well suited to fast read/write operations on large datasets, with high throughput and low input/output latency. Unlike traditional relational databases, however, HBase does not support SQL scripting or data types, and it requires the Java API to achieve equivalent functionality.
Apache Spark is a big data processing engine built for speed, ease of use, and sophisticated analytics. Like Spark, HBase is built for fast processing of large amounts of data, and the two are often combined in big data applications. To let you manage and access your data with SQL, HSpark connects to Spark and enables Spark SQL commands to be executed against an HBase data store.
This code pattern shows application developers who are familiar with SQL how to access HBase data tables using the same SQL commands. You will learn how to create and query data tables by using Apache Spark SQL and the HSpark connector package. You can then take advantage of the performance gains from using HBase without having to learn the Java APIs traditionally required to access HBase data tables.
HSpark provides a new approach to supporting HBase. It leverages the unified big data processing engine of Spark, while also providing native SQL access to HBase data tables.
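To make the gap between SQL tables and HBase concrete: HBase stores cell values as uninterpreted byte arrays, addressed by column family and qualifier, so any SQL layer needs a schema that says where each column lives and how to decode it. The sketch below illustrates that idea in plain Python. It is not HSpark's actual encoding or API; the schema, the column names, and the big-endian encodings are all assumptions chosen for illustration.

```python
import struct

# Hypothetical schema: SQL column -> (HBase column family, qualifier, SQL type).
SCHEMA = {
    "name": ("info", "name", "string"),
    "age": ("info", "age", "int"),
    "score": ("stats", "score", "double"),
}

# Assumed encodings: big-endian fixed width for numbers, UTF-8 for strings.
_FORMATS = {"int": ">i", "double": ">d"}


def encode(value, sql_type):
    """Turn a typed value into the raw bytes a cell would hold."""
    if sql_type == "string":
        return value.encode("utf-8")
    return struct.pack(_FORMATS[sql_type], value)


def decode(raw, sql_type):
    """Recover the typed value from a cell's raw bytes."""
    if sql_type == "string":
        return raw.decode("utf-8")
    return struct.unpack(_FORMATS[sql_type], raw)[0]


def row_to_cells(row):
    """Map one SQL-style row to {(family, qualifier): bytes}."""
    return {
        (family, qualifier): encode(row[column], sql_type)
        for column, (family, qualifier, sql_type) in SCHEMA.items()
    }


def cells_to_row(cells):
    """Map stored cells back to a SQL-style row using the same schema."""
    return {
        column: decode(cells[(family, qualifier)], sql_type)
        for column, (family, qualifier, sql_type) in SCHEMA.items()
    }


row = {"name": "Ada", "age": 36, "score": 91.5}
cells = row_to_cells(row)      # what the store would hold: bytes only
restored = cells_to_row(cells)  # what a SQL layer would hand back
```

Without the schema, the stored cells are just bytes; with it, the same cells read back as typed columns. A SQL connector does this bookkeeping on the developer's behalf.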
When you complete this pattern, you will understand how to:
Ready to put this code pattern to use? Complete details on how to get started running and using this application are in the README.md file.