In this hands-on lab, you’ll learn the basics of using HBase natively and with Big SQL, IBM’s industry-standard SQL interface for data stored in its Hadoop-based platform. HBase is an open source key-value data storage mechanism commonly used in Hadoop environments. It features a column-oriented data model that enables programmers to efficiently retrieve data by key values from very large data sets. It also supports certain data modification operations, readily accommodates sparse data, and supports various kinds of data. Indeed, HBase doesn’t have primitive data types; all user data is stored as byte arrays. With BigInsights, programmers can use SQL to query data in Big SQL tables managed by HBase.
After completing this hands-on lab, you’ll be able to:
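To make the "everything is a byte array" point concrete, here is a minimal sketch in Python. It does not use HBase at all: the `table` dictionary and the `put` and `get` helpers are hypothetical stand-ins that only mimic the shape of the data model (row key, then `family:qualifier` column, then value), and the row and column names are invented for illustration.

```python
import struct

# Hypothetical in-memory stand-in for an HBase table (not the HBase API):
# row key -> {b"family:qualifier": value}. Keys and values are all bytes.
table = {}

def put(row_key: bytes, column: bytes, value: bytes) -> None:
    """Store one cell; the caller is responsible for encoding the value."""
    table.setdefault(row_key, {})[column] = value

def get(row_key: bytes, column: bytes):
    """Return the raw bytes for a cell, or None if the cell was never stored."""
    return table.get(row_key, {}).get(column)

# The application decides how each value is encoded as bytes.
put(b"user1", b"info:name", "Ana".encode("utf-8"))
put(b"user1", b"info:age", struct.pack(">i", 42))   # 4-byte big-endian integer
put(b"user2", b"info:name", "Bo".encode("utf-8"))   # sparse row: no age cell

# Reading data back means decoding the bytes with the same convention.
name = get(b"user1", b"info:name").decode("utf-8")
age = struct.unpack(">i", get(b"user1", b"info:age"))[0]
missing = get(b"user2", b"info:age")
print(name, age, missing)
```

Because the store itself has no notion of types, the reader and writer must agree on the encoding of each column. The absent `info:age` cell for `user2` takes up no space, which is the sense in which sparse data is cheap in this model.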
- Use the HBase shell (command-line interface) to issue commands
- Create HBase tables that include multiple column families and columns
- Work directly with HBase tables to add, retrieve, and delete data
- Create, populate, and query Big SQL tables that store data in HBase
- Create and query views based on Big SQL tables managed by HBase
- Explore options for mapping relational database designs to Big SQL HBase tables
- Generate unique row key values for Big SQL tables managed by HBase
- Investigate the storage implications of certain Big SQL HBase table designs
- Explore the status of your HBase cluster through a Web interface
Allow 2.5 – 3 hours to complete this lab. Although this lab briefly summarizes HBase architecture and basic concepts, you may find it helpful to read introductory articles or watch some videos on HBase before beginning your work.
This lab was developed by Cynthia M. Saracco, IBM Silicon Valley Lab, with thanks to Bruce Brown and Piotr Pruski for their earlier contributions and Nailah Bissoon, Scott Gray, Dan Kikuchi, Ellen Patterson, Henry Quach, and Deepa Remesh for their reviews. Special thanks also to Kevin Hom for preparing this lab for publication on Hadoop Dev. Please post questions or comments about this lab or the technologies it describes to the forum on Hadoop Dev at https://developer.ibm.com/hadoop/.
This lab was developed for a BigInsights 4.0 environment in which Big SQL is installed and running. Big SQL is part of BigInsights Quick Start Edition, BigInsights Data Analyst, and BigInsights Data Scientist. Before proceeding with this tutorial, ensure that you have access to a working BigInsights platform with Big SQL running.
Examples in this lab are based on a sample environment with the configuration shown in the tables below. If your environment is different, modify the sample code and instructions as needed to match your configuration.
| Account | User | Password |
|---|---|---|
| Big SQL Administrator | bigsql | bigsql |

| Property | Value |
|---|---|
| Ambari port number | 8080 |
| Big SQL database name | bigsql |
| Big SQL port number | 51000 |
| HBase installation directory | /usr/iop/18.104.22.168/hbase |
| Big SQL installation directory | /usr/ibmpacks/bigsql |
| JSqsh installation directory | /usr/ibmpacks/bigsql/4.0/jsqsh |
| Big SQL samples directory | /usr/ibmpacks/bigsql/4.0/bigsql/samples/data |
|About the screen captures, sample code, and environment configuration: Screen captures in this lab depict examples and results that may vary from what you see when you complete the exercises. In addition, some code examples may need to be customized to match your environment.|
To get started with the lab exercises, you need access to a working Big SQL environment. A free Quick Start Edition is available for download from Hadoop Dev at https://developer.ibm.com/hadoop/try-it/.
As of this writing, Big SQL is not available on the Quick Start Edition VMware image or on the IBM Analytics for Hadoop cloud service on Bluemix. Therefore, you will need to install and configure BigInsights on your own cluster, following instructions in the product’s Knowledge Center.
Before continuing with this lab, verify that Big SQL and all its prerequisite services are running. Also verify that HBase is running.
|If you have any questions or need help getting your environment up and running, visit Hadoop Dev (https://developer.ibm.com/hadoop/) and review the product documentation or post a message to the forum. You cannot proceed with subsequent lab exercises without access to a working environment.|
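As a rough first check, you can test whether anything is listening on the ports listed in the sample configuration. The sketch below is only an illustration under stated assumptions: the `port_is_open` helper and the `localhost` host name are hypothetical, and the port numbers (8080 for Ambari, 51000 for Big SQL) come from the sample environment above. An open port shows only that some process accepts connections there; it does not confirm that the service is healthy, so still check service status through your cluster's administration tools.

```python
import socket

def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False

if __name__ == "__main__":
    host = "localhost"  # assumption: replace with the host name of your node
    # Port numbers taken from the sample environment; adjust for your cluster.
    for label, port in {"Ambari": 8080, "Big SQL": 51000}.items():
        state = "reachable" if port_is_open(host, port) else "not reachable"
        print(f"{label} port {port} on {host}: {state}")
```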