HBase Intro Lab 2: Issuing basic HBase commands

After confirming that all necessary services are running, you’re ready to start using HBase directly. Exercises in this lab are intended for those with little or no prior experience using HBase. As such, after completing this lab, you’ll know how to:

  • Launch the HBase shell
  • Create an HBase table
  • Inspect the characteristics of a table
  • Alter properties associated with a table
  • Populate a table with data
  • Retrieve data from a table
  • Use HBase Web interfaces to explore information about your environment

As you work through this lab, you’ll become familiar with some basic HBase concepts, such as row keys, column families, and columns. You’ll also observe HBase’s schema-less nature. However, a detailed explanation of HBase is beyond the scope of this lab. Visit the Apache HBase site (http://hbase.apache.org/) or the HBase wiki (http://wiki.apache.org/hadoop/Hbase) for reference materials on HBase.

To keep this lab simple, you will create one HBase table to track customer reviews of various products. Each review will have a unique identifier, summary information (e.g., product name and rating), reviewer data (e.g., name and location), and detailed information (e.g., review comments). In a relational DBMS, such information might be stored in a single table with one column for each attribute to be tracked (e.g., REVIEW-ID, PRODUCT, RATING, REVIEWER-NAME, REVIEWER-LOCATION, COMMENTS). Furthermore, a data type would be specified for each column at creation, perhaps INT for the REVIEW-ID, VARCHAR(30) for the PRODUCT, and so on.

In HBase, your table design will be different. The unique identifier for each review will serve as the row key. Attributes commonly queried together will be grouped into a column family. HBase requires at least one column family per table. Yours will have three:

  • summary, which summarizes essential review information (such as the product name and rating).
  • reviewer, which tracks data related to the user who reviewed the product.
  • details, which tracks comments, tips, and other detailed review information.

Each of these column families may contain one or more columns, depending on the data associated with a given review. For example, one review might contain the name, location, and email address of a reviewer, while another might only contain the reviewer’s name. HBase is schema-less, so you don’t define any columns when you create your table. Furthermore, since all data in HBase is stored in byte arrays, you don’t declare data types for your columns. You’ll put these concepts into practice shortly.
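The following Python sketch models these ideas with a toy in-memory table. It is an illustration under stated assumptions, not HBase code: the `ToyTable` class and its methods are hypothetical. Column families are fixed when the table is created, a column appears the first time a value is written to it, and every value is converted to bytes.

```python
class ToyTable:
    """Toy in-memory model of an HBase table (hypothetical, not the HBase API).

    Layout: row key (bytes) -> column "family:qualifier" (str) -> value (bytes).
    """

    def __init__(self, name, *families):
        if not families:
            raise ValueError("a table needs at least one column family")
        self.name = name
        self.families = set(families)  # fixed when the table is created
        self.rows = {}

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # No column or type declarations: the column appears on first write,
        # and every value is stored as a byte array.
        self.rows.setdefault(row.encode(), {})[column] = str(value).encode()

    def columns(self, row):
        return sorted(self.rows.get(row.encode(), {}))


reviews = ToyTable("reviews", "summary", "reviewer", "details")
reviews.put("101", "summary:product", "hat")
reviews.put("101", "summary:rating", 5)
reviews.put("133", "reviewer:location", "USA")
print(reviews.columns("101"))  # ['summary:product', 'summary:rating']
print(reviews.columns("133"))  # ['reviewer:location']
```

Rows 101 and 133 end up with different columns, and neither write required a schema change; only writing to an undeclared column family fails.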

In production environments, programmers typically interact with HBase through applications built on its Java API or on another supported interface, such as REST or Thrift. However, to keep things simple, you’ll work with HBase through its command-line interface (the HBase shell).

HBase consists of an active HBase Master Server and one or more Region Server(s). Region Servers manage user data modeled as HBase tables. HBase automatically partitions tables into regions, storing a range of rows together based on their key values. Regions are stored in files in your distributed file system.
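Because each region holds a contiguous, sorted range of row keys, finding the region for a given key amounts to a binary search over the regions' start keys. The Python sketch below illustrates that idea with a hypothetical `find_region` helper and made-up split points; it is not how the HBase client actually caches and looks up region locations.

```python
import bisect


def find_region(start_keys, row_key):
    """Return the index of the region whose [start, next start) range holds row_key.

    start_keys must be sorted, and the first region starts at the empty key b"".
    Keys are compared lexicographically as bytes, as HBase does.
    """
    if not start_keys or start_keys[0] != b"":
        raise ValueError("the first region must start at the empty key")
    return bisect.bisect_right(start_keys, row_key) - 1


# Hypothetical split points for a table divided into three regions.
regions = [b"", b"200", b"500"]
print(find_region(regions, b"101"))  # 0
print(find_region(regions, b"200"))  # 1 (start keys are inclusive)
print(find_region(regions, b"99"))   # 2
```

The last call shows that row keys compare as bytes, not as numbers: `99` sorts after `500` because the character `9` sorts after `5`.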

Allow 1 to 1.5 hours to complete all sections of this lab. You must have a working BigInsights and HBase environment, as described in the first module of this series of lab exercises. Please post questions or comments about this lab or the technologies it describes to the forum on Hadoop Dev at https://developer.ibm.com/hadoop/.

2.1. Creating and altering a table

To begin, create a reviews table and alter some of its default properties before populating it with data.

__1. If necessary, open a terminal window.

__2. Launch the HBase shell. From the HBase bin directory (such as /usr/iop/4.0.0.0/hbase/bin), issue this command:

hbase shell
  • Ignore any informational messages that may appear. Verify that the shell launched successfully and that your screen appears similar to this:

image5

__3. Optionally, type “help” to review information about supported shell commands. A portion of this information is shown here:

image6

__4. Create an HBase table named reviews with 3 column families: summary, reviewer, and details. Ignore any warnings that may appear involving multiple SLF4J bindings.

create 'reviews', 'summary', 'reviewer', 'details'

image7

About the CREATE command
CREATE requires only the name of the table and one or more column families. In a moment, you’ll see how to add columns to the table dynamically when you insert data into it.

__5. List the HBase tables present on your system.

list

image8

__6. Inspect the default properties associated with your new table:

describe 'reviews'

Note that your table is shown as ENABLED, or ready for use. Also note that each column family has various properties associated with it. For example, the summary column family’s IN_MEMORY property is set to false in the screen capture below. Since we expect most HBase queries to reference this column family, let’s change the property to true. This instructs HBase to give priority to caching this data.

image9

__7. To alter (or drop) a table, you must first disable it:

disable 'reviews'

__8. Alter the table to set the IN_MEMORY property of the summary column family to true.

alter 'reviews', {NAME => 'summary', IN_MEMORY => 'true'}

image10

__9. Set the number of versions for the summary and reviewer column families to 2. HBase can store multiple versions of data for each column family. If your application does not require multiple versions, the VERSIONS property for each column family should be set to 1.

alter 'reviews', {NAME => 'summary', VERSIONS => 2}, {NAME => 'reviewer', VERSIONS => 2}

image11
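The effect of the VERSIONS setting can be sketched with a toy cell that keeps only its newest values. This Python model is hypothetical (`VersionedCell` is not an HBase class) and simplifies real behavior: HBase hides surplus versions on reads and physically removes them later, during compactions.

```python
class VersionedCell:
    """Toy model of one HBase cell that retains at most max_versions values."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.versions = []  # (timestamp, value) pairs, newest first

    def put(self, timestamp, value):
        self.versions.append((timestamp, value))
        # Order by descending timestamp, then keep only the newest versions.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)
        del self.versions[self.max_versions:]


rating = VersionedCell(max_versions=2)
for ts, value in [(1000, b"5"), (2000, b"4"), (3000, b"3")]:
    rating.put(ts, value)
print(rating.versions)  # [(3000, b'3'), (2000, b'4')]
```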

__10. Verify that your property changes were captured correctly:

describe 'reviews'

image12

__11. Enable (or activate) the table so that it’s ready for use.

enable 'reviews'

Now you can populate your table with data and query it.

2.2. Inserting and retrieving data

This exercise introduces you to the PUT, GET, SCAN, and COUNT commands. As you might imagine, PUT enables you to write data to HBase. GET and SCAN enable you to retrieve data, while COUNT returns the total number of rows in your table.

__1. Insert some data into your HBase table. The PUT command enables you to write data into a single cell of an HBase table. This cell may reside in an existing row or may belong to a new row.
Issue this command:

put 'reviews', '101', 'summary:product', 'hat'

What happened after executing this command?
Executing this command caused HBase to add a row with a row key of 101 to the reviews table and to write the value hat into the product column of the summary column family. Note that this command dynamically created the summary:product column and that no data type was specified for this column.

What if you have more data for this row? You need to issue additional PUT commands, one for each cell (i.e., each column family:column) in the target row. You’ll do that shortly.

But before you do, consider what HBase just did behind the scenes. HBase wrote your data to a Write-Ahead Log (WAL) in your distributed file system to allow for recovery from a server failure. In addition, it cached your data in the MemStore of a specific region managed by a specific Region Server. At some point, when the MemStore becomes full, your data will be flushed to disk and stored in files (HFiles) in your distributed file system. Each HFile contains data for a single column family.
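The write path described above can be modeled in a few lines of Python. In this hypothetical `ToyRegionStore`, which stands for one column family in one region, each put is appended to a log, buffered in a MemStore, and flushed to an immutable file once the MemStore holds `flush_threshold` cells. This is a sketch under simplifying assumptions: real HBase flushes based on MemStore size in bytes and also manages WAL rolling and cleanup, which are omitted here.

```python
class ToyRegionStore:
    """Toy model of the HBase write path for one column family in one region."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []       # append-only log, used for crash recovery
        self.memstore = {}  # (row, column, timestamp) -> value, held in memory
        self.hfiles = []    # immutable, sorted tuples of cells

    def put(self, row, column, timestamp, value):
        self.wal.append((row, column, timestamp, value))  # log first, for durability
        self.memstore[(row, column, timestamp)] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore contents out as one sorted, immutable "HFile".
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}


store = ToyRegionStore(flush_threshold=2)
store.put(b"101", b"summary:product", 1, b"hat")
store.put(b"101", b"summary:rating", 2, b"5")
store.put(b"112", b"summary:product", 3, b"vest")
print(len(store.wal), len(store.hfiles), len(store.memstore))  # 3 1 1
```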

__2. Retrieve the row. To do so, provide the table name and row key value to the GET command:

get 'reviews', '101'

image13

__3. Add more cells (columns and data values) to this row:

put 'reviews', '101', 'summary:rating', '5'
put 'reviews', '101', 'reviewer:name', 'Chris'
put 'reviews', '101', 'details:comment', 'Great value'

About your table
Conceptually, your table looks something like this:

image14

It has one row with 3 column families. The summary column family for this row contains two columns, while the other two column families for this row each have one column. Physically, data in each column family is stored together in your distributed file system (in one or more HFiles).

__4. Retrieve row key 101 again:

get 'reviews', '101'

image15

About this output
This output can be a little confusing at first, because it reports that 4 rows are returned. This row count refers to the number of lines (rows) displayed on the screen. Since information about each cell is displayed on a separate line and there are 4 cells in row 101, the GET command reports 4 rows.

__5. Count the number of rows in the entire table and verify that there is only 1 row:

count 'reviews'

image16

The COUNT command is appropriate for small tables only. (For large tables, use the RowCounter MapReduce job, org.apache.hadoop.hbase.mapreduce.RowCounter, or another efficient alternative. Consult the HBase site for details.)
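The difference between the line count reported by GET and the row count reported by COUNT comes down to cells versus distinct row keys. This small Python illustration, which assumes row 101's cells are written out as plain tuples, computes both numbers.

```python
# Row 101's cells after step 3, as (row key, column, value) tuples.
cells = [
    ("101", "summary:product", "hat"),
    ("101", "summary:rating", "5"),
    ("101", "reviewer:name", "Chris"),
    ("101", "details:comment", "Great value"),
]

# GET prints one line per cell in the requested row.
lines_shown_by_get = sum(1 for row, _, _ in cells if row == "101")
# COUNT counts distinct row keys.
rows_counted = len({row for row, _, _ in cells})
print(lines_shown_by_get, rows_counted)  # 4 1
```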

__6. Add 2 more rows to your table using these commands:

put 'reviews', '112', 'summary:product', 'vest'
put 'reviews', '112', 'summary:rating', '5'
put 'reviews', '112', 'reviewer:name', 'Tina'
put 'reviews', '133', 'summary:product', 'vest'
put 'reviews', '133', 'summary:rating', '4'
put 'reviews', '133', 'reviewer:name', 'Helen'
put 'reviews', '133', 'reviewer:location', 'USA'
put 'reviews', '133', 'details:tip', 'Sizes run small. Order 1 size up.'

Note that review 112 lacks any detailed information (e.g., a comment), while review 133 contains a tip in its details. Note also that review 133 includes the reviewer’s location, which is not present in the other rows. Let’s explore how HBase captures this information.

__7. Retrieve the entire contents of the table using this SCAN command:

scan 'reviews'

image17

Note that SCAN correctly reports that the table contains 3 rows. The display contains more than 3 lines, because each line includes information for a single cell in a row. Note also that each row in your table has a different schema and that missing information is simply omitted.

Furthermore, each displayed line includes not only the value of a particular cell in the table but also its associated row key (e.g., 101), column family name (e.g., details), column name, also called the column qualifier (e.g., comment), and timestamp. As you learned earlier, HBase is a key-value store. Together, these four attributes (row key, column family name, column qualifier, and timestamp) form the key.

Consider the implications of storing this key information with each cell value. Having a large number of columns with values for all rows (in other words, dense data) means that a lot of key information is repeated. Also, large row key values and long column family / column names increase the table’s storage requirements.
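A rough calculation makes this overhead concrete. The Python sketch below assumes HBase's classic KeyValue serialization, in which every cell carries a 4-byte key length, a 4-byte value length, a 2-byte row length, the row key, a 1-byte family length, the family, the qualifier, an 8-byte timestamp, and a 1-byte key type before the value. It ignores tags, compression, and block encoding, so treat the numbers as estimates; the `keyvalue_size` helper is hypothetical.

```python
def keyvalue_size(row, family, qualifier, value):
    """Approximate on-disk bytes for one cell in the classic KeyValue layout."""
    # key length + value length + row length + family length + timestamp + key type
    fixed_overhead = 4 + 4 + 2 + 1 + 8 + 1
    return fixed_overhead + len(row) + len(family) + len(qualifier) + len(value)


# The same 1-byte rating stored under short and long names.
short = keyvalue_size(b"101", b"s", b"r", b"5")
long_ = keyvalue_size(b"101", b"summary", b"rating", b"5")
print(short, long_)  # 26 37
```

In this example, the longer family and qualifier names cost 11 extra bytes on every cell, which adds up quickly in a dense table.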

__8. Finally, restrict the scan results to retrieve only the contents of the summary column family and the reviewer:name column for row keys from ‘120’ up to, but not including, ‘150’ (STARTROW is inclusive, while STOPROW is exclusive).

scan 'reviews', {COLUMNS => ['summary', 'reviewer:name'], STARTROW => '120', STOPROW => '150'}

image18

Given your sample data, only row ‘133’ qualifies. Note that the reviewer’s location (reviewer:location) and all the review details (details:tip) were omitted from the results due to the scan parameters you specified.
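The scan semantics used in this step can be sketched in Python. The hypothetical `scan` function below is not the HBase API; it walks row keys in sorted byte order, includes the start row, stops before the stop row, and keeps only the requested column families or columns.

```python
def scan(rows, start_row=None, stop_row=None, columns=None):
    """Yield (row key, {column: value}) for start_row <= key < stop_row.

    rows maps row keys (bytes) to {column: value}. columns may name whole
    families ("summary") or single columns ("reviewer:name").
    """
    for key in sorted(rows):
        if start_row is not None and key < start_row:
            continue
        if stop_row is not None and key >= stop_row:
            break  # keys are sorted, so nothing later can qualify
        cells = rows[key]
        if columns is not None:
            cells = {
                col: val for col, val in cells.items()
                if col in columns or col.split(":", 1)[0] in columns
            }
        if cells:
            yield key, cells


rows = {
    b"101": {"summary:product": "hat", "reviewer:name": "Chris"},
    b"112": {"summary:product": "vest", "reviewer:name": "Tina"},
    b"133": {"summary:product": "vest", "reviewer:name": "Helen",
             "reviewer:location": "USA", "details:tip": "Sizes run small."},
}
result = list(scan(rows, b"120", b"150", ["summary", "reviewer:name"]))
print(result)  # [(b'133', {'summary:product': 'vest', 'reviewer:name': 'Helen'})]
```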

2.3. Updating data

HBase doesn’t have an UPDATE command or API. Instead, programmers simply write another set of column values for the same row key. In this exercise, you’ll see how to update data values using the PUT command (again). You’ll also explore how HBase maintains multiple versions of your data for the summary and reviewer column families. As you’ll recall, in an earlier exercise you set the VERSIONS property of these families to 2.

__1. Update Tina’s review (row key 112) to change the rating to ‘4’:

put 'reviews', '112', 'summary:rating', '4'

__2. Scan the table to inspect the change.

scan 'reviews'

image19

By default, HBase returns the most recent version of data for each cell.

__3. To see multiple versions of your data, issue this command:

scan 'reviews', {VERSIONS => 2}

image20

__4. You can also GET the original rating value from row 112 by explicitly specifying its timestamp. Timestamps differ from system to system, so replace the timestamp shown below with the one from your environment. Consult the output from the previous step to obtain it.

get 'reviews', '112', {COLUMN => 'summary:rating', TIMESTAMP => 1421878110712}

image21
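How GET and SCAN choose among stored versions can be summarized with a small Python model. The `get_versions` helper is hypothetical, and the second timestamp below is made up for illustration. By default only the newest version is returned, VERSIONS asks for more, and TIMESTAMP selects the version written at exactly that time.

```python
def get_versions(versions, max_versions=1, timestamp=None):
    """Return cell versions the way GET/SCAN expose them (toy model).

    versions is a list of (timestamp, value) pairs in any order.
    """
    newest_first = sorted(versions, key=lambda tv: tv[0], reverse=True)
    if timestamp is not None:
        return [tv for tv in newest_first if tv[0] == timestamp]
    return newest_first[:max_versions]


# Row 112's summary:rating cell; the second timestamp is hypothetical.
rating_112 = [(1421878110712, b"5"), (1421878999999, b"4")]
print(get_versions(rating_112))                           # [(1421878999999, b'4')]
print(get_versions(rating_112, max_versions=2))           # both versions, newest first
print(get_versions(rating_112, timestamp=1421878110712))  # [(1421878110712, b'5')]
```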

2.4. Deleting data

In this exercise, you’ll learn how to delete data in your HBase table. You can delete a single cell value within a row or all cell values within a row.

__1. Delete Tina’s name from her review (row 112).

delete 'reviews', '112', 'reviewer:name'

__2. Scan the table to inspect the change.

scan 'reviews'

image22

__3. Delete all cells associated with Tina’s review (i.e., all data for row 112) and scan the table to inspect the change.

deleteall 'reviews', '112'
scan 'reviews'

image23

About DELETE
DELETE doesn’t remove data from the table immediately. Instead, it writes a delete marker that masks the data, which prevents the data from being included in any subsequent data retrieval operations. Because the underlying files that form an HBase table (HFiles) are immutable, storage for deleted data is not reclaimed until a major compaction runs, either on HBase’s periodic schedule or when an administrator triggers one (for example, with the major_compact shell command). This operation consolidates data and reconciles deletions by removing both the deleted data and the delete indicator.
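The following Python sketch models delete markers and major compaction. It is a simplified, hypothetical model (`ToyStore` is not an HBase class) that ignores timestamps, whereas in HBase a delete marker masks only cells written at or before the marker's timestamp.

```python
class ToyStore:
    """Toy model of immutable files, delete markers, and major compaction."""

    def __init__(self):
        self.files = []  # each file: list of (row, column, kind, value)

    def write(self, entries):
        # Files are never edited in place; every write adds a new file.
        self.files.append(list(entries))

    def _deleted(self):
        return {(row, col) for f in self.files
                for row, col, kind, _ in f if kind == "delete"}

    def read(self):
        # Later files override earlier ones; masked cells are skipped.
        deleted = self._deleted()
        return {(row, col): val for f in self.files
                for row, col, kind, val in f
                if kind == "put" and (row, col) not in deleted}

    def major_compact(self):
        # Merge everything into one file, dropping deleted cells and markers.
        live = self.read()
        self.files = [[(row, col, "put", val) for (row, col), val in sorted(live.items())]]


cf_store = ToyStore()
cf_store.write([("112", "summary:rating", "put", "4"), ("112", "reviewer:name", "put", "Tina")])
cf_store.write([("112", "reviewer:name", "delete", None)])
print(cf_store.read())                        # {('112', 'summary:rating'): '4'}
print(sum(len(f) for f in cf_store.files))    # 3: deleted cell and marker still stored
cf_store.major_compact()
print(sum(len(f) for f in cf_store.files))    # 1
```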

2.5. Dropping a table

In this exercise, you’ll learn how to drop an HBase table. Because we want to retain the reviews table for future exercises, you’ll create a new table, verify its existence, and then drop it.

__1. Create a sample table with 1 column family.

create 'sample', 'cf1'

__2. Describe the table or verify that it exists. Issue one of these two commands:

describe 'sample'

image24

exists 'sample'

image25

__3. Disable the table you just created. (Before you can drop a table, you must disable or deactivate it.)

disable 'sample'

__4. Drop the table.

drop 'sample'
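The rule that a table must be disabled before it is dropped can be sketched as a tiny state model. `ToyCatalog` and its error messages are hypothetical; HBase itself reports this situation with its own exception and wording.

```python
class ToyCatalog:
    """Toy model of table states (hypothetical, not the HBase API)."""

    def __init__(self):
        self.tables = {}  # table name -> "ENABLED" or "DISABLED"

    def _require(self, name):
        if name not in self.tables:
            raise KeyError(f"no such table: {name}")

    def create(self, name):
        self.tables[name] = "ENABLED"  # new tables start out enabled

    def disable(self, name):
        self._require(name)
        self.tables[name] = "DISABLED"

    def drop(self, name):
        self._require(name)
        if self.tables[name] != "DISABLED":
            raise RuntimeError(f"table {name} must be disabled before it is dropped")
        del self.tables[name]

    def exists(self, name):
        return name in self.tables


catalog = ToyCatalog()
catalog.create("sample")
try:
    catalog.drop("sample")  # refused: the table is still enabled
except RuntimeError as err:
    print(err)
catalog.disable("sample")
catalog.drop("sample")
print(catalog.exists("sample"))  # False
```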

__5. Verify that the table no longer exists.

exists 'sample'

image26

2.6. Exploring the impact of your work

Are you curious about how your work has affected your HBase environment? This exercise helps you explore some of the metadata available to you about your table as well as your overall HBase environment.

__1. Launch a Web browser.

__2. Enter the URL and port of your HBase Master Server Web interface (by default, this is port 60010). For example, if your host name is rvm.svl.ibm.com and the HBase Master Service Web interface port is 60010, you would enter http://rvm.svl.ibm.com:60010.

image27

Locating the HBase Master information port
By default, the HBase Master Web interface port is 60010 on your host machine (HBase 1.0 and later default to 16010). This information is configured in the . . . /conf/hbase-site.xml file within your HBase installation directory (e.g., /usr/iop/4.0.0.0/hbase). Look for the hbase.master.info.port property in the hbase-site.xml file.
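Because hbase-site.xml uses Hadoop's standard configuration format (a `<configuration>` element containing `<property>` entries, each with a `<name>` and a `<value>`), you can also look the port up programmatically. This Python sketch uses the standard library's `xml.etree.ElementTree`; the `get_property` helper and the sample file contents are hypothetical. A property that is absent from the site file means the built-in default applies.

```python
import xml.etree.ElementTree as ET


def get_property(config_xml, name, default=None):
    """Return a property's value from Hadoop-style *-site.xml text, or default."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return default


# Sample hbase-site.xml content (hypothetical values).
site_xml = """<configuration>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
</configuration>"""

print(get_property(site_xml, "hbase.master.info.port"))                 # 60010
print(get_property(site_xml, "hbase.regionserver.info.port", "60030"))  # 60030 (default)
```

The same helper works for hdfs-site.xml, for example to read the dfs.namenode.http-address property mentioned later in this exercise.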

__3. Scroll to the Tables section and click the User Tables tab. Note that your reviews table is present and that its data resides in 1 region (because your table is very small). Also note that the description highlights a few important aspects of your table, including its column families and properties that you altered to non-default values.

image28

__4. Click the [Details] link at top to display further details about the tables in your HBase server.

image29

__5. If necessary, scroll down to locate data for your table. Note that all your table’s properties and their values are displayed here.

image30

Where have you seen similar output?
This output should look familiar to you. Indeed, in a previous exercise you issued the describe command from the HBase shell to inspect the status of your table (enabled/disabled) as well as its properties.

__6. Click the Back button on your browser to return to the previous page (the main page for the HBase Master Server Web interface).

__7. In the Tables section, click on the User Tables tab and locate your reviews table again. Click on the link for this table.

image31

__8. Note the Region Server link for your table and click on it. You’ll find this information in the Table Regions section of the displayed page.

image32

__9. After you have been redirected to the HBase Region Server Web interface, skim through the information displayed.

image33

Locating the HBase Region Server information port
By default, the HBase Region Server Web interface port is 60030 (HBase 1.0 and later default to 16030). This information is configured in the $HBASE_HOME/conf/hbase-site.xml file. Look for the hbase.regionserver.info.port property.

__10. In the Server Metrics section at top, click on the Requests tab to display the total read and write request counts for your server since its launch. (The screen capture below was taken from a system with 264 read requests and 11 write requests. As with other screen captures, your data may vary.)

image34

__11. Scroll to the Regions section towards the bottom of the page.

image35

__12. Click on the Request metrics tab in the Regions section to determine the number of read and write requests for the reviews table. (This screen capture was taken after 18 read requests and 15 write requests were issued.)

image36

__13. If necessary, open a terminal window. From the Linux/Unix command line (not the HBase shell), list the contents of the HBase data directory in your DFS. The HBase data directory location is determined at installation (it lies under the directory set by the hbase.rootdir property in hbase-site.xml). In the example below, this directory is /apps/hbase/data/data/default. Note that a subdirectory for your reviews table is present.

hdfs dfs -ls /apps/hbase/data/data/default/

image37

__14. Explore the contents of the …/reviews directory and look for a subdirectory with a long, system-generated name.

hdfs dfs -ls /apps/hbase/data/data/default/reviews

image38

__15. Investigate the contents of this subdirectory and note that it contains 1 subdirectory for each column family of your table. (Substitute the system-generated subdirectory name in your environment for the sample name shown below.) Recall that HBase physically organizes data in your table by column family, which is reflected here.

hdfs dfs -ls /apps/hbase/data/data/default/reviews/3a2bcc79c404ea284baf7e423e02aa63

image39
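The directories you just browsed follow the pattern <root directory>/data/<namespace>/<table>/<encoded region name>/<column family>, and tables created without an explicit namespace live in the default namespace. The Python sketch below builds these paths; `region_family_dir` is a hypothetical helper, and the root directory assumes the path used in this lab (/apps/hbase/data).

```python
import posixpath


def region_family_dir(rootdir, table, encoded_region, family, namespace="default"):
    """Return <rootdir>/data/<namespace>/<table>/<encoded region>/<family>."""
    return posixpath.join(rootdir, "data", namespace, table, encoded_region, family)


root = "/apps/hbase/data"  # assumed hbase.rootdir path, as in this lab
region = "3a2bcc79c404ea284baf7e423e02aa63"  # sample region name; yours differs
for family in ["details", "reviewer", "summary"]:
    print(region_family_dir(root, "reviews", region, family))
```

Each column family directory holds that family's HFiles, which is why data for different families is stored separately.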

__16. If you’d like to learn how to explore your DFS contents using a Web browser, continue with the remainder of this lab module. Otherwise, skip to the next lab.

__17. If necessary, determine the URL for your Name Node’s Web interface. In your Hadoop installation directory (by default /usr/iop/4.0.0.0/hadoop), browse the . . . /conf/hdfs-site.xml file. Note the setting for the dfs.namenode.http-address property. By default, this will be at port 50070 of your Name Node’s host.

image40

__18. Direct your browser to the Name Node’s HTTP address and verify that your display is similar to this:

image41

__19. In the menu at top, click the arrow next to the Utilities tab to expose a drop-down menu. Select Browse the file system.

image42

__20. Navigate through the DFS directory tree, investigating the contents of your HBase database. For example, the screen capture below displays the contents of the HBase subdirectory for the reviews table.
image43
