IBM Support

Using Big SQL tables located on Object Storage - Hadoop Dev

Technical Blog Post


Abstract

Using Big SQL tables located on Object Storage - Hadoop Dev

Body

Object Storage offers scalable and flexible data storage for cloud and on-premise use. By using Object Storage the data is stored separately from applications that create and use the data. This allows a centralized location to keep data, which can be utilized by many applications, such as in a data lake implementation.

Introduced in Big SQL 5.0, Big SQL supports creating tables that store and read data from an Object Storage location. Data can be inserted or loaded into the table through Big SQL statements or the table can access existing data in the Object Storage.

Several kinds of Object Storage can be used which support the S3 API interface. The IBM Cloud Object Storage (S3) service is an example of Object Storage that can be used.

Big SQL tables can be created with a location URI that references an Object Storage bucket and path. The S3 bucket should exist before creating the Big SQL table. Here is an example of the Big SQL table CREATE statement using Object Storage URI as the location.

Example:

create hadoop table fruit_s3(name varchar(20))  partitioned by (color varchar(20))  location 's3a://demo/fruit_s3';  

The location URI syntax is composed for S3 as follows:

  s3a://<bucket>/<path>  s3a://demo/fruit_s3  

When using Object Storage with a Hadoop cluster the authentication credentials need to be made available to the Hadoop services. This can be done by adding properties to the hadoop core-site.xml file though the Ambari UI.

The S3 properties have a prefix of fs.s3a.* for each. Here is an example of properties used to connect to S3 authentication.

fs.s3a.access.key=123456789  fs.s3a.secret.key=abcdefgh  fs.s3a.endpoint=s3-api.sjc-us-geo.objectstorage.softlayer.net  fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem  

Sensitive properties can be stored in an encrypted Hadoop credential file instead of the core-site.xml. Steps to set up a Hadoop credential file are here: Enabling Big SQL to use a credential keystore file

For more information about the S3 properties see the Hadoop documentation here:

Big SQL tables created with an Object Storage location can be used in SQL statements the same as any other Big SQL table. Here are some examples of CREATE, INSERT, LOAD, and SELECT.

Example with CREATE and INSERT

create hadoop table fruit_s3(name varchar(20))   partitioned by (color varchar(20))   location 's3a://demo/fruit_s3';    insert into fruit_s3 values ('Apple','Red'),('Lemon','Yellow'),('Kiwi','Green'),  ('Grape','Purple'),('Orange','Orange'),('Blueberries','Blue'),('Cherry','Red'),  ('Banana','Yellow'),('Lime','Green');    select * from fruit_s3;  +-------------+--------+  | NAME        | COLOR  |  +-------------+--------+  | Orange      | Orange |  | Kiwi        | Green  |  | Banana      | Yellow |  | Cherry      | Red    |  | Lemon       | Yellow |  | Grape       | Purple |  | Blueberries | Blue   |  | Lime        | Green  |  | Apple       | Red    |  +-------------+--------+  

Example with CREATE and LOAD

CREATE HADOOP TABLE GENDATA_1YR (ID BIGINT, NOW VARCHAR(30))   PARTITIONED BY (YEAR INTEGER, MONTH INTEGER)   STORED AS PARQUET LOCATION 's3a://s3a-test/gendata_1yr';    LOAD HADOOP USING FILE URL '/tmp/data/gendata_files/gendata_2016.csv' INTO TABLE GENDATA_1YR APPEND;    SELECT * FROM GENDATA_1YR FETCH FIRST 2 ROWS ONLY;  +--------+------------------------------+------+-------+  |     ID | NOW                          | YEAR | MONTH |  +--------+------------------------------+------+-------+  | 334548 | Tue Jun 28 23:00:04 PDT 2016 | 2016 |     6 |  | 334549 | Tue Jun 28 23:01:04 PDT 2016 | 2016 |     6 |  +--------+------------------------------+------+-------+  

More information and examples are available in the Big SQL Knowledge Center here:

Using S3 Object Storage with Big SQL tables

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16259835