Big SQL v3 and v4 supported Data Types - Hadoop Dev

Technical Blog Post



Big SQL uses the Hive Metastore as its underlying data representation and access method. The table below gives an overview of the data types supported in Big SQL and outlines their representation in the Hive Metastore and the Big SQL catalog. It covers both the data types available in Big SQL 3.x and the new data types introduced in Big SQL on IBM® Open Platform with Apache Hadoop (IOP).

Big SQL on IOP (Big SQL 4.x) | Hive Metastore | Big SQL Catalog | Big SQL 3.x | Hive Metastore | Big SQL Catalog
---------------------------- | -------------- | --------------- | ----------- | -------------- | ---------------
TINYINT | TINYINT | SMALLINT | TINYINT | TINYINT | SMALLINT
SMALLINT | SMALLINT | SMALLINT | SMALLINT | SMALLINT | SMALLINT
INT | INT | INT | INT | INT | INT
BIGINT | BIGINT | BIGINT | BIGINT | BIGINT | BIGINT
REAL | FLOAT | REAL | REAL | FLOAT | REAL
FLOAT | DOUBLE | DOUBLE | FLOAT | DOUBLE | DOUBLE
DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE | DOUBLE
DECIMAL(p,s) | DECIMAL(p,s) | DECIMAL(p,s) | N/A | N/A | N/A
STRING | STRING | VARCHAR of 32,672 characters | STRING | STRING | VARCHAR of 32,672 characters
VARCHAR(x) | VARCHAR(x) | VARCHAR(x) | VARCHAR(x) | STRING | VARCHAR(x)
CHAR | CHAR | CHAR | CHAR (in compatibility mode) | STRING | VARCHAR(x)
TIMESTAMP | TIMESTAMP | TIMESTAMP(9) | TIMESTAMP | TIMESTAMP | TIMESTAMP(9)
TIMESTAMP(n) | TIMESTAMP | TIMESTAMP(n) | TIMESTAMP(n) | TIMESTAMP(n) | TIMESTAMP(n)
DATE | TIMESTAMP/DATE | DATE | DATE | TIMESTAMP | DATE
BOOLEAN | BOOLEAN | SMALLINT | BOOLEAN | BOOLEAN | SMALLINT
ARRAY | ARRAY | ARRAY | N/A | N/A | N/A
MAP / ARRAY[INT] / ARRAY[VARCHAR(x)] | MAP | ASSOCIATIVE_ARRAY | N/A | N/A | N/A
STRUCT / ROW | STRUCT | ROW | N/A | N/A | N/A

Numeric Data Types

As of Hive 0.13.0, supported numeric types in Hive include TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, and DECIMAL. Tables can be created with these numeric types in both Big SQL on IOP and Big SQL 3.x, with the exception of DECIMAL, which is available only in Big SQL on IOP. There are also some other differences to point out.

Big SQL on IOP | Big SQL 3.x | Hive Metastore | Big SQL Catalog
-------------- | ----------- | -------------- | ---------------
TINYINT | TINYINT | TINYINT | SMALLINT
SMALLINT | SMALLINT | SMALLINT | SMALLINT
INT | INT | INT | INT
BIGINT | BIGINT | BIGINT | BIGINT
REAL | REAL | FLOAT | REAL
FLOAT | FLOAT | DOUBLE | DOUBLE
DOUBLE | DOUBLE | DOUBLE | DOUBLE
DECIMAL(p,s) | N/A | DECIMAL(p,s) | DECIMAL(p,s)

TINYINT is supported only in the context of the CREATE HADOOP TABLE statement; in all other contexts it is unavailable in Big SQL.
Big SQL defines the REAL data type as a 32-bit floating-point value. In Hive, FLOAT always refers to a 32-bit floating-point value, which corresponds to the Java float data type. When a table is created with the REAL data type in Big SQL, it is therefore represented as a FLOAT in the Hive catalog.
In Hive, DOUBLE always refers to a 64-bit double-precision floating-point value. In Big SQL, both FLOAT and DOUBLE are 64-bit double-precision floating-point values; the FLOAT data type is a synonym for DOUBLE. When a table is created with the FLOAT data type in Big SQL, it is represented as a DOUBLE in the Hive Metastore.
DECIMAL types are supported in Big SQL on IOP; however, Hive supports a maximum precision of 38 digits, whereas Big SQL on IOP supports a maximum precision of 31 digits. Partitioning by a DECIMAL type is supported in Big SQL, but prior to Big SQL 4.2 it is not recommended due to performance concerns.
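As a sketch of the numeric mappings above (the table and column names are illustrative only), a single CREATE HADOOP TABLE statement in Big SQL on IOP can exercise each type:

    jsqsh > create hadoop table numericTypes
            (tiny TINYINT,          -- SMALLINT in the Big SQL catalog
             small SMALLINT,
             i INT,
             big BIGINT,
             r REAL,                -- FLOAT in the Hive Metastore
             f FLOAT,               -- synonym for DOUBLE; DOUBLE in the Hive Metastore
             d DOUBLE,
             price DECIMAL(31,2));  -- Big SQL on IOP only; maximum precision is 31

After the table is created, the Big SQL catalog and the Hive Metastore show the mapped types from the table above.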

String Data Types

Big SQL on IOP (Big SQL 4.x) | Hive Metastore | Big SQL Catalog | Big SQL 3.x | Hive Metastore | Big SQL Catalog
---------------------------- | -------------- | --------------- | ----------- | -------------- | ---------------
STRING | STRING | VARCHAR of 32,672 characters | STRING | STRING | VARCHAR of 32,672 characters
VARCHAR(x) | VARCHAR(x) | VARCHAR(x) | VARCHAR(x) | STRING | VARCHAR(x)
CHAR | CHAR | CHAR | CHAR (in compatibility mode) | STRING | VARCHAR(x)

As of Hive 0.13.0, supported string data types in Hive include STRING, VARCHAR, and CHAR. All of these are supported in Big SQL on IOP, but for performance reasons it is recommended to use CREATE HADOOP TABLE with the VARCHAR data type instead of STRING, or to use SET HADOOP PROPERTY 'bigsql.string.size' to limit the size of STRING columns. In Big SQL v3, the VARCHAR(x) type is represented as STRING in the Hive Metastore. This was changed to VARCHAR(x) in Big SQL on IOP: when switching from Hive to Big SQL, the STRING type in Hive would otherwise be mapped to the STRING data type in Big SQL, and mapping it to VARCHAR(x) instead avoids the same performance concerns. Hence, with Big SQL on IOP a VARCHAR type is represented as VARCHAR(x) in the Hive catalog.
Hadoop tables can be created with CHAR data types in Big SQL on IOP, and both the Big SQL and Hive catalogs are populated with CHAR types. If you enable the COMPATIBILITY_MODE global variable, CHAR data types can also be created in Big SQL v3, but those CHAR types are represented as STRING in the Hive Metastore and as VARCHAR(x) in the Big SQL catalog.
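As a sketch of the recommendation above (the table name and the size value are illustrative, and the property name is taken from the text), a STRING column's default 32,672-character mapping can be reduced with the 'bigsql.string.size' property, or avoided entirely by declaring VARCHAR:

    jsqsh > SET HADOOP PROPERTY 'bigsql.string.size' = 255;
    jsqsh > create hadoop table stringTypes
            (a STRING,          -- mapped to VARCHAR(255) because of the property above
             b VARCHAR(200),    -- VARCHAR(x) in both catalogs with Big SQL on IOP
             c CHAR(10));       -- CHAR in both catalogs with Big SQL on IOP

Declaring VARCHAR(x) directly is the preferred option when the maximum length of the data is known.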

Date/Time Types

Big SQL on IOP (Big SQL 4.x) | Hive Metastore | Big SQL Catalog | Big SQL 3.x | Hive Metastore | Big SQL Catalog
---------------------------- | -------------- | --------------- | ----------- | -------------- | ---------------
TIMESTAMP | TIMESTAMP | TIMESTAMP | TIMESTAMP | TIMESTAMP | TIMESTAMP
DATE | TIMESTAMP/DATE | DATE | DATE | TIMESTAMP | DATE

As of Hive 0.12.0, supported date/time data types in Hive include TIMESTAMP and DATE.
The DATE type was first introduced in Hive 0.12.0, and with Big SQL 3.x a Hadoop table can be created with a DATE type. However, in the Hive catalog the DATE type is represented as a TIMESTAMP in Big SQL 3.x. With Big SQL on IOP (Big SQL 4.1), the DATE type in Big SQL is still represented as a TIMESTAMP in Hive, but in Big SQL 4.2, support was added to store the DATE as a DATE type in Hive by using the CREATE HADOOP TABLE … DATE STORED AS DATE clause.
You cannot partition a Hadoop table using a TIMESTAMP type, but you can partition one using a DATE type. When importing Hive DATE types (via HCAT_SYNC_OBJECTS), DATE types in Hive are represented as DATE types in Big SQL. More information on DATE types in Big SQL and Hive can be found in tips for populating tables with date types in Big SQL and Hive.
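A sketch of the Big SQL 4.2 behavior described above. The table and column names are illustrative, and the exact placement of the STORED AS DATE clause within the column definition is an assumption based on the text:

    jsqsh > create hadoop table dateTypes
            (product varchar(10),
             sold_at TIMESTAMP,             -- cannot be used as a partitioning column
             sold_on DATE STORED AS DATE)   -- Big SQL 4.2+: kept as DATE in Hive
            partitioned by (sale_date DATE);

In Big SQL 4.1 and earlier, the DATE columns in a table like this would appear as TIMESTAMP in the Hive Metastore.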

Misc Types

These types include BOOLEAN and BINARY in Hive. A Hadoop table can be created with the BOOLEAN type in Big SQL, but it is mapped to a SMALLINT in the Big SQL catalog. The BINARY type is not supported in either Big SQL 3.x or Big SQL on IOP.
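For example (the table name is illustrative), a BOOLEAN column declared in Big SQL:

    jsqsh > create hadoop table miscTypes
            (product varchar(10),
             in_stock BOOLEAN);   -- BOOLEAN in the Hive Metastore, SMALLINT in the Big SQL catalog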

Complex Types

Big SQL on IOP (Big SQL 4.x) | Hive Data Type | Big SQL Data Type
---------------------------- | -------------- | -----------------
ARRAY | ARRAY | ARRAY
MAP / ARRAY[INT] / ARRAY[VARCHAR(x)] | MAP | ASSOCIATIVE_ARRAY
STRUCT / ROW | STRUCT | ROW

Big SQL on IOP introduces the ARRAY, MAP and STRUCT complex data types. Big SQL on IOP supports Hive as well as Big SQL syntax for complex data types.
Example: Create a table with an array column of varchars with no specified maximum size in Big SQL and in Hive

    jsqsh > create hadoop table complexTableArray
            (product varchar(10), Big_SQL_version INT, complex_type varchar(15) ARRAY[])
            row format delimited fields terminated by ','
            collection items terminated by '|';

    hive > create table complexTableArray
           (product varchar(10), Big_SQL_version INT, complex_type ARRAY<varchar(15)>)
           row format delimited fields terminated by ','
           collection items terminated by '|';

Note that you can also define a maximum array size in Big SQL by specifying a size for the array, e.g. complex_type ARRAY[3]. A MAP is an associative array that you can query with INT or VARCHAR index keys, whereas an ARRAY key can only be an INT. With Big SQL on IOP you can create a table with a MAP column by using either the MAP or the ARRAY keyword in the CREATE TABLE statement. This corresponds to the MAP data type in the Hive catalog and the ASSOCIATIVE_ARRAY data type in the Big SQL catalog.
Example: create a table with a varchar(8) key that maps to a varchar(15) value in Big SQL, and the equivalent statement in Hive.

    jsqsh > create hadoop table complexTableMap
            (product varchar(10), Big_SQL_version INT, complex_type varchar(15) ARRAY[varchar(8)])
            row format delimited fields terminated by ','
            collection items terminated by '|'
            map keys terminated by '#';

    hive > create table complexTableMap
           (product varchar(10), Big_SQL_version INT, complex_type MAP<varchar(8),varchar(15)>)
           row format delimited fields terminated by ','
           collection items terminated by '|'
           map keys terminated by '#';

A STRUCT, also known as a structured object, can be defined in the CREATE HADOOP TABLE statement of a Big SQL on IOP table. The STRUCT type is analogous to the Big SQL ROW type, and table creation with both STRUCT and ROW types is supported in Big SQL on IOP. However, in the Big SQL on IOP catalogs they are represented as ROW types.
Example: create a table with a structure that has three varchar elements in it in Big SQL, and the equivalent statement in Hive.

    jsqsh > create hadoop table complexTableStruct
            (product varchar(10), Big_SQL_version INT,
             complex_type ROW(Big_SQL_ARR varchar(8), Big_SQL_MAP varchar(15), Big_SQL_STRUCT varchar(15)))
            row format delimited fields terminated by ','
            collection items terminated by '|';

    hive > create table complexTableStruct
           (product varchar(10), Big_SQL_version INT,
            complex_type STRUCT<Big_SQL_ARR:varchar(8), Big_SQL_MAP:varchar(15), Big_SQL_STRUCT:varchar(15)>)
           row format delimited fields terminated by ','
           collection items terminated by '|';

Tables cannot be partitioned on a complex type. The Hive UNIONTYPE data type is not supported in Big SQL.

