Support My IBM Log in

IBM Support

Big SQL 4.2 Performance - Partition on Column Expression - Automatic Generation of Local Predicates - Hadoop Dev

Technical Blog Post


Abstract

Big SQL 4.2 Performance - Partition on Column Expression - Automatic Generation of Local Predicates - Hadoop Dev

Body

New in Big SQL 4.2 is the capability to allow a partition expression to be specified in the CREATE HADOOP TABLE statement in the PARTITIONED BY clause. For an overview of this feature, please look at the following post.

An overview of automatic generation of predicates for such tables is outlined in this post. Big SQL will automatically generate local predicates for partition expressions involving deterministic and monotonic functions. A non-deterministic function is one such that the result of the function maybe different each time e.g. RAND is a non deterministic scalar function. In this case Big SQL will not generate a new predicate. A monotonic function is one that as the input to the function increases/decreases, the result of the function increases/decreases. YEAR, INT are examples of a monotonic functions but MONTH and DAY are non-monotonic functions. For non-monotonic functions Big SQL will not be able to generate greater than or less than predicates.

Why would you care about about all this? There can be performance advantages if you are smart about how you partition your Hadoop tables. This post will show you how to determine what predicates are generated and show you how to smarter define the tables so that more predicates can be automatically generated. The text in green illustrates the correct syntax to use in the CREATE HADOOP TABLE statement.

    CREATE HADOOP TABLE INVENTORY_A (  trans_id int, product varchar(50), trans_ts timestamp  )   PARTITIONED BY (   INT(cast(trans_ts as date))/10000 AS year_part,  INT(cast(trans_ts as date))/100 AS month_part,  INT(cast(trans_ts as date)) AS day_part  )    CREATE HADOOP TABLE INVENTORY_B (  trans_id int, product varchar(50), trans_ts timestamp  )   PARTITIONED BY ( ;   YEAR(trans_ts) AS year_part,  MONTH(trans_ts) AS month_part,  DAY(trans_ts) AS day_part  )    

The data is already loaded into the tables and some sample data is shown below. Notice that the month_part column of the INVENTORY_A table does not only represent the month but represents the year as well. Also, the day_part represents the year as well as the month.

    select * from INVENTORY_A limit 6;  +----------+--------------+-------------------------+-----------+------------+----------+  | TRANS_ID | PRODUCT      | TRANS_TS                | YEAR_PART | MONTH_PART | DAY_PART |  +----------+--------------+-------------------------+-----------+------------+----------+  |     1005 | Beach Towel  | 2016-06-15 15:30:00.000 |      2016 |     201606 | 20160615 |  |     1000 | Hat          | 2016-06-13 15:30:00.000 |      2016 |     201606 | 20160613 |  |     1001 | Sun Screen   | 2016-06-13 16:30:00.000 |      2016 |     201606 | 20160613 |  |     1002 | Water Bottle | 2016-06-13 17:30:00.000 |      2016 |     201606 | 20160613 |  |     1003 | Sandals      | 2016-06-13 15:30:00.000 |      2016 |     201606 | 20160613 |  |     1004 | Lip Balm     | 2016-06-13 15:31:00.000 |      2016 |     201606 | 20160613 |  +----------+--------------+-------------------------+-----------+------------+----------+  select * from INVENTORY_B limit 6;  +----------+--------------+-------------------------+-----------+------------+----------+  | TRANS_ID | PRODUCT      | TRANS_TS                | YEAR_PART | MONTH_PART | DAY_PART |  +----------+--------------+-------------------------+-----------+------------+----------+  |     1000 | Hat          | 2016-06-13 15:30:00.000 |      2016 |          6 |       13 |  |     1001 | Sun Screen   | 2016-06-13 16:30:00.000 |      2016 |          6 |       13 |  |     1002 | Water Bottle | 2016-06-13 17:30:00.000 |      2016 |          6 |       13 |  |     1003 | Sandals      | 2016-06-13 15:30:00.000 |      2016 |          6 |       13 |  |     1004 | Lip Balm     | 2016-06-13 15:31:00.000 |      2016 |          6 |       13 |  |     1005 | Beach Towel  | 2016-06-15 15:30:00.000 |      2016 |          6 |       15 |  +----------+--------------+-------------------------+-----------+------------+----------+    

Now consider the following queries on these two tables:

    Query 1: select count(trans_id) from INVENTORY_A where trans_ts = '2016-06-14';  Query 2: select count(trans_id) from INVENTORY_B where trans_ts = '2016-06-14';  Query 3: select count(trans_id) from INVENTORY_A where trans_ts between '2016-06-13' and '2016-06-30';  Query 4: select count(trans_id) from INVENTORY_B where trans_ts between '2016-06-13' and '2016-06-30';    

For Queries 1, 2 and 3, as there is a local predicate on the trans_ts column, and the expressions in the PARTITIONED BY clause do not have any non-deterministic or non-monotonic functions, Big SQL will generate new local predicates on the year_part, month_part and day_part columns.
For Query 4, Big SQL will be unable to generate predicates on month_part and day_part columns because MONTH and DAY scalar functions are non-monotonic.

Now let’s take a look at the execution plans to see the additional predicates. The statements below give an example of how to generate the query execution plan for Query 1.

     db2 -tvf /home/bigsql/sqllib/misc/EXPLAIN.DDL  explain plan for   select count(trans_id) from INVENTORY_A where trans_ts = '2016-06-14';   db2exfmt -d  -n % -s % -g TIC -o query.exfmt -w -1 -# 0     

The plan snippet for injection of additional predicates for Query1 is below, the new predicates that are added are highlighted in green under the “Predicates” section of the TBSCAN operator. Note that the predicates on year_part, month_part and day_part are automatically generated.

    Predicates:                     ----------                  2) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.YEAR_PART = 2016)                      3) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.MONTH_PART = 201606)                      4) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.DAY_PART = 20160614)                      5) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.TRANS_TS = '2016-06-14-00.00.00.000000000000')    

The following except is from the plan generated for Query 2, notice the automatic additional of predicates on year_part, month_part and day_part once more.

    Predicates:                     ----------                  2) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.YEAR_PART = 2016)                      3) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.MONTH_PART = 6)                      4) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.DAY_PART = 14)                      5) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.04                            Predicate Text:                          --------------                          (Q1.TRANS_DATE = '2016-06-14-00.00.00.000000000000')      

The following except is from the plan generated for Query 3 table, notice the additional predicates on year_part, month_part and day_part.

    

Predicates: ---------- 2) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.YEAR_PART <= 2016) 3) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (2016 <= Q1.YEAR_PART) 4) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.MONTH_PART <= 201606) 5) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (201606 <= Q1.MONTH_PART) 6) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.DAY_PART <= 20160630) 7) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (20160614 <= Q1.DAY_PART) 8) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.TRANS_TS <= '2016-06-30-00.00.00.000000000') 9) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- ('2016-06-14-00.00.00.000000000' <= Q1.TRANS_TS)

The following except is from the plan for Query 4, notice the additional predicates on year_part ONLY.

    

Predicates: ---------- 2) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.YEAR_PART <= 2016) 3) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (2016 <= Q1.YEAR_PART)

4) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- (Q1.TRANS_DATE <= '2016-06-15-00.00.00.000000000') 5) External Sarg Predicate, Comparison Operator: Less Than or Equal (<=) Subquery Input Required: No Filter Factor: 0.333333 Predicate Text: -------------- ('2016-06-13-00.00.00.000000000'

You can use the techniques in this post to determine what predicates are automatically generated. If the predicates are not being generated you can consider changing the table definition as demonstrated in this post to get additional predicate generation. There can be performance advantages if you are smart about how you define your tables.


[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16260053

Overview Annual report Corporate social responsibility Financing Investor Newsroom Security, privacy & trust Senior leadership Careers with IBM Website Blog Publications Automotive Banking Consumer Good Energy Government Healthcare Insurance Life Sciences Manufacturing Retail Telecommunications Travel Our strategic partners Find a partner Become a partner - Partner Plus Partner Plus log in IBM TechXChange Community LinkedIn X Instagram YouTube Subscription Center Participate in user experience research Podcasts Contact IBM Privacy Terms of use Accessibility United States — English