Support My IBM Log in

IBM Support

Big SQL Performance - Join Range Filter Predicate and Partition Elimination - Hadoop Dev

Technical Blog Post


Abstract

Big SQL Performance - Join Range Filter Predicate and Partition Elimination - Hadoop Dev

Body

Join Range Filter Predicates

Hash join (HSJN) is the preferred join type used in most Big SQL Hadoop workloads as it is generally faster than other join operators. The purpose of this Join Range Filter Predicate (JRFP) feature is to inject additional predicates, where possible, into the probe side (outer) of the HSJN. This can improve the performance of queries that use HSJNs. For more information on hash joins review this paper.
In Big SQL 4.2, the JRFP feature is enabled by default, but the Big SQL query optimizer makes the decision of whether to inject these additional predicates or not. If there will be added benefit of including these predicates, then Big SQL will inject these predicates. JRFPs are injected for equality join predicates when HSJN is chosen in the access plan. The data type for each column used in the join needs to be the same type, however, the length can be different. The JRFPs are generated for base tables and inner join tables. To disable this feature, you can use the following registry variable: db2set DB2_EXTENDED_OPTIMIZATION=”JRFPREDS OFF”.

Deferred Partition Elimination

The Big SQL Scheduler component determines which partitions need to be scanned during query execution. When a large number of partitions is eliminated, performance can greatly improve. Another pertinent improvement in the Big SQL 4.2 release, referred to as ‘deferred partition pruning/elimination’, kicks in when the Big SQL optimizer chooses to use JRFPs.
Prior to Big SQL 4.2, the Scheduler could only interpret static predicate values, such as ‘select col1 from tab1 where tab1.col1=1”. Now in 4.2, the Scheduler will ‘defer’ partition elimination until after the Big SQL runtime component has passed computed values to it. If the query contains predicates with constant expressions that can be evaluated at runtime, then the Scheduler could now consider these for partition elimination. One example is the ABS() scalar function. If there is a predicate such as in this example, ‘select col1 from tab1 where tab1.col1=ABS(1)’, then the computed value of this ABS expression is passed to the Scheduler and then the Scheduler can eliminate partitions that do not qualify.
This deferred partition elimination feature could be used together with the JRFP feature. The JRFP values are computed at runtime and injected into the query as predicates by the Big SQL Compiler. When both features are combined the net effect is a further reduction in the number of rows being scanned and passed into the HSJN.

Demo of Performance Benefits of JRFPs and Deferred Partition Elimination

To better illustrate this feature consider the following example using the store_sales and date_dim tables of the TPC-DS benchmark:

    CREATE HADOOP TABLE STORE_SALES(      SS_SOLD_TIME_SK INT,      SS_ITEM_SK INT NOT NULL,      SS_CUSTOMER_SK INT,      SS_CDEMO_SK INT,      SS_HDEMO_SK INT,      SS_ADDR_SK INT,      SS_STORE_SK INT,      SS_PROMO_SK INT,      SS_TICKET_NUMBER BIGINT NOT NULL,      SS_QUANTITY BIGINT,      SS_WHOLESALE_COST DOUBLE,      SS_LIST_PRICE DOUBLE,      SS_SALES_PRICE DOUBLE,      SS_EXT_DISCOUNT_AMT DOUBLE,      SS_EXT_SALES_PRICE DOUBLE,      SS_EXT_WHOLESALE_COST DOUBLE,      SS_EXT_LIST_PRICE DOUBLE,      SS_EXT_TAX DOUBLE,      SS_COUPON_AMT DOUBLE,      SS_NET_PAID DOUBLE,      SS_NET_PAID_INC_TAX DOUBLE,      SS_NET_PROFIT DOUBLE  )  PARTITIONED BY(      SS_SOLD_DATE_SK INT  )    CREATE HADOOP TABLE DATE_DIM(      D_DATE_SK INT NOT NULL,      D_DATE_ID VARCHAR(16) NOT NULL,      D_DATE DATE STORED AS TIMESTAMP,      D_MONTH_SEQ BIGINT,      D_WEEK_SEQ BIGINT,      D_QUARTER_SEQ BIGINT,      D_YEAR BIGINT,      D_DOW BIGINT,      D_MOY BIGINT,      D_DOM BIGINT,      D_QOY BIGINT,      D_FY_YEAR BIGINT,      D_FY_QUARTER_SEQ BIGINT,      D_FY_WEEK_SEQ BIGINT,      D_DAY_NAME VARCHAR(9),      D_QUARTER_NAME VARCHAR(6),      D_HOLIDAY VARCHAR(1),      D_WEEKEND VARCHAR(1),      D_FOLLOWING_HOLIDAY VARCHAR(1),      D_FIRST_DOM BIGINT,      D_LAST_DOM BIGINT,      D_SAME_DAY_LY BIGINT,      D_SAME_DAY_LQ BIGINT,      D_CURRENT_DAY VARCHAR(1),      D_CURRENT_WEEK VARCHAR(1),      D_CURRENT_MONTH VARCHAR(1),      D_CURRENT_QUARTER VARCHAR(1),      D_CURRENT_YEAR VARCHAR(1)  )  

Now consider the following query:

    select ss_sold_time_sk, d_date from store_sales, date_dim where ss_sold_date_sk=d_date_sk and d_year=2001;    

Big SQL will calculate the minimum and maximum values from the build side of the HSJN. In this case the build side of the HSJN is shown below in orange – which is a select of few columns of the date_dim table with a local predicate on d_year=2001 (operator 6 in the plan). The probe side of the HSJN is highlighted below in blue . To determine if the Optimizer chose to use the JRFPs, take a look at the TBSCAN operator on the probe side of the HSJN (operator 5 in the plan) and look for ‘>=’ and ‘<=' predicates on an $INTERNAL_FUNC keyword as shown below in green.

In the following access plan example, the path of execution is read from the bottom up, and from left to right:

     Access Plan:  -----------          Total Cost:             1.73018e+06          Query Degree:           8                                                      Rows                                                 RETURN                                                 (   1)                                                  Cost                                                   I/O                                                   |                                               5.89315e+08                                                 DTQ                                                 (   2)                                               1.73018e+06                                                  12210                                                   |                                               1.47329e+08                                                 LTQ                                                 (   3)                                               1.37125e+06                                                  12210                                                   |                                               1.47329e+08                                                 ^HSJOIN                                                 (   4)                                               1.35438e+06                                                  12210                            /----------------------+-----------------------\                         6.91915e+08                                            410                         TBSCAN                                                 TBSCAN                         (   5)                                                 (   6)                       1.34023e+06                                              191.379                          12208                                                    2                           |                                                      |                       2.76766e+09                                               73049                       STORE_SALES                                             DATE_DIM                           Q2                                                     Q1    5) TBSCAN: (Table Scan)                  Cumulative Total Cost:          1.34023e+06                  Cumulative CPU Cost:            2.26061e+12                  Cumulative I/O Cost:            12208                  Cumulative Re-Total Cost:       1.34023e+06                  Cumulative Re-CPU Cost:         2.26061e+12                  Cumulative Re-I/O Cost:         12208                  Cumulative First Row Cost:      89.6155                    …  Predicates:  ...                  ----------                   6) External Sarg Predicate,                          Comparison Operator:            Less Than or Equal (<=)                          Subquery Input Required:        No                          Filter Factor:                  1                            Predicate Text:                          --------------                          (Q2.SS_SOLD_DATE_SK =)                          Subquery Input Required:        No                          Filter Factor:                  1                             Predicate Text:                          --------------                          (Q2.SS_SOLD_DATE_SK >= $INTERNAL_FUNC$())                    7) External Sarg Predicate,                          Comparison Operator:            Greater Than or Equal (>=)                          Subquery Input Required:        No                          Filter Factor:                  1                              Predicate Text:                          --------------                          (Q2.SS_SOLD_DATE_SK >= $INTERNAL_FUNC$())  6) TBSCAN: (Table Scan)                  Cumulative Total Cost:          191.316                  Cumulative CPU Cost:            1.22872e+08                  Cumulative I/O Cost:            2                  Cumulative Re-Total Cost:       191.316                  Cumulative Re-CPU Cost:         1.22872e+08                  Cumulative Re-I/O Cost:         2                  Cumulative First Row Cost:      89.8621  …    Predicates:                  ----------                  2) External Sarg Predicate,                          Comparison Operator:            Equal (=)                          Subquery Input Required:        No                          Filter Factor:                  0.00561267                            Predicate Text:                          --------------                          (Q1.D_YEAR = 2001)      

Now to demonstrate the effects of partition elimination combined with JRFP. The store_sales table is partitioned on the ss_sold_date_sk column. There are 1826 directories in the HDFS path for this table.

    hadoop fs -ls 'hdfs://bigaperf108.svl.ibm.com:8020/apps/hive/warehouse/hadoopds1000g.db /store_sales' | wc -l  1826    Note the output of this command:  select min(d_date_sk) as dim_min, max(d_date_sk) as dim_max from store_sales, date_dim where ss_sold_date_sk=d_date_sk and d_year=2001    DIM_MIN     DIM_MAX  ----------- -----------  2451911     2452275    

The range of values on the build side of the HSJN (d_date_sk) is 365 distinct values (i.e all values between and including 2451911 and 2452275). So what Big SQL is doing under the covers is adding these extra predicates in automatically for you during runtime processing.
For demonstration purposes, the query is transformed from a scan on all 1826 partitions of store_sales:
> select ss_sold_time_sk, d_date from store_sales, date_dim where ss_sold_date_sk=d_date_sk and d_year=2001;
To a scan on only 365 partitions:
> select ss_sold_time_sk, d_date from store_sales, date_dim where ss_sold_date_sk=d_date_sk and d_year=2001 and ss_sold_date_sk >=2450816 and ss_sold_date_sk<= 2452642
So that is savings of 80% saving in the scan. WOW! How Cool!

Expected Performance Improvement

For the following reasons, performance can be improved with the combination of these JRFPs and deferred partition elimination features:
1. Less data need to be read from HDFS from the probe side of the HSJN and less data goes into the HSJN
2. When rows from both sides of the HSJN are compared, they need to be uncompressed and then compared, this requires CPU cycles. With JRFPs, fewer rows need to be decompressed from the probe side when the comparison phase occurs
3. There can be less spillage of rows of the probe table to temporary storage if the HSJN spills to disk

Performance improvements will vary according to the workload – the more that the number of rows that is scanned from HDFS is reduced, the greater the benefit. To study this, we ran the TPC-DS workload against a 1TB Big SQL database and found that some queries improved by as much as 70%. The average performance improvement will also depend on the file format used. The graph below shows the performance improvement we observed with PARQUET and ORC tables. We observed a 15% improvement for Parquet tables but a 35% improvement for ORC tables. Note that overall performance was still better with PARQUET file format. One of the reasons ORC sees a larger improvement is simply because it takes longer to read each row using ORC, so each row that we can avoid reading, the more speedup can be seen.

Join Range Filter Predicate and Deferred Partition Elimination Improves Performance on Big SQL 4.2

Thanks to the following people for development of these work items: Qi Cheng, Tony Lai, Doug Doole, Nailah Bissoon, Xabriel Collazo Mojica, Abhayan Sundararajan, Simon Harris, Ying Cao, Bing Xin Cao.


[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16259927

Overview Annual report Corporate social responsibility Inclusion@IBM Financing Investor Newsroom Security, privacy & trust Senior leadership Careers with IBM Website Blog Publications Automotive Banking Consumer Good Energy Government Healthcare Insurance Life Sciences Manufacturing Retail Telecommunications Travel Our strategic partners Find a partner Become a partner - Partner Plus Partner Plus log in IBM TechXChange Community LinkedIn X Instagram YouTube Subscription Center Participate in user experience research Podcasts Contact IBM Privacy Terms of use Accessibility United States — English