Lab 6: Understanding and influencing data access plans

As you may already know, query optimization significantly influences runtime performance. In this lab, you’ll learn how to use the ANALYZE TABLE command to collect statistics about your data so that the Big SQL query optimizer can make well-informed decisions when choosing between various options for data access. Collecting and maintaining accurate statistics is highly recommended when dealing with large volumes of data (but less critical for this sample lab).
 
Next, you will use the Big SQL EXPLAIN feature to examine the data access plan the query optimizer chose for a given query. Performance specialists often consult data access plans to help them tune their environments.
 
After completing this lab, you will know how to:

  • Collect meta data (statistics) about your data.
  • Collect and review data access plans for your queries.

Prior to beginning this lab, you must have created and populated several tables with data as described in an earlier lab on Querying Structured Data.
 
Allow 30 minutes to complete this lab. Please post questions or comments about this lab to the forum on Hadoop Dev at https://developer.ibm.com/answers?community=hadoop.  Special thanks to Raanon Reutlinger for his contributions to an earlier version of this lab.

6.1.          Collecting statistics with the ANALYZE TABLE command

The ANALYZE TABLE command collects statistics about your Big SQL tables.  These statistics influence query optimization, enabling the Big SQL query engine to select an efficient data access path to satisfy your query.
 
__1.          If needed, launch JSqsh and connect to your Big SQL database.
 
__2.          Collect statistics for the tables you created in an earlier lab on Querying Structured Data.   Issue each of the following commands individually, allowing the operation to complete.  Depending on your machine resources, this may take several minutes or more.

 

ANALYZE TABLE sls_sales_fact COMPUTE STATISTICS
FOR COLUMNS product_key, order_method_key
;
ANALYZE TABLE sls_product_dim COMPUTE STATISTICS
FOR COLUMNS product_key, product_number, product_line_code, product_brand_code
;
ANALYZE TABLE sls_product_lookup COMPUTE STATISTICS
FOR COLUMNS product_number, product_language
;
ANALYZE TABLE sls_order_method_dim COMPUTE STATISTICS
FOR COLUMNS order_method_key, order_method_en
;
ANALYZE TABLE sls_product_line_lookup COMPUTE STATISTICS
FOR COLUMNS product_line_code, product_line_en
;
ANALYZE TABLE sls_product_brand_lookup COMPUTE STATISTICS
FOR COLUMNS product_brand_code
;

 

2015-08-19 14_12_25-Big_SQL4.0_HoL_2015-Aug-17.htm (Last saved by user) - Microsoft Word It’s best to include FOR COLUMNS and a list of columns to the ANALYZE TABLE command.  Choose those columns found in your WHERE, ORDER BY, GROUP BY and DISTINCT clauses.

 

6.2.          Understanding your data access plan (EXPLAIN)

The EXPLAIN feature enables you to inspect the data access plan selected by the Big SQL optimizer for your query.  Such information is highly useful for performance tuning.  This exercise introduces you to EXPLAIN, a Big SQL feature that stores meta data in a set of EXPLAIN tables.
 
__1.          If necessary, launch your query execution environment and connect to your Big SQL database.
 
__2.          To create the necessary EXPLAIN tables to hold information about your query plans, call the SYSINSTALLOBJECTS procedure. In this invocation, the tables will be created only for your user account. By casting a NULL in the last parameter, a single set of EXPLAIN tables can be created in schema SYSTOOLS, which can be used for all users.

 

CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)));

__3.          Authorize all Big SQL users to read data from the SYSTOOLS.EXPLAIN_INSTANCE table created by the stored procedure you just executed.

grant select on systools.explain_instance to public;

 

__4. ¬†¬†¬†¬†¬†¬†¬†¬† Capture the data access plan for a query.¬† One way to do this is by prefixing the query with the command EXPLAIN PLAN WITH SNAPSHOT FOR.¬† In this way, the query isn’t executed, but the access plan is saved in the EXPLAIN tables.¬† Run this command:

 

explain plan with snapshot for
select distinct product_key, introduction_date
from sls_product_dim;

 

Information about the data access strategy for this query is stored in the EXPLAIN tables, which you‚Äôll explore shortly. There are various tools to view the “explained” access plan.¬† In this lab, you will use a DB2 utility called db2exfmt, executed from the bash shell.
 
__5.          If necessary, open a terminal window.
 
__6.          Invoke db2profile to set up your environment.

 

. ~bigsql/sqllib/db2profile

 

__7.          Invoke db2exfmt to retrieve the plan. Supply appropriate input parameters, including the name of your Big SQL database (e.g., -d bigsql), the user ID and password under which you executed the explain plan query (e.g., -u bigsql bigsql), and an output file for the plan itself (e.g., -o query1.exp).  Adjust the example below as needed for your environment.

 

db2exfmt -d bigsql -u bigsql bigsql -o query1.exp

 

__8.          When prompted for additional information (such as an EXPLAIN timestamp), accept defaults and hit Enter.  Allow the operation to complete.
 
__9.          Investigate the contents of the query1.exp file.  For example, type

 

more query1.exp

 

Use the space bar to scroll forward through the output one page at a time, and enter b to page backward.
 
__10.       Inspect the Original Statement and Optimized Statement sections of the plan.  Sometimes, the optimizer will decide to rewrite the query in a more efficient mannerРfor example, replacing IN lists with JOINS.  In this case, the Optimized Statement show that no further optimization has been done.
 
2015-09-03 05_07_45-bigsql4-150421142453-conversion-gate02.pdf - Adobe Reader
 
__11.       Scroll to the Access Plan section. Notice the SORT operation and the total number of operations for this plan.

 

2015-09-03 05_09_01-bigsql4-150421142453-conversion-gate02.pdf - Adobe Reader

 

__12.       Next, alter the table to include an informational primary key constraint on one of the table’s columns.  From your query execution environment (e.g., JSqsh), execute the following alter command:

alter table sls_product_dim add constraint newPK primary key (product_key) not enforced;

 

This will alter the table to have a non-enforced PK constraint.
 
__13.       Now collect the plan information for the same query on the altered table:

 

explain plan with snapshot for
select distinct product_key, introduction_date
from sls_product_dim;

 

__14.       From a terminal window, invoke db2exfmt again, providing a different output file name (such as query2.exp):

 

db2exfmt -d bigsql -u bigsql bigsql -o query2.exp

 

__15.       When prompted for additional information (such as an EXPLAIN timestamp), accept defaults and hit Enter.  Allow the operation to continue.
 
2015-09-03 05_11_41-bigsql4-150421142453-conversion-gate02.pdf - Adobe Reader
 
__16.       Investigate the contents of the query2.exp file.  For example, type

 

more query2.exp

__17.       Compare the original and optimized statements.  Observe that the DISTINCT clause was removed from the optimized query.  Because of the primary key constraint that you added, the optimizer determined that the DISTINCT clause (to eliminate duplicates) was unnecessary.

 

2015-09-03 05_12_54-bigsql4-150421142453-conversion-gate02.pdf - Adobe Reader

 

__18.       Similarly, inspect the new Access Plan. Observe that no SORT operation is included and that there are fewer operations in total.

 

2015-09-03 05_13_37-bigsql4-150421142453-conversion-gate02.pdf - Adobe Reader

1 comment on"Getting Started with Big SQL 4.0 Lab 6: Understanding and Influencing Data Plans"

  1. Kumi Adachi May 31, 2017

    Hello,
    Thank you for a useful article.
    It seems users have to issue ‘grant all’ on other explain tables also, or they’ll get errors during inserting.
    Could you add the list of those tables?
    Best regards,

Join The Discussion

Your email address will not be published. Required fields are marked *