IBM Support

Faster BigInsights Big SQL queries with Auto-Analyze - Hadoop Dev

Technical Blog Post


Abstract

Faster BigInsights Big SQL queries with Auto-Analyze - Hadoop Dev

Body

This is part 1 of a series on Auto-Analyze.

BigInsights Big SQL queries run faster when statistics are available, as this helps the Big SQL optimizer in choosing more efficient access plans. These statistics are gathered by Big SQL Analyze. New to BigInsights v4.2 is the Big SQL Auto-Analyze feature, which will detect tables that have never been Analyzed or have had a significant amount of data added to them, and automatically execute Analyze. Automatic detection and running of Analyze will ensure that up-to-date statistics are available to the query optimizer for creating more efficient access plans.

When a table is queried, Auto-Analyze detects if the table statistics are out of date. If they are, the table is added to the Auto-Analyze queue. Tables in the queue are converted to Analyze tasks, which are run by the Administrative Task Scheduler (ATS).

Auto-Analyze can also manage the execution of Analyze for tables that have been loaded or synchronized. Tables will be added to the Auto-Analyze queue after a LOAD or HCAT_SYNC_OBJECTS is executed, if the DEFERRED option is chosen for the biginsights.stats.auto.analyze.post.load and biginsights.stats.auto.analyze.post.syncobj Big SQL properties.

Auto-Analyze will run an Analyze statement that analyzes all the columns and samples 10% of the data. The Analyze statement executed looks like this:

ANALYZE TABLE <schema>.<table> COMPUTE STATISTICS FOR ALL COLUMNS TABLESAMPLE SYSTEM(10)

Auto-Analyze ATS tasks:

There are two types of Auto-Analyze tasks that are run by ATS. First there is one continuous task that executes periodically to check if there are tables in the queue. The default schedule for this task is to execute every 10 minutes.

The second type of task runs an Analyze statement for a table. One of these tasks will be created and run for each queued table. This type of task runs only once and is scheduled to be run immediately by ATS.

You can see the tasks that are being managed by ATS and their schedules by querying SYSTOOLS.ADMIN_TASK_LIST. The default name for the continuous task is BIGSQL_CHECK_ANALYZE. The Analyze tasks are named: Analyze <id> <schema>.<table>. The Analyze task schedules are NULL, since they will execute only once.

Auto-Analyze task list results

Auto-Analyze Status and Diagnostics:

The status of an Auto-Analyze task can be checked using the procedure SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS. This procedure will show Auto-Analyze task STATUS, BEGIN_TIME, END_TIME and a MESSAGE of whether the Analyze was successful or ended in error. If there is an error message, it will contain a log ID, which can be used to locate more details about the Analyze error. The LOG_ENTRY function can be used with the log ID to retrieve error details from the bigsql.log. Only tasks that run an Analyze are shown.

Auto-Analyze task status results

How Auto-Analyze Works:

(1) Query a table and Auto-Analyze will automatically detect if the table needs to be Analyzed.
(2) Load a table or synchronize it with the DEFERRED option and it will be added to the queue of tables to be Analyzed.
(3) A scheduled ATS task checks periodically to see if there are queued tables that need to be Analyzed. The default schedule is every 10 minutes.
(4) For each table in the queue, a task is run once to execute Analyze.

Auto-Analyze diagram

More detailed information is available in the BigInsights v4.2 Knowledge Center in the Auto-Analyze section.
Auto-Analyze Knowledge Center

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16259947