This is part 1 of a series on Auto-Analyze.

BigInsights Big SQL queries run faster when statistics are available, as this helps the Big SQL optimizer in choosing more efficient access plans. These statistics are gathered by Big SQL Analyze. New to BigInsights v4.2 is the Big SQL Auto-Analyze feature, which will detect tables that have never been Analyzed or have had a significant amount of data added to them, and automatically execute Analyze. Automatic detection and running of Analyze will ensure that up-to-date statistics are available to the query optimizer for creating more efficient access plans.

When a table is queried, Auto-Analyze detects if the table statistics are out of date. If they are, the table is added to the Auto-Analyze queue. Tables in the queue are converted to Analyze tasks, which are run by the Administrative Task Scheduler (ATS).

Auto-Analyze can also manage the execution of Analyze for tables that have been loaded or synchronized. Tables will be added to the Auto-Analyze queue after a LOAD or HCAT_SYNC_OBJECTS is executed, if the DEFERRED option is chosen for the and Big SQL properties.

Auto-Analyze will run an Analyze statement that analyzes all the columns and samples 10% of the data. The Analyze statement executed looks like this:


Auto-Analyze ATS tasks:

There are two types of Auto-Analyze tasks that are run by ATS. First there is one continuous task that executes periodically to check if there are tables in the queue. The default schedule for this task is to execute every 10 minutes.

The second type of task runs an Analyze statement for a table. One of these tasks will be created and run for each queued table. This type of task runs only once and is scheduled to be run immediately by ATS.

You can see the tasks that are being managed by ATS and their schedules by querying SYSTOOLS.ADMIN_TASK_LIST. The default name for the continuous task is BIGSQL_CHECK_ANALYZE. The Analyze tasks are named: Analyze <id> <schema>.<table>. The Analyze task schedules are NULL, since they will execute only once.

Auto-Analyze task list results

Auto-Analyze Status and Diagnostics:

The status of an Auto-Analyze task can be checked using the procedure SYSHADOOP.BIGSQL_AUTO_ANALYZE_STATUS. This procedure will show Auto-Analyze task STATUS, BEGIN_TIME, END_TIME and a MESSAGE of whether the Analyze was successful or ended in error. If there is an error message, it will contain a log ID, which can be used to locate more details about the Analyze error. The LOG_ENTRY function can be used with the log ID to retrieve error details from the bigsql.log. Only tasks that run an Analyze are shown.

Auto-Analyze task status results

How Auto-Analyze Works:

(1) Query a table and Auto-Analyze will automatically detect if the table needs to be Analyzed.
(2) Load a table or synchronize it with the DEFERRED option and it will be added to the queue of tables to be Analyzed.
(3) A scheduled ATS task checks periodically to see if there are queued tables that need to be Analyzed. The default schedule is every 10 minutes.
(4) For each table in the queue, a task is run once to execute Analyze.

Auto-Analyze diagram

More detailed information is available in the BigInsights v4.2 Knowledge Center in the Auto-Analyze section.
Auto-Analyze Knowledge Center

4 comments on"Faster BigInsights Big SQL queries with Auto-Analyze"

  1. […] As you know, Analyze collects statistics on tables. Without these statistics, Big SQL queries tend to perform poorly. One major challenge is Analyzing the tables that need it, when they need it. In Big SQL v4.2, Analyze is executed automatically on tables after a Load or HCAT_SYNC_OBJECTS operation. Analyze is also automatically scheduled to be executed on any table that has been queried, but does not have up-to-date statistics. You can read more about this in the Auto-Analyze article. […]

  2. […] Faster BigInsights Big SQL queries with Auto-AnalyzeBig SQL Auto-Analyze Customization […]

  3. […] within a table which is used by the Big SQL cost-based optimizer to plan efficient queries. The Auto-Analyze feature which is enabled by default since Big SQL 4.2 automatically collects statistics for all […]

  4. […] when LOAD HADOOP is used (since Big SQL 4.2). After the LOAD HADOOP command is completed, an automatic ANALYZE command is triggered. If tables are populated with INSERT however, the auto-analyze feature is […]

Join The Discussion

Your email address will not be published. Required fields are marked *