IBM Support

Big SQL Job History Analysis using Event Monitors - Hadoop Dev

Technical Blog Post


Abstract

Big SQL Job History Analysis using Event Monitors - Hadoop Dev

Body

Many times, administrators are not aware of the statements that the rest of the organization is running on the Big SQL service. When administering the Big SQL service, analysis of the statements being executed on the cluster is needed to facilitate proper workload management. Gathering of job history details such as statements being executed, execution time as well as resources utilized can give insight into the cluster’s overall health and performance. This blog will give an overview of how to collect and analyze this historical data.

Big SQL DSM and Monitoring Functions

Data Server Manager for Big SQL and DB2 monitoring functions (which have replaced DB2 snapshots) can be used to gather statistics on statements being executed or recently executed on the cluster. Big SQL automatically maintains an in-memory cache of statements run and various execution metrics. When these caches are flushed to make room for new statements (or when Big SQL is restarted), the statistics on these statements are also flushed. The metrics gathered from these monitoring utilities are sufficient to provide insight into statements being executed or recently executed on the system.

Example of using MON_GET_PKG_CACHE_STATEMENT to retrieve some valuable information on queries being executed on the system:

    SELECT        INSERT_TIMESTAMP AS CREATE_TIME,        TOTAL_CPU_TIME/NUM_EXEC_WITH_METRICS as        AVG_CPU_TIME,STMT_EXEC_TIME,                                    QUERY_COST_ESTIMATE,        ROWS_READ,                POOL_TEMP_DATA_P_READS,         TOTAL_SECTION_SORT_TIME,        EXT_TABLE_RECV_WAIT_TIME,        SUBSTR(STMT_TEXT,1,70) as STMT_TEXT        FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) as T                            WHERE T.NUM_EXEC_WITH_METRICS <> 0 ORDER BY STMT_EXEC_TIME;      

Event Monitors

When the analysis window is larger (for example, analysis of statements executing on an active cluster for an entire day is needed), then event monitors can be used to gather metrics of queries. Since event monitors log metrics into tables rather than keeping them in memory, the metrics data is more persistent. The storage location of these tables needs to be specified. In releases prior to 5.0.3, creation of event monitors across workers was possible, however there were limitations with respect to adding and dropping nodes. In Big SQL 5.0.3 event monitors are supported ONLY on the head node.

Step 1: Creation of Event Monitors

Statement and activity event monitors need to be created to monitor the resource utilization of statements executing on the cluster. The following sample script can be used to create statistics and activity event monitors on the Big SQL head node:

    CONNECT TO BIGSQL;  -- Create the statistics event monitor  CREATE EVENT MONITOR DB2STATISTICS          FOR STATISTICS          WRITE TO TABLE                SCSTATS (TABLE SCSTATS_DB2STATISTICS IN BIGSQLCATSPACE),                WCSTATS (TABLE WCSTATS_DB2STATISTICS IN BIGSQLCATSPACE),                WLSTATS (TABLE WLSTATS_DB2STATISTICS IN BIGSQLCATSPACE),                QSTATS (TABLE QSTATS_DB2STATISTICS IN BIGSQLCATSPACE),                SCMETRICS (TABLE SCMETRICS_DB2STATISTICS IN BIGSQLCATSPACE),                HISTOGRAMBIN (TABLE HISTOGRAMBIN_DB2STATISTICS IN BIGSQLCATSPACE),                WLMETRICS (TABLE WLMETRICS_DB2STATISTICS IN BIGSQLCATSPACE),                CONTROL (TABLE CONTROL_DB2STATISTICS IN BIGSQLCATSPACE),                OSMETRICS (TABLE OSMETRICS_DB2STATISTICS IN BIGSQLCATSPACE);    -- Create the activity event monitor  CREATE EVENT MONITOR DB2ACTIVITIES          FOR ACTIVITIES          WRITE TO TABLE          ACTIVITY (TABLE ACTIVITY_DB2ACTIVITIES IN BIGSQLCATSPACE),          ACTIVITYMETRICS (TABLE ACTIVITYMETRICS_DB2ACTIVITIES IN BIGSQLCATSPACE),          ACTIVITYSTMT (TABLE ACTIVITYSTMT_DB2ACTIVITIES IN BIGSQLCATSPACE),          ACTIVITYVALS (TABLE ACTIVITYVALS_DB2ACTIVITIES IN BIGSQLCATSPACE),          CONTROL (TABLE CONTROL_DB2ACTIVITIES in BIGSQLCATSPACE);    

Step 2: Collecting data on workloads and service classes

A default user workload (SYSDEFAULTUSERWORKLOAD) and a default user service class (SYSDEFAULTUSERCLASS) is created for each Big SQL database. If no additional user defined workloads and service classes are created, all user activities will be associated with these default classes. This command can be used to alter the default classes so that monitoring data can be gathered on them. Execute this command as the bigsql user on the Big SQL head node when working with event monitors on the Big SQL head node:

    db2 connect to bigsql;  db2 “alter service class sysdefaultsubclass under sysdefaultuserclass collect activity data on coordinator member with details and values”;  db2 "alter service class sysdefaultmanagedsubclass under sysdefaultuserclass collect activity data on coordinator member with details and values";    

Step 3: Activating the event monitors

When event monitors are created with the command above, they are not automatically started. They can be started automatically by specifying the AUTOSTART keyword at the end of the statement in step 1. Keeping event monitors active for a long period of time can result in a lot of historical data generated. Therefore, it is recommended that event monitors be switched on when collection of the historical data is necessary and deactivated after the collection is complete. Activate the event monitors and verify that they are active by issuing the following commands as the bigsql user on the Big SQL head node:

    db2 connect to bigsql;  db2 “set event monitor DB2ACTIVITIES STATE 1”;  db2 “set event monitor DB2STATISTICS STATE 1”;  db2 “SELECT substr(evmonname, 1, 30) as evmon_name FROM syscat.eventmonitors  where event_mon_state(evmonname) = 1”  EVMON_NAME                                                                        --------------------------------------------------------------------------------------------------------------------------------  DB2ACTIVITIES  DB2STATISTICS  2   record(s) selected.    

Step 4: Send statistics to event monitor

Use the WLM_COLLECT_STATS stored procedure to send the statistics for all DB2 workload management objects to the active statistics event monitor. When statistics are collected and sent to the statistics event monitor, the values are reset. Execute this command as the bigsql user on the Big SQL head node for on the fly data collection:

    db2 “CALL SYSPROC.WLM_COLLECT_STATS()”     

This command is asynchronous meaning that you don’t need to wait for it to finish before moving to another task. One can also automate workload management statistics collection using the WLM_COLLECT_INT database configuration parameter. If this parameter is set to a nonzero value, workload management statistics will be collected automatically every wlm_collect_int minutes (as if you manually invoked the WLM_COLLECT_STATS procedure every wlm_collect_int minutes). Issue this command as the bigsql user on the Big SQL head node:

    db2 “update db cfg for bigsql using WLM_COLLECT_INT 60”;  Restart Big SQL    

Step 5: Run Queries

Execute the daily workload so that the metrics data can be collected for them.

Step 6: Turn off the event monitors

De-activate the event monitors and verify that none are active by issuing the following commands as the bigsql user on the Big SQL head node:

    db2 “alter service class sysdefaultsubclass under sysdefaultuserclass collect activity data none”;  db2 "alter service class sysdefaultmanagedsubclass under sysdefaultuserclass collect activity data none";  db2 “set event monitor DB2ACTIVITIES STATE 0”;  db2 “set event monitor DB2STATISTICS STATE 0”;  db2 “SELECT substr(evmonname, 1, 30) as evmon_name FROM syscat.eventmonitors where event_mon_state(evmonname) = 1”  EVMON_NAME                                                                        --------------------------------------------------------------------------------------------------------------------------------      0 record(s) selected.    

Step 7: Analysis of the data generated

Statistical data from event monitors are stored in Big SQL tables. There are a lot of metrics that are gathered when event monitors are activated. A description of all these metrics can be found in the DB2 Knowledge Center Monitor Element Reference.

Example 1: User level job history, which includes information about which user invoked which statement, start time and the duration of the statement
    - This script gives the details about up to 100 statements that were executed successfully"  connect to bigsql;  select  ACT_EXEC_TIME,  TIME_COMPLETED,  TIME_STARTED ,  ROWS_RETURNED,  SESSION_AUTH_ID,  SUBSTR(STMT_TEXT, 1, 70) AS STMT_TEXT  from ACTIVITYSTMT_DB2ACTIVITIES as ast, ACTIVITY_DB2ACTIVITIES as av  where ast.appl_id=av.appl_id  and ast.uow_id=av.uow_id  and ast.activity_id=av.activity_id  and SQLCODE >= 0  order by act_exec_time desc  fetch first 100 rows only;    Example output:  ACT_EXEC_TIME      TIME_COMPLETED   TIME_STARTED               ROWS_RETURNED        SESSION_AUTH_ID     STMT_TEXT ==============================================================================================================       8679205 2017-12-07-14.01.05.226651 2017-12-07-14.00.56.547446     100          BIGSQL     with  cross_items as (select i_item_sk ss_item_sk from item,  …    
Example 2: Statement failures, including which statements have failed and the error code returned by the failed statement
    -- This script gives the details about queries that were executed that resulted in errors, to get more details about the errors execute db2 "? SQLCODE"  connect to bigsql;  select   act_exec_time,  SQLCODE,  TIME_COMPLETED,                     TIME_STARTED,  ROWS_RETURNED,  SESSION_AUTH_ID,  SUBSTR(STMT_TEXT, 1, 70) AS STMT_TEXT  from ACTIVITYSTMT_DB2ACTIVITIES as ast, ACTIVITY_DB2ACTIVITIES as av  where ast.appl_id=av.appl_id  and ast.uow_id=av.uow_id  and ast.activity_id=av.activity_id  and SQLCODE < 0  order by act_exec_time desc  fetch first 100 rows only;    Example output:  ACT_EXEC_TIME        SQLCODE     TIME_COMPLETED             TIME_STARTED               ROWS_RETURNED        SESSION_AUTH_ID                                                                                                                  STMT_TEXT   ===============================================================================================================================================           3828252       -5104 2017-12-07-14.02.44.594650 2017-12-07-14.02.40.766398                    0 BIGSQL                                                                                                                           with inv as (select w_warehouse_name,w_warehouse_sk,i_item_sk,d_moy               …                                                                        

By issuing the following command e.g. “db2 ? sql5104” more information can be found about this error code.

Step 8: Drop Event monitors and associated tables

You maybe interested in cleaning up some of the metrics data, to do this you can drop the event monitors and associated tables and start the process again. Save the contents below to a file for example drop_evmon.sql and issue this command ‘db2 -tvf drop_evmon.sql’ as the bigsql user on the head node:

    connect to bigsql ;  set event monitor DB2STATISTICS STATE 0;  drop event monitor DB2STATISTICS;    drop table CONTROL_DB2STATISTICS;  drop table HISTOGRAMBIN_DB2STATISTICS;  drop table OSMETRICS_DB2STATISTICS;  drop table QSTATS_DB2STATISTICS;  drop table SCMETRICS_DB2STATISTICS;  drop table SCSTATS_DB2STATISTICS;  drop table WCSTATS_DB2STATISTICS ;  drop table WLMETRICS_DB2STATISTICS;  drop table WLSTATS_DB2STATISTICS;    set event monitor DB2ACTIVITIES STATE 0;  drop event monitor DB2ACTIVITIES;    drop table ACTIVITYMETRICS_DB2ACTIVITIES;  drop table ACTIVITYSTMT_DB2ACTIVITIES ;  drop table ACTIVITYVALS_DB2ACTIVITIES;  drop table ACTIVITY_DB2ACTIVITIES;  drop table CONTROL_DB2ACTIVITIES ;  


Check that all event monitors have been removed by using the db2look command with the -wlm option. There should be no event monitors in this output. Execute this command as the bigsql user on the Big SQL head node:

     db2look -d bigsql -wlm -o evmon.out    

You can use this query to check that there are no event monitor tables:

    db2 "select tabname,tbspace from syscat.tables where tbspace = 'BIGSQLCATSPACE'"    

Conclusion

This blog demonstrated how to use event monitors to gather resource metrics on historical statements. In most cases using DSM and monitoring metrics is sufficient to gather statistics on Big SQL activity. However, these metrics are stored in memory, and as new statements come into the system, these metrics can be overwritten. Event monitors store these metrics in Big SQL tables created on the Big SQL head node. As a best practice when using event monitors, activate them when statistics need to be collected and disable them afterwards. Event monitors can only give resource utilization metrics for the DB2 component of Big SQL. EXT* metrics can be used to give an indication of the HDFS metrics such as time taken to read from HDFS.

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16259793