by Nailah Bissoon
Senior Big SQL Technical Architect

Workload management is needed when resource-intensive queries consume significant cluster resources, slowing down or preventing other queries from executing. Consult An overview of WLM for Big SQL for more information on workload management capabilities. The most effective way to control work is to control when it can start; once a piece of work starts executing, it consumes CPU, I/O, and memory resources. The recommended approach to workload management is to use gatekeeper concurrency thresholds to control when work can start execution, and then supplement that control with CPU prioritization techniques where applicable.
Workload management is enabled by default for Big SQL. There is one gatekeeper concurrency control, which allows a certain number of heavyweight statements to execute on the system at a time. The default WLM configuration can be adapted to introduce more work classes, which allows for finer-grained control. The intention is to restrict the number of resource-intensive queries without imposing the same level of restriction on simpler or medium-class queries.
Once these additional work classes are created, gatekeeper concurrency controls define the level of restriction imposed on each. In this blog, we focus on creating additional work classes and establishing gatekeeper concurrency controls for workloads running on the system. In a subsequent blog we will discuss how to assign CPU resource percentages to each workload. The technique described in this blog is recommended if queries are returning DB2 OOM (out of memory) errors and resource-intensive queries have been identified in Stage 1 of WLM custom configuration.

Define Additional Work Classes

Additional work classes may include TRIVIAL, SIMPLE, MEDIUM, and COMPLEX classes. Which queries map to which work class depends on query complexity. Complexity is measured by timeron cost, which is the Big SQL optimizer's estimate of the cost of executing the query. Queries with a high timeron cost can be grouped into the COMPLEX class. Since the default WLM configuration defines lightweight queries as those with a timeron cost of less than 150,000, this timeron range will be mapped to the TRIVIAL class and no gatekeeper concurrency controls will be assigned to it, in line with the default behavior.
The timeron ranges for the SIMPLE, MEDIUM, and COMPLEX classes will depend on the work executing on the system; aim for a reasonable distribution of queries across the work classes. Since the initial goal is to limit the number of resource-intensive queries that can execute simultaneously on the system, focus on setting the timeron range for the COMPLEX class first. The query below can help identify the timeron cost of the most resource-intensive queries.


SELECT timestampdiff(4, TIME_COMPLETED - TIME_STARTED) AS ACT_LIFETIME,
       QUERY_TIMERON_COST,
       SORT_SHRHEAP_TOP,
       SUBSTR(STMT_TEXT, 1, 200) AS STMT_TEXT
FROM ACTIVITYSTMT_DB2ACTIVITIES AS ast,
     ACTIVITY_DB2ACTIVITIES AS av
WHERE ast.appl_id = av.appl_id
  AND ast.uow_id = av.uow_id
  AND ast.activity_id = av.activity_id
ORDER BY SORT_SHRHEAP_TOP DESC
FETCH FIRST 100 ROWS ONLY;

ACT_LIFETIME QUERY_TIMERON_COST  SORT_SHRHEAP_TOP  STMT_TEXT

------------ ------------------ ---------------- ---------------------------------------------------------------------
7            25890940              9723719 -- start query 59 in stream 2 using template query4.tpl and seed 668399603
with year_total as (
 select c_customer_id customer_id
       ,c_first_name customer_first_name
       ,c_last_name customer_            
…

Since the intention is to have fewer queries from the COMPLEX class executing simultaneously on the system, it is fine if this class contains fewer queries than the others. The following statement can be used to gather aggregate statistics for the intended work classes:


WITH V1 AS (
  SELECT CASE
           WHEN QUERY_COST_ESTIMATE < 150000 THEN 'TRIVIAL'
           WHEN QUERY_COST_ESTIMATE BETWEEN 150000 AND 1000000 THEN 'SIMPLE'
           WHEN QUERY_COST_ESTIMATE BETWEEN 1000000 AND 6000000 THEN 'MEDIUM'
           WHEN QUERY_COST_ESTIMATE > 6000000 THEN 'COMPLEX'
         END AS TARGET_WORKCLASS,
         QUERY_COST_ESTIMATE, APPL_NAME, SORT_SHRHEAP_TOP,
         timestampdiff(4, TIME_COMPLETED - TIME_STARTED) AS LIFETIME_DURATION
  FROM ACTIVITY_DB2ACTIVITIES
),
V2 AS (
  SELECT TARGET_WORKCLASS, QUERY_COST_ESTIMATE, APPL_NAME,
         SORT_SHRHEAP_TOP, LIFETIME_DURATION
  FROM V1
)
SELECT TARGET_WORKCLASS,
       COUNT(*) AS COUNT,
       SUM(LIFETIME_DURATION) AS TOT_LIFETIME_MINS,
       AVG(SORT_SHRHEAP_TOP) AS AVG_SORT_MEM4K,
       MAX(SORT_SHRHEAP_TOP) AS MAX_SORT_MEM4K,
       MAX(LIFETIME_DURATION) AS MAX_LIFETIME_MINS,
       INT(SUM(LIFETIME_DURATION)/COUNT(*)) AS AVG_LIFETIME_MINS
FROM V2
WHERE APPL_NAME LIKE 'db2jcc_application'
GROUP BY TARGET_WORKCLASS, APPL_NAME
ORDER BY TARGET_WORKCLASS;

TARGET_WORKCLASS COUNT                             TOT_LIFETIME_MINS AVG_SORT_MEM4K       MAX_SORT_MEM4K       MAX_LIFETIME_MINS AVG_LIFETIME_MINS
---------------- --------------------------------- ----------------- -------------------- -------------------- ----------------- -----------------
COMPLEX                                       1296              9469              1008840              9723719                50                 7
MEDIUM                                        2688             22666               204519              2443022               446                 8
SIMPLE                                         864               336               113298               541049                 8                 0
TRIVIAL                                       2110                 0                  102                11814                 0                 0

Note that I have chosen a timeron range of 150,000 to 1M for the SIMPLE class, 1M to 6M for the MEDIUM class, and over 6M for the COMPLEX class. These values will need to be adjusted on your system based on the work being executed. Tweak the timeron ranges in the query above until you get a good representation of queries in each class. This may take some trial and error, and you may not achieve perfection; that is okay, as the main point is to group the resource-intensive queries into a common class. As the complexity of the classes increases, you should notice that the maximum shared sort memory across all queries in each class (MAX_SORT_MEM4K) also increases.
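The classification rule described above can be sketched in a few lines of Python. This is only an illustration of the timeron-to-work-class mapping, not part of Big SQL; the boundary values (150,000 / 1M / 6M) are the example ranges used in this blog, so substitute the ranges you settle on for your own workload.

```python
def target_work_class(timeron_cost):
    """Return the work class a query of the given timeron cost maps to."""
    if timeron_cost < 150_000:
        return "TRIVIAL"     # default lightweight range, unrestricted
    if timeron_cost < 1_000_000:
        return "SIMPLE"
    if timeron_cost < 6_000_000:
        return "MEDIUM"
    return "COMPLEX"         # resource-intensive queries

# The sample query shown earlier had a timeron cost of 25,890,940:
print(target_work_class(25_890_940))  # COMPLEX
```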
Here is another example of the output for this query on a different workload:


WITH V1 AS … FROM V2 WHERE APPL_NAME LIKE 'GreenTea' GROUP BY TARGET_WORKCLASS,APPL_NAME ORDER BY TARGET_WORKCLASS;

TARGET_WORKCLASS COUNT                             TOT_LIFETIME_MINS AVG_SORT_MEM4K       MAX_SORT_MEM4K       MAX_LIFETIME_MINS AVG_LIFETIME_MINS
---------------- --------------------------------- ----------------- -------------------- -------------------- ----------------- -----------------
COMPLEX                                       4433            194634               116783              1559199                54                43
MEDIUM                                          13               581                99800               175337                45                44
SIMPLE                                          52               797                82685               204496                45                15
TRIVIAL                                       1638                 0                  315               121272                 0                 0

Observe that the memory utilization of the most resource-intensive query is quite small compared to the previous workload, and that queries have significant representation in only two classes. In this case the default WLM work classes may be sufficient, as two classes are already defined there.

Define Additional Gatekeeper Concurrency Controls

Once the queries are grouped into work classes, each work class needs to be assigned a gatekeeper concurrency threshold, i.e. a number that defines how many queries can run in that work class simultaneously. On a busy cluster with many incoming queries, queries will be throttled: when the concurrency threshold is exceeded, incoming queries are placed on a queue until resources have been freed up on the system and the queued queries can run with less contention.
Concurrency controls are aimed mainly at statements in the MEDIUM and COMPLEX classes. Enforcement is required, or too many concurrent activities could oversaturate the system. The gatekeeper concurrency control for each work class depends on the maximum shared sort memory required by the queries in that class. Looking back at the output of the query above:


TARGET_WORKCLASS COUNT                             TOT_LIFETIME_MINS AVG_SORT_MEM4K       MAX_SORT_MEM4K       MAX_LIFETIME_MINS AVG_LIFETIME_MINS
---------------- --------------------------------- ----------------- -------------------- -------------------- ----------------- -----------------
COMPLEX                                       1296              9469              1008840              9723719                50                 7
MEDIUM                                        2688             22666               204519              2443022               446                 8
SIMPLE                                         864               336               113298               541049                 8                 0
TRIVIAL                                       2110                 0                  102                11814                 0                 0


Observe that the largest query in the COMPLEX class requires ~9M 4K pages, the largest query in the MEDIUM class requires ~2M 4K pages, and the largest query in the SIMPLE class requires ~500K 4K pages. Each class could be given an even share of the available shared sort memory, but that is up to the database administrator. The available shared sort memory on the cluster can be determined by running this command:


db2 get db cfg for bigsql | grep SHEAPTHRES_SHR
Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = AUTOMATIC(12914436)

This means that with 3 work classes, each class can be assigned 12914436/3 = ~4M 4K pages; alternatively, if one work class should have higher priority than the others, it can be given more of the memory. Assuming each work class gets an equal share of the shared sort memory, the gatekeeper concurrency threshold for each class can be determined with the following calculations (rounded to the nearest whole number, with a minimum of 1):


COMPLEX concurrency threshold = 12914436/3/9723719 = 1
MEDIUM concurrency threshold  = 12914436/3/2443022 = 2
SIMPLE concurrency threshold  = 12914436/3/541049  = 8
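The arithmetic above can be sketched in Python. This is only an illustration of the calculation, using the example figures from this blog (the SHEAPTHRES_SHR value and the per-class MAX_SORT_MEM4K values); substitute the numbers from your own system.

```python
# Example inputs from this blog; replace with your own system's values.
sheapthres_shr = 12_914_436          # shared sort memory, in 4K pages
num_classes = 3                      # SIMPLE, MEDIUM, and COMPLEX
per_class_memory = sheapthres_shr / num_classes

max_sort_mem4k = {                   # largest query per class, 4K pages
    "COMPLEX": 9_723_719,
    "MEDIUM": 2_443_022,
    "SIMPLE": 541_049,
}

# Round to the nearest whole number, but always allow at least one
# concurrent query per class (hence the overcommitment for COMPLEX).
thresholds = {
    cls: max(1, round(per_class_memory / mem))
    for cls, mem in max_sort_mem4k.items()
}
print(thresholds)  # {'COMPLEX': 1, 'MEDIUM': 2, 'SIMPLE': 8}
```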

This is a good starting point for concurrency controls on a cluster currently returning DB2 OOM errors, where memory usage for DB2 is close to 60%. See Stage 1 WLM Configuration – Monitoring for more details. Note the overcommitment of memory in the COMPLEX class; this is because at least one query requires a very large amount of shared sort memory. When Big SQL runs out of shared sort memory, it will spill to temporary disk. Spilling is completely normal, especially for more complex Big Data queries.
These initial concurrency controls are meant as a starting point. The first aim is to get the system under control, especially if there are DB2 OOM errors when running the workload. If the system still returns DB2 OOM errors after deploying these initial concurrency controls, revisit the memory percentage assigned to the COMPLEX class and consider other ways to reduce this class's resource requirements, such as using Big SQL Hadoop MQTs. If the system is stable with these additional concurrency controls, the threshold values can be gradually increased to achieve better performance. If at any point while increasing the concurrency control values the system starts to show a very high run queue (as seen by vmstat) or returns OOM errors, then the concurrency threshold values have been increased too much.

Define Additional Service Classes, Work Class and Work Action Sets, and Thresholds

The statements below can be used to create the additional service classes, work class and work action sets, and the gatekeeper concurrency controls outlined in the previous sections. The timeron ranges for the work classes shown here are based on execution of the TPC-DS benchmark queries against a 1TB TPC-DS partitioned-table database with the ORC file format on Big SQL 5.0.2.


SET WORKLOAD TO SYSDEFAULTADMWORKLOAD;

CREATE SERVICE CLASS "DB2JCC"
DISABLE;

CREATE SERVICE CLASS "COMPLEX_DB2JCC" UNDER "DB2JCC"
COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITHOUT DETAILS
COLLECT AGGREGATE ACTIVITY DATA EXTENDED
DISABLE;

CREATE SERVICE CLASS "MEDIUM_DB2JCC" UNDER "DB2JCC"
COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITHOUT DETAILS
COLLECT AGGREGATE ACTIVITY DATA EXTENDED
DISABLE;

CREATE SERVICE CLASS "SIMPLE_DB2JCC" UNDER "DB2JCC"
COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITHOUT DETAILS
COLLECT AGGREGATE ACTIVITY DATA EXTENDED
DISABLE;

CREATE SERVICE CLASS "TRIVIAL_DB2JCC" UNDER "DB2JCC"
COLLECT ACTIVITY DATA ON ALL DATABASE PARTITIONS WITHOUT DETAILS
COLLECT AGGREGATE ACTIVITY DATA EXTENDED
DISABLE;

---------------------------------------------
--DDL Statements for Work Class Sets

-----------------------------------------------

CREATE WORK CLASS SET "DB2JCC_WCS"
(
  WORK CLASS "TRIVIAL_COST_DB2JCC"
        WORK TYPE ALL
        FOR TIMERONCOST FROM 0 TO 150000,

  WORK CLASS "SIMPLE_COST_DB2JCC" 
        WORK TYPE ALL
        FOR TIMERONCOST FROM 150000 TO 1000000.0,

  WORK CLASS "MEDIUM_COST_DB2JCC" 
        WORK TYPE ALL 
        FOR TIMERONCOST FROM 1000000.0 TO 6000000.0,

  WORK CLASS "COMPLEX_COST_DB2JCC" 
        WORK TYPE ALL
        FOR TIMERONCOST FROM 6000000.0 TO UNBOUNDED 
);

---------------------------------------------
-- DDL Statements for Work Action Sets
-----------------------------------------------
CREATE WORK ACTION SET "DB2JCC_WAS"
 FOR SERVICE CLASS "DB2JCC"
 USING WORK CLASS SET "DB2JCC_WCS"
(
-- Direct trivial cost to the Trivial subclass and allow any nested SQL
-- statement to be re-evaluated
  WORK ACTION "MAP_TRIVIAL_DB2JCC"
        ON WORK CLASS "TRIVIAL_COST_DB2JCC"
        MAP ACTIVITY WITHOUT NESTED TO "TRIVIAL_DB2JCC",

-- Direct simple cost to the Simple subclass and allow any nested SQL
-- statement to be re-evaluated
  WORK ACTION "MAP_SIMPLE_DB2JCC" 
        ON WORK CLASS "SIMPLE_COST_DB2JCC"
        MAP ACTIVITY WITHOUT NESTED TO "SIMPLE_DB2JCC", 

-- Direct medium cost to the Medium subclass and allow any nested SQL
-- statement to be re-evaluated
  WORK ACTION "MAP_MEDIUM_DB2JCC" 
        ON WORK CLASS "MEDIUM_COST_DB2JCC"
        MAP ACTIVITY WITHOUT NESTED TO "MEDIUM_DB2JCC",  

-- Direct complex cost to the Complex subclass and allow any nested SQL
-- statement to be re-evaluated
  WORK ACTION "MAP_COMPLEX_DB2JCC" 
        ON WORK CLASS "COMPLEX_COST_DB2JCC"
        MAP ACTIVITY WITHOUT NESTED TO "COMPLEX_DB2JCC" 
) 
DISABLE;

---------------------------------------------
-- DDL Statements for Thresholds

-----------------------------------------------

-- Create disabled activity concurrency threshold for simple subclass
CREATE THRESHOLD "SIMPLE_DB2JCC_CONCURRENCY"
        FOR SERVICE CLASS "SIMPLE_DB2JCC" UNDER "DB2JCC"
        ACTIVITIES ENFORCEMENT DATABASE 
                DISABLE 
        WHEN CONCURRENTDBCOORDACTIVITIES > 8 AND QUEUEDACTIVITIES UNBOUNDED
                CONTINUE;

-- Create disabled activity concurrency threshold for medium subclass
CREATE THRESHOLD "MEDIUM_DB2JCC_CONCURRENCY"
        FOR SERVICE CLASS "MEDIUM_DB2JCC" UNDER "DB2JCC"
        ACTIVITIES ENFORCEMENT DATABASE 
                DISABLE 
        WHEN CONCURRENTDBCOORDACTIVITIES > 2 AND QUEUEDACTIVITIES UNBOUNDED
                CONTINUE;

-- Create disabled activity concurrency threshold for complex subclass
CREATE THRESHOLD "COMPLEX_DB2JCC_CONCURRENCY"
        FOR SERVICE CLASS "COMPLEX_DB2JCC" UNDER "DB2JCC"
        ACTIVITIES ENFORCEMENT DATABASE 
                DISABLE 
        WHEN CONCURRENTDBCOORDACTIVITIES > 1 AND QUEUEDACTIVITIES UNBOUNDED
                CONTINUE;

ALTER SERVICE CLASS "DB2JCC" ENABLE;
ALTER SERVICE CLASS "COMPLEX_DB2JCC" UNDER "DB2JCC" ENABLE;
ALTER SERVICE CLASS "MEDIUM_DB2JCC" UNDER "DB2JCC" ENABLE;
ALTER SERVICE CLASS "SIMPLE_DB2JCC" UNDER "DB2JCC" ENABLE;
ALTER SERVICE CLASS "TRIVIAL_DB2JCC" UNDER "DB2JCC" ENABLE;

ALTER WORK ACTION SET "DB2JCC_WAS" ENABLE;
ALTER THRESHOLD "SIMPLE_DB2JCC_CONCURRENCY" ENABLE;
ALTER THRESHOLD "MEDIUM_DB2JCC_CONCURRENCY" ENABLE;
ALTER THRESHOLD "COMPLEX_DB2JCC_CONCURRENCY" ENABLE;

--------------------------------------
 --DDL Statements for Workload
--------------------------------------

CREATE WORKLOAD "JCC_WORKLOAD" APPLNAME('db2jcc*')
SERVICE CLASS "DB2JCC"
COLLECT ACTIVITY DATA
ON ALL DATABASE PARTITIONS
WITH DETAILS
COLLECT AGGREGATE ACTIVITY DATA BASE
COLLECT ACTIVITY METRICS NONE;

SET WORKLOAD TO AUTOMATIC;

The image below shows a graphical representation of the new service classes created with the statements above.

Note that you can also stop queries that take a long time to execute by defining an ACTIVITYTOTALTIME threshold, such as:


-- Create disabled activitytotaltime threshold for complex subclass
CREATE THRESHOLD "COMPLEX_TIMEOUT"
        FOR SERVICE CLASS "" UNDER "WLMBP_MASTER"
        ACTIVITIES ENFORCEMENT DATABASE 
                DISABLE 
        WHEN ACTIVITYTOTALTIME > 6 HOURS
                COLLECT ACTIVITY DATA ON COORDINATOR WITH DETAILS
                STOP EXECUTION;

Defining Gatekeeper Concurrency Controls for Additional Workloads

The additional service and work classes above were defined for a workload with an application name starting with 'db2jcc'. If additional workloads execute on the system, the memory resources need to be shared among them: divide the available shared sort memory among the workloads first, then divide each workload's share among its work classes. For example, if two workloads with the same memory requirement were running on the system, the concurrency thresholds for each work class could be divided by 2, giving 1 concurrent query from the COMPLEX class, 1 from the MEDIUM class, and 4 from the SIMPLE class.
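The split across workloads can be sketched the same way as the per-class calculation. This is only an illustration, assuming two workloads with equal memory requirements and starting from the example thresholds derived earlier in this blog; each class keeps at least one concurrent slot.

```python
# Thresholds computed earlier for a single workload (example values).
single_workload = {"COMPLEX": 1, "MEDIUM": 2, "SIMPLE": 8}
num_workloads = 2  # assumed: two workloads with equal memory needs

# Halve each threshold, never dropping below one concurrent query.
per_workload = {
    cls: max(1, t // num_workloads)
    for cls, t in single_workload.items()
}
print(per_workload)  # {'COMPLEX': 1, 'MEDIUM': 1, 'SIMPLE': 4}
```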

Conclusion

The gatekeeper concurrency control methodology described in this blog can be used to safeguard an oversaturated system from resource-intensive queries that consume most of the memory on the system. This blog showed examples of how to lay down these additional service classes and gatekeeper controls. The values used for the concurrency thresholds will depend on the workloads executing on the system and the memory available. It is recommended to monitor the system to ensure that all statements execute successfully before increasing the concurrency thresholds.
