by Nailah Bissoon
Big SQL Senior Technical Architect –
Big SQL workload management (WLM) offers the capability to monitor and control statements executing on the cluster to make efficient use of cluster resources, ensuring that the cluster is not over or under utilized. At a high level, a certain number of concurrent queries can be executed so as not to over saturate the cluster resources. When this threshold is exceeded, incoming work is queued until some of the earlier queries have completed. It is recommended to first implement the performance tips described in Big SQL v5 performance tips before customizing the WLM configuration on your cluster. These tips can help to reduce the resources required by queries running on the cluster. It is recommended that customized WLM configurations be applied to Big SQL versions that have tighter control of DFSIO memory consumption. For example, Big SQL 5.0.2 using the ORC file format is a good starting point.
YARN is a cluster wide resource manager which can be used to manage the clusters’ CPU and memory resources. Big SQL WLM can be used to manage workload resources within Big SQL. Even when YARN is enabled for Big SQL, WLM is still needed to manage Big SQL workloads. Big SQL LOAD HADOOP statements however can not be controlled by Big SQL WLM because LOAD HADOOP is a Map Reduce job that utilizes YARN resources.
Default WLM Configuration
In cases where there are a lot more resource intensive queries running on the cluster, reducing the number of concurrent heavy weight queries running on the cluster maybe beneficial to avoid out of memory errors. In other cases, where there are a lot lighter weight queries (utilizing less cluster resources) executing on the cluster increasing the number of concurrent queries may result in faster performance.
However, the reality is that workloads typically consist of a mix of heavy and light weight queries. The Big SQL Optimizer estimates how much resources are needed to execute the query in a measure called timeron cost. This cost is an estimation of the resources needed by the query to execute. In most cases the larger the timeron cost, the more resources are needed to execute the query. The Big SQL Default Workload Management Configuration allows less complex queries to run simultaneously on the cluster and more concurrent light weight queries to execute. In most cases, the default WLM configuration is sufficient to stabilize the cluster. However, the default WLM configuration can always be altered to adapt to the work executing on your cluster.
Implementing a Customized WLM configuration – Introduction of WLM stages
Keep in mind that a workload management configuration that is too complex makes a cluster difficult to monitor and manage. It is recommended to keep it simple and only add complexity when it is needed. The purpose of these WLM stages is to simplify the customization process.
In the first stage, the cluster will be monitored and data about queries and workloads executing on the cluster will be collected. The second stage is the implementation of gatekeeper concurrency controls. These concurrency controls will be used to control the number of queries running simultaneously on the cluster. In cases where queries result in DB2 out of memory errors these concurrency controls can be used to stabilize the cluster. In the final stage, more advanced WLM configuration techniques can be used to prioritize work or group of work ahead others. If new work enters the cluster, fine tuning maybe needed especially if the incoming work is fairly resource intensive.
Implementing a customized WLM configuration – Stage 1 – Monitoring and Analysis
The first stage to customize the WLM configuration is to monitor the work executing on the system. Without knowing what work is executing or how much resources are needed to execute this work, it is hard to properly configure the system. There are two types of monitoring, adhoc monitoring and historical monitoring using event monitors. The workload management monitoring strategy described here relies on Big SQL Job History Analysis using Event Monitors.
Once event monitors are created and the Big SQL jobs are executed, then some analysis can be done on the data collected to determine what customized WLM configurations should be applied. The steps to monitor and analyze the work executing on the cluster can be found in Big SQL Workload Management – Stage 1 Monitoring. It is recommended to keep monitoring the cluster to keep an eye out for new queries.
Implementing a Customized WLM configuration – Stage 2 – Establishing Gatekeeper Concurrency Controls
The most effective way to control work is to control when it can start. Once a piece of work starts executing, it not only consumes CPU and I/O resources but it also gathers many other resources such as memory. The recommended approach is to use gatekeeper concurrency thresholds to control when work can start execution and then, supplement that control with CPU prioritization/allocation techniques.
Big SQL consists of several components, the component that utilizes the most memory is the DB2 component. Another component that utilizes less but still a decent amount of the Big SQL memory resources is the DFSIO component which reads and writes data from the distributed file system. Big SQL WLM controls resources from the DB2 component, and by limiting the number of queries that are executed simultaneously from the DB2 component, the amount of resources utilized by the DFSIO component is also limited.
Most Big Data applications consist of queries that involve either sort, group-by or join operations. These operations utilize DB2 shared sort memory. When the available shared sort memory on the cluster is exceeded, pages spill to disk onto the temporary tablespace storage path. One way to prevent out of memory errors from occurring is to only run queries whose combined shared sort memory can fit in the available cluster shared sort memory. More details on how this can be done can be found in Big SQL Workload Management – Stage 2- Implementing Gatekeeper Concurrency Controls.
Implementing a customized WLM configuration – Stage 3 – Prioritizing Work Ahead of Others
Important applications may need to be given more resources compared to others to meet their SLAs. Workload management can be used to group work into workloads according to user or application name. A percentage of CPU resources can be assigned to each workload. From a memory resource allocation perspective, the same technique used in Big SQL WLM – Stage 2 can be used to allow certain workloads or groups of queries to be given a greater memory resources than others. For a use case description of how to assign memory and CPU resources to two workloads, see Big SQL Workload Management – Stage 3 – Prioritizing workload ahead of others.
In this blog we introduced the topic of workload management in IBM Db2 Big SQL. And outlined the stages required to implement a customized WLM configuration on your cluster such that one workload is given higher priority compared to others. On systems that return DB2 Out of Memory errors, implementing at least a Stage 2 WLM configuration is recommended.