I am indebted to my colleague Chris Baker for help in producing this article.
CICS TS V5.4 utilises the z/OS WLM health API (IWM4HLTH) as a means of controlling the flow of work into a CICS region. This service is used to inform z/OS WLM about the state of health of a server, in this context a CICS region. The health indicator is a number which shows, in percentage terms, how well the server is performing. It can be an integer value between 0 and 100. A value of 100 means the server is fully capable of doing work without any health problems, whereas 0 means it is not able to do any work.
The z/OS WLM health API is primarily used to mitigate problems that can be caused by work flowing into CICS just after it has completed initialization, but before it is really ready to process that work. Just because the DFHSI1517 CONTROL IS BEING GIVEN TO CICS message has been issued does not mean that CICS is ready to go. Initialization continues after this message and some defined resources such as CICS bundles are enabled asynchronously. Hence resources such as JVM servers required for Java applications may not have completed initialization, even though the CICS TCP/IP listener is open and accepting work. Another example is pipeline scans for web services, which happen after the DFHSI1517 message is issued.
To use an analogy, on a cold frosty morning, we do not expect to drive off in our car at high speed. We need to let it warm up first before we can expect optimum performance. The z/OS WLM health API is a way of telling the outside world that the CICS region is warming up. How long a CICS region needs to warm up before it is 100% healthy will depend on the types of applications being processed and the set of resources they employ. The CICS system programmer has control of this warm up process by means of the WLMHEALTH SIT parameter.
The WLMHEALTH parameter takes two values: an interval, which specifies a number of seconds, and a number which specifies the health adjustment percentage. The interval can be between 0 and 600 seconds (an interval of 0 is discussed later). The percentage can be between 1 and 100%. The default values are 20 seconds and 25% respectively, which means that when CICS initiates its warm up processing, every 20 seconds it increases the health value by 25 percentage points. It will therefore take four intervals, or 1 minute 20 seconds, for CICS to reach 100% health. The WLMHEALTH parameter can also specify a value of OFF, which means that the z/OS WLM health API is not utilized by CICS. In this case CICS operates as in previous releases, which in effect means 100% health when the DFHSI1517 message is issued.
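The ranges and the warm up arithmetic above can be checked with a small Python sketch. It is only an illustration and not anything CICS itself runs: the function names are hypothetical, the accepted input text ("(20,25)" or "OFF") is a simplification of how the SIT value is written, and capping the final step at 100% when the adjustment does not divide 100 evenly is an assumption.

```python
from typing import Optional, Tuple


def parse_wlmhealth(value: str) -> Optional[Tuple[int, int]]:
    """Parse a WLMHEALTH value such as "(20,25)" or "OFF" (hypothetical helper).

    Returns None for OFF, otherwise (interval_seconds, adjustment_percent).
    """
    text = value.strip().upper()
    if text == "OFF":
        return None
    interval_text, adjustment_text = text.strip("()").split(",")
    interval, adjustment = int(interval_text), int(adjustment_text)
    # Ranges as described in the article: 0-600 seconds, 1-100 percent.
    if not 0 <= interval <= 600:
        raise ValueError("interval must be between 0 and 600 seconds")
    if not 1 <= adjustment <= 100:
        raise ValueError("adjustment must be between 1 and 100 percent")
    return interval, adjustment


def seconds_to_full_health(interval: int, adjustment: int) -> Optional[int]:
    """Seconds from the start of warm up until health reaches 100%.

    Returns None for an interval of 0, where warm up does not start by itself.
    Assumption: a final partial step is capped at 100%.
    """
    if interval == 0:
        return None
    steps = (100 + adjustment - 1) // adjustment  # integer ceiling of 100/adjustment
    return steps * interval


if __name__ == "__main__":
    interval, adjustment = parse_wlmhealth("(20,25)")
    print(seconds_to_full_health(interval, adjustment))  # 80, i.e. 1 minute 20 seconds
```

With the defaults this gives four steps of 20 seconds, matching the 1 minute 20 seconds quoted above.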
When does CICS use the z/OS WLM health API?
Assuming WLMHEALTH is not set to OFF, CICS first sets health to zero during CICS initialization. When the DFHSI1517 CONTROL IS BEING GIVEN TO CICS message is issued, the region health is still zero. At this point CICS initiates its warmup processing using the interval and health adjustment values specified via the WLMHEALTH SIT parameter. With the default values of 20,25, the health value changes from 0 to 25% 20 seconds after the DFHSI1517 message is issued, from 25% to 50% after 40 seconds, and so on. Once the health value reaches 100% it remains at 100% until either a non-immediate shutdown of CICS is initiated (at which point CICS sets the health percentage to zero), or a CICS SPI command is issued instructing CICS to start decreasing the z/OS WLM health value. We will discuss the SPI later on.
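As a rough illustration of this timeline, the following Python sketch returns the health value a given number of seconds after the DFHSI1517 message. The function name is hypothetical, and the model assumes the timer fires exactly on each interval boundary and that health is capped at 100%; real timings in a CICS region will not be this exact.

```python
def health_at(elapsed_seconds: int, interval: int = 20, adjustment: int = 25) -> int:
    """Modelled health percentage `elapsed_seconds` after DFHSI1517 is issued.

    Hypothetical model: health starts at 0 and rises by `adjustment`
    at the end of every completed interval, capped at 100.
    """
    if interval == 0:
        # An interval of 0 means health stays at zero until changed via the SPI.
        return 0
    completed_intervals = elapsed_seconds // interval
    return min(100, completed_intervals * adjustment)


if __name__ == "__main__":
    for seconds in (0, 19, 20, 40, 60, 80, 300):
        print(f"{seconds:>3}s -> {health_at(seconds)}%")
```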
Who takes notice of the health value?
The primary user of health values is TCP/IP, but they are also used by CICS and CICSPlex SM components. When used by TCP/IP, the health value affects the distribution of IP connections when Sysplex Distributor or Port Sharing are in use. It takes effect at the address space level and so will affect all TCP/IP listeners within the CICS region, including CICS TCP/IP services, Liberty HTTP listeners, and CICS Sockets listeners.
Sysplex Distributor is configured using a value of SERVERWLM for VIPADISTRIBUTE DISTMETHOD. A health value of less than 100% for a CICS region will cause TCP/IP to reduce the weighting for DVIPA listening ports belonging to the region, and a value of 0 will remove the region from distribution (providing there are other available listeners with a health greater than 0). For port sharing, a similar function is available by setting SHAREPORTWLM on the relevant PORT statement to take advantage of the z/OS WLM server-specific recommendations.
In CICS, the interface to z/OS WLM is managed by the CICS monitoring domain, which broadcasts changes in the health value to those domains that are interested and may wish to take action. In CICS TS 5.4, the CICS and CICSPlex SM components are mainly concerned with whether the health is zero or non-zero; the values between 1 and 100 are not important. In future releases of CICS, these or other components may make use of the non-zero interim values to take an action.
CICSPlex SM workload distribution has been enhanced to take account of region health in its routing algorithm. A target region with a health of zero will be less favourable than a target region with a non-zero health value, effectively removing the target region from dynamic routing under normal circumstances.
New in CICS TS V5.4 is the MQMONITOR resource, which allows for enhanced configuration of MQ trigger monitors, the MQ Bridge and user-written MQ consumers. Included in the improved configuration is the ability for monitors to be automatically started and stopped when the MQ connection is started or stopped. In addition, the CICS-MQ component will take notice of health and not start the MQMONITORs until the z/OS WLM health value is non-zero.
The CICS-MQ component is the one CICS component that will utilize the interim non-zero health values. Whilst CICS is warming up, we want to throttle back the trigger monitors, in the same way that the z/OS WLM server-specific recommendations will affect TCP/IP connections to the CICS region. For MQMONITORs, whilst CICS is warming up, we throttle how many MQGET commands can be issued per second per MQMONITOR, thereby controlling how many triggered tasks are started. The z/OS WLM health value is input into an algorithm to calculate how many MQGETs per second will be allowed. Once health reaches 100%, the throttle is removed. The throttle is applied only to MQMONITORs, and not to user applications using the MQI nor to trigger monitors not using the new MQMONITOR resource.
Using the EXEC CICS SET WLMHEALTH SPI
The health of a CICS region can be manipulated not just at CICS startup and shutdown but also via the EXEC CICS SET WLMHEALTH SPI command. This command is also available via Explorer, CPSM WUI and CEMT. The SPI not only allows the interval and percentage values to be altered but also allows WLMHEALTH to be opened or closed. When set to OPEN, it kicks off the warmup period of incrementing the health percentage every interval. During this time the state is OPENING, until health reaches 100% and the state changes to OPEN. When set to CLOSED, CICS initiates a cooldown period whereby the health percentage is decremented by the specified amount every interval. During this period the state is CLOSING, until health reaches 0 and the state changes to CLOSED. There is also the ability to IMMCLOSE, which immediately sets the health to zero and the state to CLOSED.
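The state transitions described here can be pictured as a small state machine. The Python sketch below is a toy model only: the class and method names are hypothetical and are not the CICS SPI, each call to tick() stands for one interval elapsing, and the assumption that health first changes one interval after the request (rather than immediately) is carried over from the startup behaviour described earlier.

```python
from dataclasses import dataclass


@dataclass
class HealthModel:
    """Toy model of WLMHEALTH states (not the real CICS implementation)."""

    adjustment: int = 25   # percentage points applied per interval
    health: int = 0        # current health, 0 to 100
    state: str = "CLOSED"  # OPENING, OPEN, CLOSING or CLOSED

    def set_open(self) -> None:
        # Models requesting OPEN: start (or continue) the warmup period.
        self.state = "OPEN" if self.health >= 100 else "OPENING"

    def set_closed(self) -> None:
        # Models requesting CLOSED: start the cooldown period.
        self.state = "CLOSED" if self.health <= 0 else "CLOSING"

    def immclose(self) -> None:
        # Models IMMCLOSE: health drops to zero at once.
        self.health = 0
        self.state = "CLOSED"

    def tick(self) -> None:
        """Apply the effect of one interval elapsing."""
        if self.state == "OPENING":
            self.health = min(100, self.health + self.adjustment)
            if self.health == 100:
                self.state = "OPEN"
        elif self.state == "CLOSING":
            self.health = max(0, self.health - self.adjustment)
            if self.health == 0:
                self.state = "CLOSED"


if __name__ == "__main__":
    model = HealthModel()
    model.set_open()
    for _ in range(4):
        model.tick()
        print(model.state, model.health)
    model.set_closed()
    model.tick()
    print(model.state, model.health)
    model.immclose()
    print(model.state, model.health)
```

In this model a region that is only partly warmed up can be closed again, in which case it cools down from its current health value rather than from 100%.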
The SPI can be used in conjunction with specifying a SIT parameter of WLMHEALTH=(0,nn). With an interval of zero, CICS comes up with health zero and remains at health zero until an SPI command is issued to change the interval to a non-zero value and to specify OPEN. At this point the warmup period is started, incrementing the percentage by nn% every interval. This allows greater user control of the warmup period. For example, you may wish to check on the availability of a set of resources required for applications before initiating the warmup process, or you can immediately set health to 100% once checks are complete.
Setting WLMHEALTH to CLOSED allows a user to have a controlled cooldown period if required, rather than health being set to zero immediately upon CICS shutdown. For example, in the cooldown period, as the health reduces, the MQMONITORs are throttled back more and more until health reaches zero, at which point the MQMONITORs are stopped.
Other related enhancements
We have mentioned already that when CICS shutdown is initiated, z/OS WLM health is set to 0. This will make the region a less favorable target for connections, and CICS components notified of health status will also react, such as the CICS-MQ adapter stopping its monitors. Added to that, CICS shutdown processing has been changed to call sockets domain earlier to quiesce or force close the IP connections. Finally, the quiesce close of TCPIPSERVICEs has been enhanced. Until now, a SET TCPIPSERVICE CLOSE operation would not complete successfully if there were persistent HTTP connections in use. Now any persistent connections will be closed after 30 seconds or after the socketclose interval, whichever is sooner. Together these enhancements will improve CICS shutdown processing by utilizing improved ways of stopping the flow of work into CICS.
Building on CICS TS 5.3 enhancements
In CICS TS 5.3 we provided performance tuning for HTTP connections via the SOTUNING SIT parameter. This protects CICS from unconstrained resource demand and improves processing when CICS reaches a maxtask condition, thereby giving it a chance to recover. When CICS is constrained, new connection requests are not accepted and pending requests queue outside of CICS in the TCP/IP backlog queue. This backlog queue will increase, and this in turn will feed into the z/OS WLM server-specific recommendations. Now in CICS TS 5.4, via use of the z/OS WLM health API, we can feed into the z/OS WLM recommendations information about CICS regions that are still ‘warming up’. This will help prevent these regions going maxtask because they are getting too much work before they are really ready to go.
Additionally in CICS TS 5.4 we have augmented the throttling of work at maxtask for HTTP connections and applied a similar technique when the new asynchronous API is used. If a region becomes overloaded, CICS will automatically start regulating workflow to prevent too many child tasks being created. Parent tasks issuing a RUN TRANSID command will be suspended and put in a queue, and resumed when workload levels in the region drop.
Possible future enhancements
In the future it may be appropriate for CICS to also initiate changes in its health status during maxtask conditions to complement the existing SOTUNING functionality. As well as TCP/IP, CICSPlex SM workload management and CICS components could take action to better facilitate recovery from a maxtask condition.