Of course, you want the queue manager to stay up after a CF failure to get the structure back faster, don’t you? The answer is a definite maybe…
You can control how queue managers on z/OS react to a failure of a CF structure by setting two attributes on the CFSTRUCT definition: CFCONLOS and RECAUTO. CFCONLOS controls how a queue manager deals with loss of connectivity to the structure, and RECAUTO controls whether the structure is recovered automatically after a failure.
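To show where these attributes live, here is a minimal MQSC sketch that defines a structure with both attributes set explicitly. The structure name APP1 is hypothetical. CFCONLOS and RECAUTO are only valid for structures at CFLEVEL(5), so check the level your QSG uses before adapting this.

```
* Hypothetical structure name APP1; adjust for your QSG.
* CFCONLOS: what a queue manager does if it loses
*           connectivity to this structure.
* RECAUTO:  whether the structure is recovered
*           automatically after a failure.
DEFINE CFSTRUCT(APP1) +
       CFLEVEL(5) +
       RECOVER(YES) +
       CFCONLOS(TOLERATE) +
       RECAUTO(YES)

* Check the current settings of an existing structure
DISPLAY CFSTRUCT(APP1) ALL
```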
At first glance it seems obvious that you should set these attributes to take advantage of all the features in MQ to automatically tolerate and recover from failures, but what exactly should you consider when deciding how to set these attributes?
As mentioned above, the CFCONLOS attribute controls whether the queue manager tolerates loss of connectivity to a CF structure. The two possible actions are to tolerate the failure or to terminate. A third value, ASQMGR, takes the action from the queue manager's own CFCONLOS attribute. Note that terminating will cause an outage of the entire QSG if all queue managers lose connectivity to a structure.
Normally we would suggest setting CFCONLOS(TOLERATE). This means that queue managers remain available when connectivity is lost to a structure, so private queues will still be available. Queue managers will also try to reconnect to the structure, causing the structure to be reallocated in another CF, if one is available.
Tolerating loss of connectivity seems like a good thing, so why wouldn’t you want to do it? To answer that question, consider what the impact will be to applications that use the queue manager.
When a queue manager loses connectivity to a CF structure, applications connected to the queue manager will not be able to perform any shared queue work. Also, channels will not be able to put messages destined for shared queues. So the problem is now pushed out to applications and other queue managers that are connected to the QSG.
In some cases, a better option might be to set CFCONLOS(TERMINATE), so that the queue managers that lose connectivity become unavailable. If only some queue managers lose connectivity to the CF structure, applications and shared channels connecting into the QSG can reconnect to the remaining queue managers, which still have access to the shared queues.
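Because CFCONLOS is a CFSTRUCT attribute, the choice is made per structure. One structure can be set to terminate while others continue to tolerate the loss. A sketch, again using the hypothetical structure name APP1:

```
* Hypothetical: queue managers that lose connectivity to
* APP1 terminate, so clients and shared channels can
* reconnect to queue managers that can still reach it.
ALTER CFSTRUCT(APP1) CFCONLOS(TERMINATE)
```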
Of course, you need to consider whether most of the work being performed on the queue managers involves shared queues, as any private queues hosted by a queue manager will be unavailable if it terminates following loss of connectivity to a CF structure.
The other attribute to consider is RECAUTO, which controls whether the queue manager recovers CF structures automatically after a failure. The effect is essentially the same as the queue manager issuing the RECOVER CFSTRUCT command for you.
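For comparison, this sketch shows the manual equivalent that an operator would otherwise run. APP1 is a hypothetical structure name. The DISPLAY CFSTATUS command is included only to check whether the structure has been marked as failed before you recover it.

```
* Has the structure been marked as failed?
DISPLAY CFSTATUS(APP1) TYPE(SUMMARY)

* Rebuild it from the last backup plus logged updates
* (what RECAUTO(YES) does without operator action)
RECOVER CFSTRUCT(APP1) TYPE(NORMAL)
```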
As with CFCONLOS, we would normally suggest setting RECAUTO(YES) so that a queue manager in the QSG automatically recovers the structure following a failure, but are there any cases when you wouldn’t want to do this?
Recovering a CF structure involves reading the log data of (potentially) all queue managers in the QSG. This can be a time-consuming process. How long depends on the time since the last structure backup was taken (you do backup your CF structures every hour or less, don’t you?), and how much data has been logged since then, but it could well take several minutes. If you want to regain access to the shared queues on a structure quickly after a failure, but without recovering any messages, then you might want to set RECAUTO(NO) and issue the RECOVER CFSTRUCT TYPE(PURGE) command to get a new, empty, copy of the structure.
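Here is a sketch of the RECAUTO(NO) approach for a hypothetical structure named APP1. Regular backups shorten a NORMAL recovery when you do want the messages back. TYPE(PURGE) discards everything that was on the structure, so use it only when losing those messages is acceptable.

```
* Run regularly (for example hourly, from your own
* scheduler) so NORMAL recovery has less log to replay
BACKUP CFSTRUCT(APP1)

* Stop APP1 from being recovered automatically
ALTER CFSTRUCT(APP1) RECAUTO(NO)

* After a failure: allocate a new, empty structure.
* All messages that were on APP1 are lost.
RECOVER CFSTRUCT(APP1) TYPE(PURGE)
```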
A better configuration might be to separate persistent and non-persistent messages on different structures, with the structures for non-persistent messages defined as being not recoverable (defined with RECOVER(NO)). That way, the structures for non-persistent messages will be available (but empty) following a failure as soon as a CF is available to allocate them, and the structures containing persistent messages will be available after they have been recovered (automatically if you set RECAUTO(YES)).
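The split could be defined along the following lines. All structure and queue names are hypothetical. DEFPSIST only sets the default persistence, so an application can still request a persistent message. As we understand it, a persistent put to a queue on a RECOVER(NO) structure is rejected. Check that applications using those queues really do send only non-persistent messages.

```
* Recoverable structure for persistent messages
DEFINE CFSTRUCT(PERSIST1) CFLEVEL(5) +
       RECOVER(YES) RECAUTO(YES)

* Non-recoverable structure for non-persistent messages:
* reallocated empty after a failure, no log replay
DEFINE CFSTRUCT(NPERS1) CFLEVEL(5) +
       RECOVER(NO)

* Shared queues placed on the appropriate structure
DEFINE QLOCAL(APP.ORDERS) QSGDISP(SHARED) +
       CFSTRUCT(PERSIST1) DEFPSIST(YES)
DEFINE QLOCAL(APP.QUOTES) QSGDISP(SHARED) +
       CFSTRUCT(NPERS1) DEFPSIST(NO)
```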