Application rebalancing across a uniform cluster

So far in this series, we’ve gone through the basics of uniform clusters and seen, with a demo, how applications are rebalanced across them.

This article is part of the “Uniform clustering in IBM MQ” series.

Next, it helps to understand what is actually involved in the rebalancing process and how to investigate unexpected behavior. To do this, you need to know which attributes the uniform cluster monitors and how their values influence balancing.

Interrogating your uniform cluster

You can interrogate the status of the components of your uniform cluster at different levels, which is useful for debugging when the app instances hosted in the uniform cluster aren’t balancing as expected. Let’s look at each level in detail, using the examples in the IBM MQ documentation, to understand the attributes involved in rebalancing.

The following command allows us to query the status of all app name groups in a uniform cluster:

DIS APSTATUS(*) type(APPL)

Typical output from this command looks like this:

[Figure: sample output of DIS APSTATUS(*) type(APPL)]

Here, we can see the attributes that the uniform cluster tracks for a single app name group. Let’s break down the key balancing attributes and put them into the context of this example:

APPLNAME is the app name group for which the information is being displayed; each app name group connected to the uniform cluster has a separate listing. MYAPP is the only value of APPLNAME shown here, which indicates that it is the only app name group connected to our uniform cluster.
CLUSTER is the name of the uniform cluster the app name group is connected to, which is UNIDEMO in our case.
COUNT is the number of instances of the app name group connected to the uniform cluster. In our example, there are 8 instances of MYAPP connected to UNIDEMO.
MOVCOUNT is the number of connected app instances that are eligible for rebalancing. In our example, all app instances are eligible for rebalancing since MOVCOUNT = COUNT.
BALANCED indicates whether the instances of MYAPP are balanced across the uniform cluster. In our example, we can see that they are not balanced.
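Because MQSC DISPLAY commands report each attribute as a KEYWORD(value) pair, the APPL-level check above is easy to script. The following Python sketch is illustrative only: the input line is a hypothetical record assembled from the values described in this example rather than verbatim command output, and `parse_mqsc_attrs` is a hypothetical helper that assumes no value contains nested parentheses.

```python
import re

# Matches MQSC-style KEYWORD(value) pairs such as COUNT(8).
# Assumption: values never contain nested parentheses.
ATTR_PATTERN = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_mqsc_attrs(text: str) -> dict[str, str]:
    """Return a dict mapping attribute name to value for one MQSC display record."""
    return {name: value.strip() for name, value in ATTR_PATTERN.findall(text)}


# Hypothetical record built from the example values above (not real output).
record = parse_mqsc_attrs(
    "APPLNAME(MYAPP) CLUSTER(UNIDEMO) COUNT(8) MOVCOUNT(8) BALANCED(NO)"
)

count = int(record["COUNT"])
movcount = int(record["MOVCOUNT"])
# All instances can take part in rebalancing only when MOVCOUNT equals COUNT.
print(f"{record['APPLNAME']}: {movcount}/{count} instances movable, "
      f"balanced={record['BALANCED']}")
```

Comparing MOVCOUNT with COUNT in this way shows at a glance whether every instance of the group can take part in rebalancing.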

We can then look at the status of all queue managers in a uniform cluster using this command:

DIS APSTATUS(*) type(QMGR)

Typical output of this command looks like this:

[Figure: sample output of DIS APSTATUS(*) type(QMGR)]

This example output shows the status of each queue manager in the uniform cluster, in relation to each app name group that is connected to it.

QMNAME is the name of the queue manager. Here, we have 3 queue managers (UNID001, UNID002, UNID003).
APPLNAME is the app name group connected to that queue manager. In our example, the only app name group connected to the uniform cluster is MYAPP, so we only see this in the listed output.
COUNT is the number of instances of the app name group connected to the queue manager. In our example, we can see that UNID001 has six app instances of MYAPP, and UNID002 and UNID003 each have one instance.
MOVCOUNT is the number of app instances that are eligible for rebalancing in each queue manager. In our example, all app instances are eligible for rebalancing because MOVCOUNT = COUNT for all queue managers.
BALSTATE shows the relative balance of the app name group indicated by APPLNAME on each queue manager and can have the values HIGH, LOW, or OK. Because UNID001 has more instances of MYAPP connected to it than UNID002 and UNID003, its BALSTATE is HIGH, while the BALSTATE of the other 2 queue managers is LOW.
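The exact rule the queue managers use to assign BALSTATE is not described here. As a rough mental model only, the Python sketch below assumes a queue manager is OK when its instance count lies between the floor and ceiling of the even share (total instances divided by the number of queue managers), HIGH above that range, and LOW below it. The function `classify_balstate` is hypothetical and is not IBM MQ's actual algorithm.

```python
import math


def classify_balstate(counts: dict[str, int]) -> dict[str, str]:
    """Label each queue manager HIGH, LOW, or OK for one app name group.

    Simplified assumption: OK means the count is within floor..ceil of the
    even share. This is an illustration, not IBM MQ's internal rule.
    """
    total = sum(counts.values())
    share = total / len(counts)
    low, high = math.floor(share), math.ceil(share)
    states = {}
    for qmgr, count in counts.items():
        if count > high:
            states[qmgr] = "HIGH"
        elif count < low:
            states[qmgr] = "LOW"
        else:
            states[qmgr] = "OK"
    return states


# MYAPP counts from the example above: 6, 1, 1 (8 in total).
print(classify_balstate({"UNID001": 6, "UNID002": 1, "UNID003": 1}))
# {'UNID001': 'HIGH', 'UNID002': 'LOW', 'UNID003': 'LOW'}

# One possible balanced distribution (assumed): 3, 3, 2.
print(classify_balstate({"UNID001": 3, "UNID002": 3, "UNID003": 2}))
# {'UNID001': 'OK', 'UNID002': 'OK', 'UNID003': 'OK'}
```

With 8 instances over 3 queue managers the even share is about 2.67, so under this model 2 or 3 instances counts as OK, 6 as HIGH, and 1 as LOW, matching the example output.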

Now we can look at the status of each app instance within a particular queue manager using this command:

DIS APSTATUS(*) type(LOCAL)

Typical output would look like this:

[Figure: sample output of DIS APSTATUS(*) type(LOCAL) on UNID001]

In this example, we’ve listed the information for the app instances connected to UNID001, after the uniform cluster became balanced. Looking at the key balancing attributes:

MOVABLE shows whether the app instance is eligible to be moved (reconnected) to another queue manager in the uniform cluster. In our example, all 3 app instances on UNID001 can be moved.
IMMREASN gives the reason why an app instance is not movable. In our example, all the displayed app instances are movable, so IMMREASN is NONE.
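If you want to capture all three views in one pass, a small Python sketch like the following can pipe the commands into `runmqsc`, the MQSC command processor, which reads commands from standard input. It assumes `runmqsc` is on the PATH and is run by a user authorized to connect to the queue manager; `build_mqsc_script` and `run_mqsc` are hypothetical helper names. Because the type(LOCAL) view only covers the queue manager you connect to, run it against each queue manager you want to inspect.

```python
import shutil
import subprocess

# The three APSTATUS views discussed above, from cluster-wide to per-instance.
APSTATUS_TYPES = ("APPL", "QMGR", "LOCAL")


def build_mqsc_script(types=APSTATUS_TYPES) -> str:
    """Return MQSC input text with one DIS APSTATUS command per requested type."""
    return "".join(f"DIS APSTATUS(*) TYPE({t})\n" for t in types)


def run_mqsc(qmgr: str, script: str) -> str:
    """Pipe an MQSC script into runmqsc for the given queue manager; return stdout."""
    if shutil.which("runmqsc") is None:
        raise FileNotFoundError("runmqsc not found on PATH")
    result = subprocess.run(
        ["runmqsc", qmgr],
        input=script,
        capture_output=True,
        text=True,
        check=False,  # return the output even if runmqsc reports a failed command
    )
    return result.stdout
```

For example, `print(run_mqsc("UNID001", build_mqsc_script()))` would display all three views from UNID001 in the example cluster.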

Rebalancing Scenarios

The queue managers in the uniform cluster periodically share and synchronize state to ensure the balance of application instances is consistent cluster-wide. As mentioned above, the BALSTATE attribute indicates the relative balance of a queue manager with regard to a particular app name group. The number of application instances available to move to achieve balance is given by the MOVCOUNT value for that app name group. Ideally, MOVCOUNT equals COUNT, which allows unconstrained balancing, but this isn’t always the case.
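For the two-queue-manager scenarios that follow, a simplified way to reason about the numbers (an approximation for illustration, not IBM MQ's documented algorithm) is that the under-loaded queue manager requests half the difference in instance counts, and the overloaded queue manager can hand over at most its MOVCOUNT:

```latex
r = \left\lfloor \frac{\mathrm{COUNT}_{\mathrm{high}} - \mathrm{COUNT}_{\mathrm{low}}}{2} \right\rfloor,
\qquad
m = \min\left(r,\ \mathrm{MOVCOUNT}_{\mathrm{high}}\right)
```

Here r is the number of instances requested and m the number actually moved. For example, with 15 and 5 instances and a MOVCOUNT of 3 on the overloaded queue manager, r = 5 but m = 3, so the cluster stays unbalanced.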

The following animations illustrate some typical rebalancing scenarios to help visualize how MOVCOUNT might influence balancing within a uniform cluster.

This animation shows how rebalancing occurs when all connected app instances are eligible for rebalancing. In this example, QM1 has 6 instances of app1 (so its BALSTATE is HIGH), and QM2 has 2 (so its BALSTATE is LOW). All of these app instances can be rebalanced, so QM2 requests 2 instances of app1 from QM1, leaving both queue managers with 4 connected. This is the ideal scenario: because every app instance could be moved, nothing prevented the uniform cluster from becoming balanced.

[Animation: rebalancing when all app instances are movable]

The following animation shows a rebalancing scenario where some app instances can’t be rebalanced. Here, QM1 again has 6 instances of app1 connected, but only 4 of these are movable. QM2 has only 2 instances of app1, so it again requests 2 more from QM1. Two of the 4 movable app instances on QM1 are reconnected to QM2, and the uniform cluster is balanced. Admin intervention might be useful to determine why some app instances are unmovable, although it is not necessary because the uniform cluster still became balanced.

[Animation: rebalancing when some app instances are unmovable]

Here we see a scenario where the movable app instances run out. QM1 has 15 instances of app1, of which only 3 are movable. QM2 has 5 instances of app1, so it requests 5 from QM1 to become balanced. Because QM1 has only 3 movable instances of app1, it can reconnect only 3 of them to QM2, leaving 12 on QM1 and 8 on QM2. The uniform cluster remains unbalanced. This situation would require intervention to understand why so many of the app1 instances connected to QM1 are not eligible for reconnection.

[Animation: rebalancing when movable app instances run out]

The following animation illustrates the scenario where there are no movable app instances. Here, QM2’s BALSTATE is LOW, so it requests 5 instances of app1 from QM1. QM1 has 15 instances of app1, but none of them are eligible for rebalancing. No app instances can be reconnected to QM2, so the uniform cluster remains unbalanced. This is an extreme scenario, and intervention would be necessary for the uniform cluster to behave optimally.

[Animation: rebalancing when no app instances are movable]
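The four scenarios can be replayed with a short Python sketch of the simplified model (QM2 requests half the difference in counts; QM1 can give up at most its MOVCOUNT). This is an illustration, not IBM MQ's internal algorithm; the `Scenario` class and `rebalance` function are hypothetical, treating a difference of at most 1 as balanced is an assumption, and QM2's count of 5 in the last scenario is inferred from its request for 5 instances.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One two-queue-manager example: app1 counts on QM1 (HIGH) and QM2 (LOW)."""
    name: str
    qm1_count: int
    qm1_movcount: int
    qm2_count: int


def rebalance(s: Scenario) -> tuple[int, int, int]:
    """Return (requested, moved, imbalance_after) under the simplified model.

    requested: instances QM2 asks for to even out the counts
    moved: capped by how many of QM1's instances are movable (MOVCOUNT)
    imbalance_after: QM1's count minus QM2's count after the moves
    """
    requested = (s.qm1_count - s.qm2_count) // 2
    moved = min(requested, s.qm1_movcount)
    qm1_after = s.qm1_count - moved
    qm2_after = s.qm2_count + moved
    return requested, moved, qm1_after - qm2_after


SCENARIOS = [
    Scenario("all movable", 6, 6, 2),
    Scenario("some unmovable", 6, 4, 2),
    Scenario("movable exhausted", 15, 3, 5),
    Scenario("none movable", 15, 0, 5),
]

for s in SCENARIOS:
    requested, moved, gap = rebalance(s)
    status = "balanced" if gap <= 1 else "unbalanced"
    print(f"{s.name}: requested {requested}, moved {moved}, {status}")
```

Under this model, the first two scenarios end balanced (4 and 4), while the last two end at 12 versus 8 and 15 versus 5, matching the outcomes shown in the animations.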

So what does intervention look like?

When we encounter scenarios with non-reconnectable application instances, we need to intervene to work out the cause. This can be done using the commands discussed at the start of this article, as follows:

Look at each app name group and see whether any are unbalanced using DIS APSTATUS(*) type(APPL), checking the BALANCED attribute and whether COUNT equals MOVCOUNT.
If there are imbalances, look at the balance of app instances on each queue manager using DIS APSTATUS(*) type(QMGR), and check for any queue managers where COUNT does not equal MOVCOUNT.
For those queue managers, look at the individual app instances they host using DIS APSTATUS(*) type(LOCAL), and identify why they are unmovable (given by the value of IMMREASN).
You can then look up each IMMREASN value in the table at the bottom of this MQ doc to identify the action you need to take to address the issue.
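The first three steps can be sketched in Python over records already parsed into dicts of attribute name to value. Everything here is illustrative: the function names are hypothetical, the sample rows are invented for the example, `HYPOTHETICAL_REASON` stands in for a real IMMREASN value, and the code assumes each type(LOCAL) record includes APPLNAME. The final step, looking up what each IMMREASN value means, stays a manual check against the MQ documentation.

```python
from collections import Counter


def unbalanced_groups(appl_rows):
    """Step 1: app name groups that are unbalanced or have unmovable instances."""
    return {
        row["APPLNAME"]
        for row in appl_rows
        if row["BALANCED"] == "NO" or int(row["MOVCOUNT"]) < int(row["COUNT"])
    }


def constrained_qmgrs(qmgr_rows, groups):
    """Step 2: queue managers where some instances of a flagged group can't move."""
    return {
        row["QMNAME"]
        for row in qmgr_rows
        if row["APPLNAME"] in groups and int(row["MOVCOUNT"]) < int(row["COUNT"])
    }


def immovable_reasons(local_rows_by_qmgr, qmgrs, groups):
    """Step 3: count IMMREASN values for unmovable instances on those queue managers."""
    reasons = Counter()
    for qmgr in qmgrs:
        for row in local_rows_by_qmgr.get(qmgr, []):
            if row["APPLNAME"] in groups and row["MOVABLE"] == "NO":
                reasons[(qmgr, row["IMMREASN"])] += 1
    return reasons


# Invented sample data shaped like the three APSTATUS views.
appl = [{"APPLNAME": "MYAPP", "COUNT": "8", "MOVCOUNT": "6", "BALANCED": "NO"}]
qmgr = [
    {"QMNAME": "UNID001", "APPLNAME": "MYAPP", "COUNT": "6", "MOVCOUNT": "4"},
    {"QMNAME": "UNID002", "APPLNAME": "MYAPP", "COUNT": "1", "MOVCOUNT": "1"},
    {"QMNAME": "UNID003", "APPLNAME": "MYAPP", "COUNT": "1", "MOVCOUNT": "1"},
]
local = {
    "UNID001": [{"APPLNAME": "MYAPP", "MOVABLE": "YES", "IMMREASN": "NONE"}] * 4
    + [{"APPLNAME": "MYAPP", "MOVABLE": "NO", "IMMREASN": "HYPOTHETICAL_REASON"}] * 2,
}

groups = unbalanced_groups(appl)
qmgrs = constrained_qmgrs(qmgr, groups)
print(immovable_reasons(local, qmgrs, groups))
```

Narrowing from app name groups to queue managers to individual instances keeps the type(LOCAL) queries limited to the queue managers that actually have unmovable instances.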

Summary and Next Steps

In this article, we've looked at techniques you can employ to understand the state of the queue managers in a uniform cluster and their connected applications. We’ve also been through typical rebalancing scenarios, and the steps you can take to address non-reconnectable applications.

To prevent your applications from becoming non-reconnectable in the first place, ensure they fulfil the essential criteria found in the table at the end of this tutorial.