Today we released IBM MQ 9.1.2 CD, which introduces an exciting new capability that can take a lot of the complexity out of creating a scalable, fault-tolerant MQ estate. We’ll get to that, but first some background…

Your messaging system is often at the heart of your application solution. It provides the ability for your applications to communicate without the need for them to be directly connected to each other or even available simultaneously. This enables you to provide fault tolerance in your application layer, removing the need to code complicated retry logic or to handle periods when the receiving application is just too busy to handle the load.

However, there’s no point using messaging to counteract all those problems if the messaging layer itself cannot handle the scale or provide a fault tolerant, always available service. With IBM MQ you don’t need to worry: it has repeatedly been proven in the most extreme scenarios by the most demanding of users across the world, all the while providing the highest quality of message service and ensuring no message data is lost, no matter what the problem.

That’s not to say that every MQ system behaves like this; you do need to apply a little bit of thought to your layout, and to how your applications use it, to maximise your scalability and availability. And if you started small and simple with MQ in the past, you may have gone for a single queue manager approach.

With this model you can take steps to make that single queue manager as available as physically possible, perhaps by using one of MQ’s high availability capabilities, such as Replicated Data Queue Managers on Linux. But you can’t avoid those times when that queue manager needs to be restarted or reconfigured, and that means an outage of your messaging service, even if just for a few seconds as the queue manager fails over. So you have to look for the dreaded quiet maintenance window to avoid impacting your application’s availability, and that’s becoming harder and harder with today’s 24×7 requirements.

So if you’re looking for continuous availability, and a way to avoid an outage to your whole messaging system, you’ll need something different: an architecture that avoids a single queue manager being the route for all messages. Many of you who have built highly available MQ systems have done just that, with systems that use multiple active queue managers and distribute the messaging workload across them. And it’s not enough just to worry about the messaging layer; your applications also need to remove any single points of failure, so that means multiple active instances of your critical applications too.

If you’re a z/OS user then you’re probably aware of Queue Sharing Groups, and how they really help you on that platform by exposing centrally shared queues through those multiple queue managers. But if you’re on another platform, where such a capability doesn’t exist, having multiple queue managers means multiple copies of the same queue, each with a different set of messages.

For new applications, the multi queue manager pattern really should be the starting point, built into the application’s expectations from the start. However, adapting existing applications that have relied on a single queue manager being the target for all their messages may require extra work. In fact, it may require application changes if the logic relies on things that are fundamentally counter to highly available solutions, for example global ordering of all message data.

So perhaps the simplest (but as you’ll see, not necessarily the best) way to achieve better availability is to create multiple ‘cloned’ environments, and to manually pin your application instances across them.

This means that when a single queue manager stops, only a proportion of the messaging stops. This is an improvement on the single queue manager, but not such good news for those instances of the application that are pinned to the stopped queue manager. These static configurations can also become a burden to manage, as you constantly need to reassess them to make sure they still meet your requirements.

An even better approach is to decouple the application instances from the individual queue managers entirely, allowing them to connect to any of them.

As well as adopting the principle that messaging workload is distributed across a set of queue managers, this loosely coupled pattern also brings in the fundamental requirement that applications connect over TCP/IP. Nothing new there, but in case you’re not familiar with this, MQ has long had capabilities to help you decouple those client connections from a specific queue manager, for example client channel definition tables (CCDTs), queue manager groups and automatic client reconnection.

And you’ll see how we’re building on those in just a minute.
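To make that a little more concrete, here is a minimal sketch (in C, error handling omitted) of a client that connects to a queue manager group defined in a CCDT, rather than to one named queue manager, with automatic reconnection switched on. The ANY_QM group name, and the CCDT being located through the usual environment variables, are assumptions for this illustration, not something the product mandates:

    /* Minimal sketch: connect to "any queue manager in the ANY_QM group"  */
    /* defined in a CCDT, with automatic client reconnection enabled.      */
    /* Assumes MQCHLLIB/MQCHLTAB (or MQCCDTURL) point at your CCDT.        */
    #include <stdio.h>
    #include <cmqc.h>                          /* MQI definitions          */

    int main(void)
    {
       MQCNO   cno = {MQCNO_DEFAULT};          /* connection options       */
       MQHCONN hConn;                          /* connection handle        */
       MQLONG  compCode, reason;
       char    qmName[MQ_Q_MGR_NAME_LENGTH + 1] = "*ANY_QM";

       /* Allow reconnection to ANY suitable queue manager, not just the   */
       /* one the client first connected to.                               */
       cno.Options = MQCNO_RECONNECT;

       /* The leading '*' means "don't insist on a specific queue manager  */
       /* name"; any channel in the ANY_QM group of the CCDT will do.      */
       MQCONNX(qmName, &cno, &hConn, &compCode, &reason);
       printf("MQCONNX completion code %d, reason %d\n",
              (int)compCode, (int)reason);

       /* ... MQOPEN / MQGET / MQPUT as normal ... */

       MQDISC(&hConn, &compCode, &reason);
       return 0;
    }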

The Uniform Cluster pattern

For the above queue manager pattern to succeed, an application needs to treat all those queue managers as equivalent. This means the application could connect to any one of them and expect the same behaviour, with the same queues, topics, security configuration, etc. Or to put it another way, these queue managers need to be uniform. This is why MQ has introduced a new phrase to its catalogue, the “Uniform Cluster”. This is really just putting a name to the topology that many have been using for years. So why have we done that now? For a start it helps to finally have a name for it, but with the arrival of IBM MQ 9.1.2 CD you can now make those queue managers aware that they’re being used for this exact purpose, and that lets us solve some of the challenges you would have had to address yourself in the past…

How do I ensure that my multiple application instances are evenly distributed across those queue managers, with the particular need that every queue manager has at least one consuming application for each queue? If I leave it to chance with a CCDT or load balancer, I have no guarantee of success.

If I stop a queue manager for maintenance, I can easily get my applications to reconnect to a remaining queue manager (for example with auto reconnect), but how do I get some of those applications back to that queue manager once it’s restarted?

If I change my configuration and bring in a new queue manager, perhaps to scale up, how do I get a fair proportion of my already connected applications to move over to that queue manager?

To date, these problems have typically been solved either through additional application logic or through manual operations. With the release of MQ 9.1.2 CD on the Distributed platforms (Linux, MQ Appliance, Windows, AIX), the above questions can start to be answered with the introduction of Uniform Clusters.

To do this you take a set of equivalent queue managers, link them up in their own MQ cluster and tell them that they are a Uniform Cluster. This gives them permission to start chatting amongst themselves, sharing knowledge of which applications are connected where. With that information, the cluster can start to tackle the above problems for you: the queue managers in the Uniform Cluster detect an imbalance of applications across the individual queue managers and automatically move an application’s connection from one queue manager to another, re-establishing the balance. This happens continually, solving the initial balancing and also covering those times when queue managers are being stopped and started for maintenance or scaling.
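To give you a feel for what that involves, here is a rough, illustrative sketch rather than a recipe: the cluster name (UNICLUS), queue manager names, channel names and host names are all invented, and the qm.ini stanza shown is how the 9.1.2 level identifies a Uniform Cluster, so do check the MQ Knowledge Center for the exact steps at your level. Each queue manager gets the familiar MQ cluster definitions in MQSC, for example on QM1, which together with QM2 acts as a full repository (partial repositories skip the ALTER QMGR step):

      * QM1 is one of the cluster's two full repositories
      ALTER QMGR REPOS(UNICLUS)

      * A cluster-receiver channel advertising this queue manager to the cluster
      DEFINE CHANNEL(UNICLUS.QM1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
             CONNAME('qm1.example.com(1414)') CLUSTER(UNICLUS)

      * A cluster-sender channel pointing at the other full repository
      DEFINE CHANNEL(UNICLUS.QM2) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
             CONNAME('qm2.example.com(1414)') CLUSTER(UNICLUS)

plus, in each queue manager’s qm.ini, the setting that marks the cluster out as a Uniform Cluster:

      TuningParameters:
         UniformClusterName=UNICLUS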

MQ 9.1.2 CD for the distributed platforms introduces a number of new features that tie together to make all this possible:
• Creation of a Uniform Cluster
• The ability to identify applications by name, to help identify imbalances
• Automatic rebalancing of applications (only the ones that sit on top of MQ’s C libraries to start with – see update)
• Text based CCDTs to make it easier to configure this behaviour
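As a flavour of that last point, here’s a minimal sketch of a JSON CCDT for a two queue manager Uniform Cluster. The channel name, host names and the ANY_QM group name are invented for this example. Every entry uses the same queue manager group name, so an application connecting to “*ANY_QM” can be placed on either queue manager, and the clientWeight/affinity settings spread those initial connections rather than always picking the first entry:

      {
        "channel": [
          {
            "name": "UNICLUS.SVRCONN",
            "type": "clientConnection",
            "clientConnection": {
              "connection": [ { "host": "qm1.example.com", "port": 1414 } ],
              "queueManager": "ANY_QM"
            },
            "connectionManagement": {
              "clientWeight": 1,
              "affinity": "none"
            }
          },
          {
            "name": "UNICLUS.SVRCONN",
            "type": "clientConnection",
            "clientConnection": {
              "connection": [ { "host": "qm2.example.com", "port": 1414 } ],
              "queueManager": "ANY_QM"
            },
            "connectionManagement": {
              "clientWeight": 1,
              "affinity": "none"
            }
          }
        ]
      }

This pairs with the connection sketch earlier in the article: the application connects to the group, and the Uniform Cluster takes care of where it actually lands and of rebalancing it later.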

We believe that this is just a start, with the expectation of enhancements and additions to come, so look out for future continuous delivery releases of MQ to see how this evolves. To find out more about the 9.1.2 capability, we have a number of other articles coming over the next few days to take you through setting this up, and of course there’s the MQ Knowledge Center. And it goes without saying that we are always keen to hear your feedback.

Next step, try it out…
Or watch it being done…
Interested in this in containers and Kubernetes?

MQ 9.1.3 Update

As I originally said, 9.1.2 was just the beginning. 9.1.3 came along a few months later to build on top of this, adding the following new capabilities:

Java support added with JMS…
See what’s going on with application status…
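For example, with 9.1.3 you can ask any queue manager in the Uniform Cluster where the instances of a named application are connected and whether MQ currently considers them balanced (the application name below is made up):

      runmqsc QM1
        DISPLAY APSTATUS('payments.checker')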

2 comments on “Building scalable fault tolerant systems with IBM MQ 9.1.2 CD”

  1. Will QMs in a Uniform Cluster converse among themselves to at least warn the MQ Admin, and optionally automatically resolve discrepancies between their object definitions, qm.ini file settings, etc.? If there was a potential of automatically resolving these discrepancies, I suppose we would need a way to identify a team leader queue manager in the group that would align all its team mate queue managers to its settings.

  2. David Ware March 27, 2019

    With 9.1.2 they won’t; it’s still your responsibility to keep them in step. But it’s good to hear that we’re thinking along the same lines, as this is definitely an acknowledged requirement and we hope to work on that in the future.
