I consider myself lucky if I get one comment on a blog entry. So far I have had 4, so this topic must be interesting! I have added more comments at the bottom to reflect the feedback I have had.
On distributed MQ, at a hand-waving level of understanding, whenever an application does a put or get to a queue, a lock is held on the queue for the duration of that request.
If an application puts or gets a persistent message out of syncpoint, then the queue lock is held across the IO request (the log write). If the log write time is 1 ms, this limits the number of requests to 1000 a second.
What tends to happen is that the application issues a log write request just after the previous log write has started, so the application has to wait for almost 2 log writes, and the throughput drops to about 500 requests a second.
If your log write time is over 10 ms (which I have recently seen!) and each request waits for almost 2 log writes, you may only achieve about 50 puts plus gets a second.
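The arithmetic above can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not anything MQ provides: the function name and the `writes_waited` parameter are made up for illustration, and it assumes every request is serialized behind the queue lock for the whole log wait.

```python
def max_requests_per_second(log_write_ms, writes_waited=1.0):
    """Rough upper bound on out-of-syncpoint persistent puts plus gets per second.

    Assumes the queue lock is held for the whole log wait, so requests
    cannot overlap. writes_waited is how many log writes each request
    waits for (hypothetical parameter: 1 in the best case, close to 2
    when the previous log write has only just started).
    """
    return 1000.0 / (log_write_ms * writes_waited)


print(max_requests_per_second(1))       # 1 ms log write, best case
print(max_requests_per_second(1, 2))    # 1 ms log write, waiting for 2 writes
print(max_requests_per_second(10, 2))   # 10 ms log write, waiting for 2 writes
```

With these inputs the three cases come out at 1000, 500 and 50 requests a second, matching the figures in the text.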
If your throughput requirement is higher than this you have several options:
- Improve your log IO time
- Put the puts and gets WITHIN syncpoint
- Review if your messages need to be persistent.
- Use more queues. For example, instead of using the SYSTEM.CLUSTER.TRANSMISSION.QUEUE use multiple queues. See here
On z/OS the lock is held for a shorter time, so you may experience performance problems when migrating application programs from z/OS to MQ Distributed or the MQ Appliance. Of course you will spot this when you do your load testing. You now know how to fix it.
Paul Harris said that if you are doing a GET out of syncpoint there is no advantage in using a persistent message. It took me a moment to see the truth in this.
There is an advantage to doing a PUT out of syncpoint. For example, putting an “I got here” message, which is created even if the transaction is backed out.
I was asked to explain from a performance perspective why out of syncpoint is bad. I’ll use example numbers which are not very realistic – but make the point.
First we need to understand some terms. The transaction time is the time for one request. The throughput is how many requests per second can be done when there are many transactions running concurrently.
Let the time to do a put within syncpoint be 1 ms, and the time (as seen by the application) to do a log write (a commit, or a request out of syncpoint) be 10 ms.
- A put within syncpoint followed by a commit takes 1 ms for the put, and 10 ms for the commit so it takes 11 ms.
- A put OUT of syncpoint (logically a put within syncpoint + commit) saves a little time because there is only one MQ request. Say it takes 10.9 ms.
- How many puts out of syncpoint can we do a second? The answer is 1000 ms / 10.9 ms = about 90 per second. This is because the queue is locked for the duration of the put and the IO.
- Multiple instances do not give increased throughput because the lock is held for 10.9 ms. You will still only get 90 a second.
- How many puts within syncpoint can we do a second? One transaction takes 1 ms + 10 ms (= 11 ms), so 1000 ms / 11 ms = about 90 a second, as above.
- But we can do many transactions in parallel. The queue is locked for only 1 ms per put, so we can do 1000 puts a second. This is roughly 10 times the throughput of puts out of syncpoint.
- If we do 10 puts within syncpoint, then commit (called batching), it takes 10 * 1 ms for the puts + 10 ms for the commit = 20 ms. 1000 ms / 20 ms is 50 of these a second, which is 50 * 10 = 500 messages put a second for one transaction instance.
- If there were 3 instances of a transaction doing 10 * put + commit – the overall throughput would still be 1000 puts a second due to the lock on the queue during the put request
- If there are putting applications and getting applications then each get or put will take the lock – so perhaps 500 puts and 500 gets a second.
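The example numbers in the list above can be put into a small Python model. This is a minimal sketch under the same simplistic assumptions: the constants are the illustrative figures from this post rather than measurements, and the function names are invented for the example.

```python
PUT_MS = 1.0                # put within syncpoint; queue lock held for this long
COMMIT_MS = 10.0            # log write as seen by the application
OUT_OF_SYNCPOINT_MS = 10.9  # put out of syncpoint; lock held across the log write


def out_of_syncpoint_rate(instances):
    """Puts per second when every put is out of syncpoint.

    The lock is held for the whole request, so extra instances do not help.
    """
    return 1000.0 / OUT_OF_SYNCPOINT_MS


def in_syncpoint_rate(instances, batch=1):
    """Puts per second when each instance does `batch` puts and then a commit."""
    # What one instance can do on its own: batch puts per (puts + commit) cycle.
    per_instance = batch * 1000.0 / (batch * PUT_MS + COMMIT_MS)
    # Ceiling imposed by the queue lock held during each put.
    lock_limit = 1000.0 / PUT_MS
    return min(instances * per_instance, lock_limit)


print(round(out_of_syncpoint_rate(1)))   # one instance, out of syncpoint
print(round(out_of_syncpoint_rate(10)))  # ten instances, no better
print(round(in_syncpoint_rate(1)))       # one instance, put + commit
print(in_syncpoint_rate(1, batch=10))    # one instance, 10 puts per commit
print(in_syncpoint_rate(3, batch=10))    # three instances, capped by the lock
```

In this model one batching instance reaches 500 puts a second, and three instances are capped at 1000 by the queue lock, while out of syncpoint the rate stays at about 90 however many instances you run. It ignores everything else that happens in a real queue manager, so treat it as a way of following the reasoning and not as a prediction.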
These figures are very simplistic and are for illustration, to give the general concepts. To get more information look at the performance reports, or do your own measurements.