Someone asked a question on one of my blog posts, and I thought this would be of interest to the wider community.
It was easier to write another blog entry on logging rather than expand the original topic.
MQ 101 logging introduction.
This is very much the concept level and is similar for both MQ distributed and MQ on z/OS.
In summary, the logging task within the queue manager writes out a chunk of data to the log file. When this IO has completed, it takes the next chunk of data and writes that to the log file. One file IO can contain data from many transactions.
I’ll walk through a scenario where we have 8 instances of an application (A to H), each doing an MQPUT to its own queue followed by MQ commit.
The system is in a steady state, and each log IO write takes 5 milliseconds. In the list below, the number at the front of each line is the time in milliseconds from the start. You may find it useful to go through the list twice.
0 start the log IO for outstanding work
1 application-A does an MQPUT to Queue-A and MQ Commit. The request is put on the Commit work queue, and the application sleeps
2 application-B does an MQPUT to Queue-B and MQ Commit. The request is put on the Commit work queue, and the application sleeps
3 application-C does an MQPUT to Queue-C and MQ Commit. The request is put on the Commit work queue, and the application sleeps
4 application-D does an MQPUT to Queue-D and MQ Commit. The request is put on the Commit work queue, and the application sleeps
5 the IO to the log ends and wakes any tasks waiting on it. It issues a new log IO request for the data in the log buffers – including the data for applications A, B, C, D
6 application-E does an MQPUT to Queue-E and MQ Commit. The request is put on the Commit work queue, and the application sleeps
7 application-F does an MQPUT to Queue-F and MQ Commit. The request is put on the Commit work queue, and the application sleeps
8 application-G does an MQPUT to Queue-G and MQ Commit. The request is put on the Commit work queue, and the application sleeps
9 application-H does an MQPUT to Queue-H and MQ Commit. The request is put on the Commit work queue, and the application sleeps
10 the IO to the log ends and wakes the tasks waiting on it (A, B, C, D). It issues a new log IO request for the data in the log buffers – including the data for applications E, F, G, H
The durations of the commit requests for the applications (including wait time and IO time) are:
- A 9 ms
- B 8 ms
- C 7 ms
- D 6 ms
- E 9 ms
In this case the commit time for an individual application is between one and two times the log IO time.
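The timeline above can be sketched as a small simulation (my own illustration, not MQ code): commit requests queue up while a log IO is in flight, and each completing IO immediately starts the next one, carrying all the queued requests.

```python
# Group commit sketch: requests arriving while an IO is in flight are
# batched into the next IO. Names and times follow the walkthrough above.

IO_TIME = 5  # ms per log IO

# (application name, arrival time in ms) of each commit request
arrivals = [("A", 1), ("B", 2), ("C", 3), ("D", 4),
            ("E", 6), ("F", 7), ("G", 8), ("H", 9)]

def simulate(arrivals, io_time, first_io_start=0):
    """Return commit duration (wait + IO) in ms for each application."""
    durations = {}
    io_end = first_io_start + io_time   # an IO is already in flight at time 0
    pending = list(arrivals)            # assumed sorted by arrival time
    while pending:
        # requests that arrived before the in-flight IO finished
        batch = [(n, t) for (n, t) in pending if t < io_end]
        pending = [(n, t) for (n, t) in pending if t >= io_end]
        if not batch:                   # log idle: next IO starts on arrival
            io_end = pending[0][1] + io_time
            continue
        io_end += io_time               # next IO starts as soon as this one ends
        for name, t in batch:
            durations[name] = io_end - t
    return durations

print(simulate(arrivals, IO_TIME))
# {'A': 9, 'B': 8, 'C': 7, 'D': 6, 'E': 9, 'F': 8, 'G': 7, 'H': 6}
```

Rerunning with an IO time of 7 ms gives A to D durations of 13, 12, 11 and 10 ms, matching the second table.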
If you have many applications running in parallel they will all get a similar response time. With more applications you get more overall throughput.
The more applications running in parallel, the more data there is to be written in each IO – which means that each IO takes a little longer. If the average IO response time increases from 5 to 7 ms, the response times of our transactions become
- A 13 ms
- B 12 ms
- C 11 ms
- D 10 ms
- E 13 ms
If you think of the put or get of a persistent message as a put or get with an under-the-covers commit – the same scenario works for out-of-syncpoint requests.
I hope that is pretty easy to follow – so I’ll introduce some more complexities.
MQ201 logging – advanced class
On z/OS the maximum amount of data that can be written in one IO is 128 * 4096-byte pages (512 KB). On distributed it is smaller than this. If your (z/OS) application is writing a 4 MB message, it will have to wait for at least 8 IO requests.
- On z/OS the CSQ6LOGP OUTBUFF parameter is the number of log buffers to use. You should use the maximum of 4000 (* 4096 bytes).
- On distributed MQ, the qm.ini parameter LogBufferPages is the amount of memory allocated to buffer records for writing, specifying the size of the buffers in units of 4 KB pages. The queue manager default is 512 * 4 KB pages (2048 KB).
You need enough log buffers – and as a general rule, the bigger the log buffer the better.
If you set the number of log buffers to a low value – eg 100 pages – then no more than this number of pages can be written in one IO, and you also get applications waiting to get a log buffer as well as waiting for log IO.
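As an illustration, on distributed MQ the buffer size is set in the Log stanza of qm.ini (the value below is an example, not a recommendation, and a change only takes effect when the queue manager is restarted):

```ini
# Example only: 4096 * 4 KB pages = 16 MB of log buffers
Log:
   LogBufferPages=4096
```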
The IO response time is dependent on the hardware and may depend on the load, and time of day.
Is it easy to predict what the throughput and response time are going to be for a scenario? No. The best way to find out is to run a workload on the hardware during peak time.
What should I monitor?
Use the tools on your platform to display IO information.
For example the iostat -x command gives information including
- Blk_wrtn/s (kB_wrtn/s, MB_wrtn/s) Amount of data written per second
- avgrq-sz Average size of the requests, in sectors, per IO. A sector is often 512 bytes
- w_await average wait time for the IO as seen by the application request.
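A useful derived metric: dividing the data rate by the write IOs per second gives the average write size per IO. A quick sketch, using invented sample numbers (not from any real system):

```python
# Average write IO size derived from two "iostat -x" columns.
# The sample values below are made up for illustration.
kb_wrtn_per_sec = 20000.0   # kB_wrtn/s: data written per second
writes_per_sec = 400.0      # w/s: write IOs completed per second

avg_write_io_kb = kb_wrtn_per_sec / writes_per_sec
print(f"average write IO size: {avg_write_io_kb:.0f} kB")   # 50 kB
```

A large average write size with a long w_await can indicate the batching effect described above: many commits per IO, each IO carrying more data.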
You can use tools like
- RMF to display the response time of the DASD volumes,
- SMF42 records which give you IO delays by data set – (I have a program which prints the records out – please contact me for more details),
- MQ SMF data. The log data gives you the response time and the number of pages per IO; see MP1B.
Logging – the PhD class
Averages give an overall picture, but they do not tell the whole story. (For example, how many families have the average 2.54 children per couple? And on average people in the UK have 1.9995 arms – as one person in 2000 has only one arm.)
Consider 100 tasks doing put, commit. You may get (as I described above) 50 tasks’ commits per IO. It is possible to have 80 tasks committing in one IO and 20 in the other. This results in one longer IO and one shorter IO. Over time I would expect these to balance out, but there are events that can cause an imbalance.
What can cause an imbalance? Consider a distributed scenario where there is one application doing a get out of syncpoint and 10 tasks doing puts within syncpoint.
The get out of syncpoint holds a queue latch across the log IO. This IO finishes, the latch is released, and the 10 putting tasks take it in turn to get the latch, put their message and commit. The data for the 10 putting tasks is written in one IO.
The first IO had the data for one get (a small amount of data). The second IO has the data from 10 puts, including the message data, so it is a longer IO. This gives a pattern of short IO, long IO, short IO, long IO.
On average the IO is a medium duration IO.
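A quick sketch of why the average hides the pattern – with made-up durations, 100 IOs alternating between a short 2 ms IO and a long 8 ms IO average out to 5 ms, a duration no individual IO actually had:

```python
# Alternating short and long log IOs (illustrative numbers, in ms).
io_times = [2, 8] * 50          # short IO, long IO, repeated 50 times

mean = sum(io_times) / len(io_times)
print(f"mean IO time : {mean:.1f} ms")    # 5.0 ms - the "medium" IO
print(f"short IOs    : {min(io_times)} ms")
print(f"long IOs     : {max(io_times)} ms")
```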
This is too deep for most people – I suggest you use averages – but bear in mind averages may not give the whole picture.
Can the experts predict the throughput you will get from an IO configuration?
The answer is still no – it is best to measure in your environment, with a typical workload, using the disks you will be using, and doing it at peak time. If you measure when the systems are not loaded, you may get a different disk IO response time compared to when you run at peak time!