From a business perspective, an application may slow down, and one cause is transmission queues (XMITQs) filling up. How do you detect this, and what can you do proactively?
You can monitor two types of data: MQ events, and information you display at regular intervals.
MQ can put a message to an event queue when the depth of a queue goes over a certain value, or when a message has been on a queue for a long time.
- Low overhead: no additional cost until the event is triggered
- You know when the incident happens
- You do not see any trend information, for example that the average queue depth has increased over the last month
- Some events have to be reset once they have fired. For example, once you get a queue depth high event, you need a queue depth low event to reset the trigger. This avoids a stream of events while the current depth of the queue hovers around the queue depth high value.
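As a rough illustration of the depth events described above, the sketch below prints the MQSC needed to turn them on. The function name `depth_event_mqsc` and the queue name `MY.XMITQ` are made up for the example. `QDEPTHHI` and `QDEPTHLO` are percentages of the queue's `MAXDEPTH`. Only the high event is enabled here, on the understanding that the queue manager disables it and enables the low event once the high event fires.

```shell
# Print MQSC that enables the queue depth high event for one queue.
# Arguments: queue name, high threshold (%), low threshold (%).
depth_event_mqsc() {
  # Performance events must be enabled at queue manager level.
  echo "ALTER QMGR PERFMEV(ENABLED)"
  # Thresholds are percentages of MAXDEPTH, not message counts.
  echo "ALTER QLOCAL($1) QDEPTHHI($2) QDPHIEV(ENABLED) QDEPTHLO($3)"
}

# Example: print the commands for a hypothetical transmission queue.
depth_event_mqsc MY.XMITQ 80 20
```

To apply the commands you could pipe the output into runmqsc, for example `depth_event_mqsc MY.XMITQ 80 20 | runmqsc QMA`, where QMA is a placeholder queue manager name.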
Regularly display information about the queue
- You can see trends
- This requires regular checks, which use CPU.
- The more queues you check the higher the costs
- The more frequently you check the higher the costs – but you get a more accurate picture
- You can miss problems. For example, at 0100 the queue depth is 0, at 0101 it is 10000, and at 0104 it is back to zero. At 0105 the monitoring sees a queue depth of zero and reports all is well!
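A minimal sketch of the regular display approach is shown below. The names `mqsc_attr` and `poll_depth` are mine, not part of MQ, and the queue name is a placeholder. A shorter interval narrows the window in which a spike can be missed, at the price of more runmqsc invocations.

```shell
# Extract the value of one attribute (for example CURDEPTH) from
# runmqsc output supplied on stdin. Prints the first match only.
mqsc_attr() {
  sed -n "s/.*$1(\([^)]*\)).*/\1/p" | head -n 1
}

# Append "time queue depth" to a log file every interval.
# Arguments: queue manager, queue, interval in seconds, log file.
poll_depth() {
  while true; do
    depth=$(echo "DIS QL($2) CURDEPTH" | runmqsc "$1" | mqsc_attr CURDEPTH)
    echo "$(date +%H:%M:%S) $2 $depth" >> "$4"
    sleep "$3"
  done
}

# Demonstration on canned output, so no queue manager is needed.
sample='   QUEUE(MY.XMITQ)                         TYPE(QLOCAL)
   CURDEPTH(10000)'
printf '%s\n' "$sample" | mqsc_attr CURDEPTH
```

The resulting log gives you the trend data that events alone do not provide.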
What MQ events are useful?
- Queue depth high tells you when a queue reaches a certain depth. Set the threshold well above the normal depth. For example, if the normal depth of the queue is 5, set the queue depth high value to at least 2 * the channel batch size.
- Service interval high. This is produced when a message is read and the time between the put and the get is longer than the service interval. Note: the detection happens only once the message has been got. If the message was stuck on a queue for 4 days because the channel was down, you get the event when the message is finally got. This is like the burglar alarm going off when the intruders leave the building.
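For the service interval event, a sketch along the same lines might look like this. `QSVCINT` is specified in milliseconds and `QSVCIEV(HIGH)` enables the high event. The function name `service_event_mqsc`, the queue name and the 10 second interval are illustrative choices only.

```shell
# Print MQSC that enables the service interval high event for one queue.
# Arguments: queue name, service interval in milliseconds.
service_event_mqsc() {
  # Performance events must be enabled at queue manager level.
  echo "ALTER QMGR PERFMEV(ENABLED)"
  echo "ALTER QLOCAL($1) QSVCINT($2) QSVCIEV(HIGH)"
}

# Example: a 10 second service interval on a hypothetical queue.
service_event_mqsc MY.XMITQ 10000
```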
What attributes are interesting?
- Display current depth
- Display qstatus and check the age of the oldest message (MSGAGE). For a cluster transmission queue, messages for a destination which is down will be old, so this may not reliably tell you if there is a problem
- DIS CHS and look at XQTIME
- DIS CHS and make sure the channel is STATUS(RUNNING)
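The checks in this list could be scripted roughly as follows. The function names are hypothetical and the channel names in the sample are invented. As far as I know, MSGAGE and XQTIME are only populated when queue and channel monitoring (MONQ and MONCHL) are switched on, so treat empty values with care.

```shell
# Count channel instances in DIS CHS output (stdin) whose STATUS
# is anything other than RUNNING.
count_not_running() {
  grep -o 'STATUS([A-Z]*)' | grep -vc 'STATUS(RUNNING)'
}

# Issue the displays from the list above for one transmission queue.
# Arguments: queue manager name, transmission queue name.
show_xmitq() {
  printf '%s\n' \
    "DIS QSTATUS($2) CURDEPTH MSGAGE" \
    "DIS CHS(*) STATUS XQTIME WHERE(XMITQ,EQ,$2)" | runmqsc "$1"
}

# Demonstration on canned output, so no queue manager is needed.
sample='   CHANNEL(QM1.TO.QM2)   STATUS(RUNNING)
   CHANNEL(QM1.TO.QM3)   STATUS(RETRYING)'
printf '%s\n' "$sample" | count_not_running
```

A non-zero count is a prompt to look at the channel, not proof of a problem: a channel that has been stopped deliberately would also be counted.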
What can you do?
At a customer, when an MQ event occurred, the systems monitoring invoked a bash shell script, passing in the queue manager name ($1) and queue name ($2).
The script issued:
    # The logfile is /tmp/MQyymmdd.log
    logFile="/tmp/MQ$(date +%y%m%d).log"
    echo "DIS CHS(*) NETTIME BATCHSZ XBATCHSZ BYTSSENT MSGS STATUS XQTIME where(xmitq,eq,$2)" | runmqsc "$1" >> "$logFile"
    sleep 1
    echo "DIS CHS(*) NETTIME BATCHSZ XBATCHSZ BYTSSENT MSGS STATUS XQTIME where(xmitq,eq,$2)" | runmqsc "$1" >> "$logFile"
This issues the display command twice, with a 1 second gap between them. This allows you to see:
- Whether the channels were running
- The rate of messages processed
- Whether the queue depth is changing. This may show that messages are arriving faster than the channel is processing them
- How many bytes were sent, which lets you calculate the bytes sent per second (the channel data rate). Compare this with the "normal" value you collected earlier.
- The nettime – to show if there is a problem with the network or at the remote end
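To turn the two displays into rates, something like the sketch below could be used. The sample values are invented, the helper names `chs_attr` and `rate_per_sec` are mine, and only the first channel in the output is considered. The real gap is slightly more than 1 second, because runmqsc takes time to start, so the figures are approximate.

```shell
# Pull one numeric attribute, such as BYTSSENT, out of DIS CHS
# output supplied on stdin. Prints the first match only.
chs_attr() {
  sed -n "s/.*$1(\([0-9]*\)).*/\1/p" | head -n 1
}

# Integer rate per second between two counter samples.
# Arguments: first value, second value, gap in seconds.
rate_per_sec() {
  echo $(( ($2 - $1) / $3 ))
}

# Invented sample lines standing in for the two logged displays.
first='   BYTSSENT(120000)   MSGS(400)'
second='   BYTSSENT(185000)   MSGS(650)'
gap=1
for attr in BYTSSENT MSGS; do
  a=$(printf '%s\n' "$first" | chs_attr "$attr")
  b=$(printf '%s\n' "$second" | chs_attr "$attr")
  echo "$attr per second: $(rate_per_sec "$a" "$b" "$gap")"
done
```

With these made-up numbers the channel is moving 65000 bytes and 250 messages per second, which you would then compare against your baseline.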