Answer by 5VDT_Mohamed_Najih (111) | Jun 30, 2015 at 06:45 PM
The WebSphere MQ Explorer FTE/MFT plug-in and the fteShowAgentDetails and fteListAgents commands may report an agent's status as "unreachable" even though the agent is running and reachable, that is, the ftePingAgent command completes successfully. This inaccuracy can occur in two cases:
Case 1: The agent status cannot be determined from the information held on the coordination queue manager. These mechanisms do not attempt to communicate with the agent directly, so they cannot tell whether the agent can be contacted. Instead of reporting the status as "Unknown" or "Undefined", they incorrectly report it as "unreachable". This is a known issue, documented in APAR IT05697:
http://www-01.ibm.com/support/docview.wss?uid=swg1IT05697
The fix for this APAR is included in the following FTE/MFT fix packs: 7.0.4.5, 7.5.0.6, and 8.0.0.2.
If you are running an FTE/MFT version or fix pack earlier than these, this APAR applies.
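The version check described above can be expressed as a simple comparison. The Python sketch below treats a level as affected when it belongs to the 7.0, 7.5 or 8.0 release and is earlier than the fix pack listed for that release. The answer does not say how to treat levels outside those three releases, so the sketch assumes they already contain the fix; the function and table names are hypothetical.

```python
from typing import Dict, Tuple

# Fix packs that include the APAR IT05697 fix, keyed by release (first two fields).
FIXED_IN: Dict[Tuple[int, int], Tuple[int, ...]] = {
    (7, 0): (7, 0, 4, 5),
    (7, 5): (7, 5, 0, 6),
    (8, 0): (8, 0, 0, 2),
}

def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a level such as '7.5.0.4' into (7, 5, 0, 4)."""
    return tuple(int(part) for part in version.split("."))

def apar_it05697_applies(version: str) -> bool:
    """True if the FTE/MFT level predates the fix pack carrying APAR IT05697.

    Assumption: levels of releases not listed in FIXED_IN are treated as
    already containing the fix.
    """
    v = parse_version(version)
    fixed = FIXED_IN.get((v[0], v[1]))
    if fixed is None:
        return False
    # Tuple comparison orders the levels field by field.
    return v < fixed

if __name__ == "__main__":
    for level in ("7.0.3.0", "7.5.0.6", "8.0.0.1"):
        print(level, apar_it05697_applies(level))
```

For example, 7.5.0.4 is reported as affected, while 7.5.0.6 and later 7.5 levels are not.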
Case 2: The agent is running, managed file transfers are flowing, and the ftePingAgent command is successful, but the fteListAgents command shows the agent as UNREACHABLE.
The WMQFTE agent periodically publishes its status to the coordination queue manager. How often it publishes is controlled by two agent properties:
'agentStatusPublishRateLimit': The maximum rate, in seconds, at which the agent republishes its status because of a change in file transfer status.
'agentStatusPublishRateMin': The minimum rate, in seconds, at which the agent publishes its status. This value must be greater than or equal to the value of the agentStatusPublishRateLimit property.
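The two properties can be read as a simple publishing rule. The Python sketch below is a simplified model of that rule, not the agent's actual implementation: a status change triggers a publication only once agentStatusPublishRateLimit seconds have passed since the last one, and the agent publishes at least every agentStatusPublishRateMin seconds regardless. The function name and the 30-second rate limit used in the example are illustrative assumptions, not documented defaults.

```python
def should_publish(elapsed_s: float, status_changed: bool,
                   rate_limit_s: float, rate_min_s: float) -> bool:
    """Simplified model (assumption) of when an agent republishes its status.

    elapsed_s      -- seconds since the agent's last status publication
    status_changed -- True if the file transfer status changed since then
    rate_limit_s   -- value of agentStatusPublishRateLimit
    rate_min_s     -- value of agentStatusPublishRateMin
    """
    if rate_min_s < rate_limit_s:
        # The documented constraint: RateMin must be >= RateLimit.
        raise ValueError(
            "agentStatusPublishRateMin must be >= agentStatusPublishRateLimit")
    # A change is published, but not more often than every rate_limit_s seconds.
    if status_changed and elapsed_s >= rate_limit_s:
        return True
    # Without a publishable change, publish once rate_min_s seconds have passed.
    return elapsed_s >= rate_min_s

if __name__ == "__main__":
    print(should_publish(10, True, 30, 300))   # change, but too soon
    print(should_publish(300, False, 30, 300)) # no change, minimum rate reached
```

With a 30-second limit and the 300-second minimum, a change 10 seconds after the last publication is not published yet, while an unchanged agent still publishes once 300 seconds have elapsed.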
Case 2 is documented in technote 1616381:
http://www-01.ibm.com/support/docview.wss?uid=swg21616381
Using the default settings, out-of-sync clocks between the agent system and the coordination queue manager cause this issue if the difference between the clocks is greater than 303 seconds. A WMQFTE agent status message is deemed stale if it was sent more than agentStatusPublishRateMin + agentStatusJitterTolerance seconds ago (with agentStatusJitterTolerance converted from milliseconds to seconds), and an agent with a stale status message is reported as unreachable by the fteListAgents command. By default, agentStatusJitterTolerance is 3000 ms (3 seconds) and agentStatusPublishRateMin is 300 seconds, which gives the 303-second threshold. If the time difference between the machines plus the effective publish rate is greater than agentStatusPublishRateMin + agentStatusJitterTolerance, the time difference is the cause of the "unreachable" agent status. You have two options to resolve this issue:
Correct the time setting differences that exist between the agent host machine and the machine hosting the coordination queue manager, so that they are in sync.
Increase the value of agentStatusJitterTolerance to account for the time difference. When fteListAgents is used, the value of agentStatusJitterTolerance is read from the coordination.properties configuration file in the WMQFTE configuration directory, so the property must be set in the coordination.properties file of the WMQFTE installation on which fteListAgents is run.
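The staleness rule and the second option can be checked with a little arithmetic. The Python sketch below assumes the status message carries a timestamp from the agent's clock that is compared with the current time on the checking side, so any lag in the agent's clock adds directly to the message's apparent age. It also suggests an agentStatusJitterTolerance value for a measured clock skew, assuming the worst case in which the newest status message is already up to agentStatusPublishRateMin seconds old. The 3-second safety margin and all function names are hypothetical.

```python
import math
from datetime import datetime, timedelta

def is_stale(sent_at: datetime, now: datetime,
             publish_rate_min_s: int = 300,
             jitter_tolerance_ms: int = 3000) -> bool:
    """True if a status message sent at sent_at counts as stale at now.

    sent_at comes from the agent's clock and now from the checking side's
    clock (assumption), so a lagging agent clock makes messages look older.
    """
    threshold = timedelta(seconds=publish_rate_min_s,
                          milliseconds=jitter_tolerance_ms)  # 303 s with defaults
    return now - sent_at > threshold

def min_jitter_tolerance_ms(clock_skew_s: float, margin_s: float = 3.0) -> int:
    """Suggested agentStatusJitterTolerance (ms) that absorbs clock_skew_s.

    Worst case: the newest message is agentStatusPublishRateMin seconds old,
    so the tolerance must cover the whole skew plus a hypothetical margin.
    abs() is conservative; only a lagging agent clock causes false staleness.
    """
    return math.ceil((abs(clock_skew_s) + margin_s) * 1000)

def coordination_property_line(clock_skew_s: float) -> str:
    # Line for coordination.properties on the installation running fteListAgents.
    return f"agentStatusJitterTolerance={min_jitter_tolerance_ms(clock_skew_s)}"

if __name__ == "__main__":
    now = datetime(2015, 6, 30, 18, 45, 0)
    # Clocks in sync, message 1 s old: not stale.
    print(is_stale(now - timedelta(seconds=1), now))
    # Agent clock 310 s behind, message really 1 s old: appears 311 s old, stale.
    print(is_stale(now - timedelta(seconds=311), now))
    print(coordination_property_line(120))
```

With a measured skew of 120 seconds, the sketch suggests agentStatusJitterTolerance=123000; with no skew it returns the 3000 ms default.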
Answer by Jason Simmons (1246) | Jul 02, 2015 at 10:11 AM
In addition to the answer above, sometimes the only option is to clean up manually so that the correct agent status can be received. To do this, stop the agent, clean it using the following commands, and manually remove the agent's retained publication on the coordination QMgr.
Stop the agent: fteStopAgent -i AGENT_NAME
Clean the agent:
fteCleanAgent -ims AGENT_NAME
fteCleanAgent -trs AGENT_NAME
The above will remove all invalid messages and discard any pending and in-progress transfers.
Clear the retained publication for the agent named AGENT_NAME on the coordination QMgr. The easiest way to do this is with MQ Explorer. Connect to the remote zLinux QMgr (the coordination QMgr) in MQ Explorer, go to the Topics folder, select SYSTEM.FTE from the list, right-click it and select Status. In the new Status window, expand SYSTEM.FTE, expand Agents and find AGENT_NAME in the list. Right-click AGENT_NAME and select Clear Local Retained Publication. Click Yes and then OK.
Now restart the agent named AGENT_NAME:
fteStartAgent AGENT_NAME
Check the published status for the agent by using:
fteListAgents
You can repeat the above steps for other agents that are also listed as UNREACHABLE.
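The manual procedure above can be scripted for several agents. The Python sketch below prints the command-line steps in order and, when dry_run is False, runs them and pauses for the retained-publication step. It represents the MQ Explorer step as an MQSC CLEAR TOPICSTR command instead; that substitution, and the topic string SYSTEM.FTE/Agents/AGENT_NAME inferred from the Explorer tree, are assumptions to verify against your MQ level. The function names are hypothetical.

```python
import subprocess
from typing import List

def cleanup_commands(agent: str) -> List[List[str]]:
    """Stop and clean steps, in the order given above."""
    return [
        ["fteStopAgent", "-i", agent],     # stop the agent immediately
        ["fteCleanAgent", "-ims", agent],  # remove invalid messages
        ["fteCleanAgent", "-trs", agent],  # discard pending and in-progress transfers
    ]

def clear_retained_mqsc(agent: str) -> str:
    """Assumed MQSC equivalent of 'Clear Local Retained Publication'.

    The topic string is inferred from the SYSTEM.FTE > Agents > agent tree.
    """
    return f"CLEAR TOPICSTR('SYSTEM.FTE/Agents/{agent}') CLRTYPE(RETAINED) SCOPE(LOCAL)"

def restart_commands(agent: str) -> List[List[str]]:
    """Restart the agent, then check its published status."""
    return [["fteStartAgent", agent], ["fteListAgents"]]

def reset_agent(agent: str, dry_run: bool = True) -> None:
    for cmd in cleanup_commands(agent):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    # Not executed by this script: run it against the coordination QMgr.
    print("MQSC on coordination QMgr:", clear_retained_mqsc(agent))
    if not dry_run:
        input("Clear the retained publication, then press Enter to restart...")
    for cmd in restart_commands(agent):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)

if __name__ == "__main__":
    reset_agent("AGENT_NAME")  # dry run: prints the steps only
```

Keeping dry_run as the default means the script only lists the steps unless explicitly asked to run them, which suits a procedure that discards in-progress transfers.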
Answer by Beverly Brown (1317) | Jan 12, 2017 at 02:21 PM
I have seen another reason for getting the "unreachable" status or some other unexpected status, such as "stopped", when the agent is actually running.
The streamed publication queue SYSTEM.FTE was not being processed on the coordination queue manager (see Configuring the coordination queue manager). Messages had built up on the queue, so it had a non-zero CURDEPTH, and the queue was not open for input (IPPROCS was zero).
The queue was not open because there was no entry for SYSTEM.FTE in the SYSTEM.HIERARCHY.STATE queue. There were only entries for SYSTEM.BROKER.DEFAULT.STREAM and BROKER.ADMIN.STREAM.
We were unsure why SYSTEM.FTE did not have an entry in the queue, but removing it from NAMELIST(SYSTEM.QPUBSUB.QUEUE.NAMELIST) and adding it back caused SYSTEM.FTE to be added to SYSTEM.HIERARCHY.STATE.
See related document https://developer.ibm.com/answers/questions/175644/why-are-messages-building-up-in-systemfte-queue-af/
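The diagnosis and workaround in this answer can be outlined as a small Python helper that applies the symptom test (non-zero CURDEPTH with IPPROCS of zero) and generates the MQSC to display the queue and to remove SYSTEM.FTE from, then add it back to, SYSTEM.QPUBSUB.QUEUE.NAMELIST. The helper names and the example namelist contents are hypothetical; read the real contents from your queue manager first (for example with DISPLAY NAMELIST(SYSTEM.QPUBSUB.QUEUE.NAMELIST) NAMES). The answer does not say how long to wait between the two ALTER commands, so treat this as an outline only.

```python
from typing import List

QPUBSUB_NAMELIST = "SYSTEM.QPUBSUB.QUEUE.NAMELIST"

def stream_not_processed(curdepth: int, ipprocs: int) -> bool:
    """Symptom from the answer: messages waiting on the stream queue, no reader."""
    return curdepth > 0 and ipprocs == 0

def alter_namelist(names: List[str]) -> str:
    # MQSC takes a comma-separated list of names inside NAMES(...).
    return f"ALTER NAMELIST({QPUBSUB_NAMELIST}) NAMES({', '.join(names)})"

def namelist_fix_mqsc(current_names: List[str],
                      stream: str = "SYSTEM.FTE") -> List[str]:
    """MQSC to inspect the stream queue, then remove and re-add the stream name."""
    without = [n for n in current_names if n != stream]
    return [
        f"DISPLAY QLOCAL({stream}) CURDEPTH IPPROCS",
        alter_namelist(without),            # remove the stream
        alter_namelist(without + [stream]), # add it back
    ]

if __name__ == "__main__":
    # Illustrative namelist contents only.
    names = ["SYSTEM.BROKER.DEFAULT.STREAM", "SYSTEM.BROKER.ADMIN.STREAM", "SYSTEM.FTE"]
    for line in namelist_fix_mqsc(names):
        print(line)
```

The generated lines can be reviewed and then entered in runmqsc against the coordination queue manager.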