The Streams 4.2 console includes enhanced monitoring capabilities for ZooKeeper ensembles. The console now displays information about the ZooKeeper ensemble in the domain and lets you monitor the health and metrics of each ZooKeeper node in the ensemble.
The ZooKeeper Health Analysis dialog displays metrics and health statistics for all nodes in a ZooKeeper ensemble.
How do I display the ZooKeeper metrics?
To display the ZooKeeper Health Analysis dialog, go to the Management dashboard and click the leftmost button in the upper button bar. This opens a tree widget that shows all of the objects in the domain. At the bottom of this tree is a new menu item called ZooKeeper Ensemble. Once the ZooKeeper Ensemble menu is displayed, click the Check ZooKeeper Health link.
The other menu item, Reset ZooKeeper Statistics, can be selected at any time to clear existing statistics and restart their collection.
After a few seconds (longer if the ZooKeeper ensemble has many nodes), the dialog appears, with a tab for each node in the ensemble.
What do the statistics mean?
The ZooKeeper Metrics section displays the overall status of the node, along with its Minimum, Average, and Maximum latency values. These values are the number of milliseconds this node took to respond to requests. High latency can be caused by slow disk performance, by virtualized or shared (non-dedicated) devices, or by slow hardware. Any of these can cause ZooKeeper disconnects and, in turn, system instability.
The Request Outstanding section helps you determine whether requests are backing up on this node.
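The console collects these values for you. Outside the console, a ZooKeeper 3.4 or later server reports comparable counters (`zk_min_latency`, `zk_avg_latency`, `zk_max_latency`, `zk_outstanding_requests`) through its `mntr` four-letter-word command, which you can capture with, for example, `echo mntr | nc <host> 2181` (on ZooKeeper 3.5 and later, only if `mntr` is permitted by the `4lw.commands.whitelist` setting). The following is a minimal sketch, not the mechanism the Streams console uses, that parses such a captured response. The sample text and the helper names `parse_mntr` and `latency_summary` are illustrative assumptions.

```python
from __future__ import annotations

# Made-up sample of a captured "mntr" response: one "key<TAB>value" pair per line.
SAMPLE_MNTR = """zk_version\t3.4.6-1569965, built on 02/20/2014 09:09 GMT
zk_avg_latency\t2
zk_max_latency\t187
zk_min_latency\t0
zk_outstanding_requests\t0
zk_server_state\tfollower
"""


def parse_mntr(text: str) -> dict[str, str]:
    """Return the mntr key/value pairs as a dict of strings."""
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip blank lines and lines without a tab separator
            metrics[key.strip()] = value.strip()
    return metrics


def latency_summary(metrics: dict[str, str]) -> dict[str, float]:
    """Pick out the latency counters (milliseconds) and the outstanding-request count."""
    return {
        "min_ms": float(metrics["zk_min_latency"]),
        "avg_ms": float(metrics["zk_avg_latency"]),
        "max_ms": float(metrics["zk_max_latency"]),
        "outstanding": int(metrics["zk_outstanding_requests"]),
    }


if __name__ == "__main__":
    print(latency_summary(parse_mntr(SAMPLE_MNTR)))
```

Latency values are parsed as floats because some ZooKeeper versions report the average with a fractional part.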
The Metrics for ZooKeeper operation tests section displays the time taken to perform each ZooKeeper operation, as measured by a test that runs the operations and collects statistics. Count is the number of operations performed for that test; all the other statistical categories give the time taken to perform the operation.
The Create and Delete operation counts should be identical because the test cleans up after itself by deleting everything it creates.
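The article does not describe how the operation test is implemented. As a rough illustration of why Count matches for Create and Delete, the sketch below times a batch of creates, then deletes the same paths, and summarizes each operation's count and minimum, average, and maximum time. `InMemoryStore` is a hypothetical stand-in for a real ZooKeeper client, and the path prefix and batch size are arbitrary assumptions; the timings it produces say nothing about a real ensemble.

```python
import statistics
import time


class InMemoryStore:
    """Hypothetical stand-in for a ZooKeeper client; no network I/O."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError(f"{path} already exists")
        self.nodes[path] = data

    def delete(self, path):
        del self.nodes[path]


def summarize(samples_ms):
    """Count plus min/avg/max of the timings, in milliseconds."""
    return {
        "count": len(samples_ms),
        "min": min(samples_ms),
        "avg": statistics.mean(samples_ms),
        "max": max(samples_ms),
    }


def run_create_delete_test(store, n=100, prefix="/health_test"):
    """Time n creates, then delete every created path so nothing is left behind."""
    timings = {"create": [], "delete": []}
    paths = [f"{prefix}/node{i}" for i in range(n)]
    for path in paths:
        start = time.perf_counter()
        store.create(path)
        timings["create"].append((time.perf_counter() - start) * 1000.0)
    # Clean-up phase: one delete per create, so the two counts always match.
    for path in paths:
        start = time.perf_counter()
        store.delete(path)
        timings["delete"].append((time.perf_counter() - start) * 1000.0)
    return {op: summarize(samples) for op, samples in timings.items()}
```

Because the clean-up loop walks the same list of paths as the create loop, the Create and Delete counts can only differ if an operation fails partway through.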
Why are some of the icons yellow, and some of the values in red?
Default threshold values have been configured for the metrics. If a value falls outside its threshold range, a yellow warning icon is displayed and the value is colored red.
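The article does not list the default thresholds, so the limits below are invented for illustration and are not the console's values. The sketch shows the rule described above: each metric has an allowed range, a value outside it is flagged (the equivalent of a red value), and a node with any flagged value gets a warning (the equivalent of the yellow icon). The names `THRESHOLDS`, `flag_metrics`, and `node_has_warning` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    low: float
    high: float


# Hypothetical limits for illustration only; NOT the Streams console defaults.
THRESHOLDS = {
    "avg_latency_ms": Threshold(0, 10),
    "max_latency_ms": Threshold(0, 1000),
    "outstanding_requests": Threshold(0, 10),
}


def flag_metrics(values):
    """Return {metric: (value, out_of_range)}; metrics without a threshold are never flagged."""
    result = {}
    for name, value in values.items():
        limit = THRESHOLDS.get(name)
        out_of_range = limit is not None and not (limit.low <= value <= limit.high)
        result[name] = (value, out_of_range)
    return result


def node_has_warning(values):
    """True if any metric on the node is outside its range (the yellow-icon case)."""
    return any(flagged for _, flagged in flag_metrics(values).values())
```

For example, `flag_metrics({"avg_latency_ms": 25})` returns `{"avg_latency_ms": (25, True)}` under these made-up limits.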
Why are ZooKeeper Health Statistics important?
If processing in the Streams domain seems sluggish or PEs (processing elements) appear unhealthy, it is a good idea to check the nodes of the ZooKeeper ensemble for latency issues.
Similarly, if Streams services go up and down unpredictably, the cause may be a ZooKeeper issue, and you should run these health and metric tests.
Important: The health statistics displayed are cumulative since the domain was started, so a ZooKeeper problem in the past skews the results. To clear old statistics and restart monitoring, select Reset ZooKeeper Statistics from the ZooKeeper Ensemble menu item in the Streams tree.
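To see why old data skews cumulative statistics, consider a running accumulator like the hypothetical `CumulativeLatency` class below. It is not the console's implementation; it only shows how a short past spike keeps inflating the cumulative average and maximum long after latency has recovered, and how a reset discards that history.

```python
class CumulativeLatency:
    """Hypothetical running min/avg/max since start, or since the last reset."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard all history and start collecting again.
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None

    def record(self, latency_ms):
        self.count += 1
        self.total += latency_ms
        self.min = latency_ms if self.min is None else min(self.min, latency_ms)
        self.max = latency_ms if self.max is None else max(self.max, latency_ms)

    @property
    def avg(self):
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = CumulativeLatency()
    for _ in range(10):
        stats.record(500.0)  # a past incident: ten slow requests
    for _ in range(90):
        stats.record(2.0)    # latency has since recovered
    print(stats.avg, stats.max)  # the average is still inflated by the incident
    stats.reset()
    for _ in range(100):
        stats.record(2.0)
    print(stats.avg, stats.max)  # after the reset, only current behavior shows
```

With 10 requests at 500 ms and 90 at 2 ms, the cumulative average is (5000 + 180) / 100 = 51.8 ms and the maximum stays at 500 ms, even though every recent request took 2 ms.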
Is the ZooKeeper Ensemble Information from the Streams 4.1 Management dashboard still available?
Yes. This card still appears in the Management dashboard and displays information similar to the ZooKeeper Metrics section of the new dialog. However, the card does not display the ZooKeeper operation test metrics.
For information about setting up ZooKeeper, see the IBM Knowledge Center.