In my last blog post, I introduced Netcool scope-based event grouping and explained how to get off to a fast start with it. In this blog post I will describe how to take event grouping to the next level by leveraging Netcool Operations Insights Event Analytics on top of scope-based event groups to create super groups.
The story so far…
Scope-based event grouping works on the theory that, if you have a number of events that come from the “same place” at the “same time”, then they are probably related to the same problem, and so we should group them together. The notion of “same place” might be a geographic one, such as a cell site ID, or it might be a logical grouping, such as events from a common node. Scope-based grouping yields significant results: most customers enjoy upwards of 70% event reduction and correlation. This results in quicker problem triage, reduced ticket counts, and ultimately significant savings to operations.
Which grouping mechanism should I use?
With Netcool Operations Insights, another event grouping capability is available: Event Analytics based event grouping (also known as “Related Events”). Event Analytics based event grouping looks at the event history and tries to determine which events historically occur together. It then uses these insights to group related events together if they occur again in future. With both event grouping options available, customers have understandably wondered which one they should use.
Answer: use both
Recent work done by IBM with customers has delivered enhancements in Netcool that support an approach to event grouping that leverages both mechanisms at once. First, tribal domain knowledge is applied to the event estate to define what scope makes sense in the environment, and scope-based event grouping is implemented based on this scope. This requires a certain investment of effort; however, as my last blog post describes, the return on investment is usually large for relatively little effort. The system is then allowed time to run, so that event history involving the scope-based grouping synthetic parent events can be captured in the REPORTER event history database. Next, Netcool Operations Insights Event Analytics is run against the scope-based grouping synthetic parent events.
How to configure Event Analytics to leverage scope-based grouping
Use the following configuration steps in an Event Analytics configuration to apply Event Analytics to the synthetic parent events generated by scope-based event grouping.
Set the following filter in the General tab:
AlertGroup = 'ScopeIDParent'
Related Events tab
Set the following relationship profile in the Related Events tab:
- Relationship profile: STRONG
Set the following event identity in the Advanced tab:
- Event identity: set to SCOPEID
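To make the effect of the filter concrete, here is a minimal Python sketch. It assumes that history rows can be treated as plain dictionaries keyed by field name. The `AnalyticsConfig` class, its attribute names and the sample rows are hypothetical illustrations, not the product's configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalyticsConfig:
    """Hypothetical stand-in for the three Event Analytics settings above."""
    alert_group_filter: str = "ScopeIDParent"  # General tab filter value
    relationship_profile: str = "STRONG"       # Related Events tab
    event_identity: str = "SCOPEID"            # Advanced tab


def apply_filter(rows, config):
    """Keep only history rows where AlertGroup equals the filter value."""
    return [r for r in rows if r.get("AlertGroup") == config.alert_group_filter]


# Made-up history rows: only the synthetic parents should survive the filter.
history = [
    {"Identifier": "a1", "AlertGroup": "ScopeIDParent", "ScopeID": "SITE-01"},
    {"Identifier": "a2", "AlertGroup": "LinkDown", "ScopeID": "SITE-01"},
    {"Identifier": "a3", "AlertGroup": "ScopeIDParent", "ScopeID": "SITE-02"},
]

selected = apply_filter(history, AnalyticsConfig())
print([r["Identifier"] for r in selected])  # ['a1', 'a3']
```

The raw child event `a2` is excluded, so the analysis only ever sees whole scope-based groups.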
How the settings work
These settings cause Event Analytics to look only at the scope-based event grouping synthetic parent events, by virtue of the filter. Event Analytics then examines the event history to determine which of the scope-based event groups always and only alarm together.
Next, using a Relationship Profile of STRONG ensures that anything found is more or less conclusively correct. Two features of the STRONG Relationship Profile setting mean that it finds only conclusively correct groupings. Consider a scenario where the analytics has found a group of events that always seem to occur together. Before the analytics will confirm it as a valid grouping, it must pass the following conditions:
- Each member of the group must be present every time the grouping is seen in the event history;
- None of the members ever occur in isolation of each other; only ever together with the others.
A STRONG Relationship Profile therefore yields only high-confidence groupings, which is why it is recommended as a first step.
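The two conditions can be illustrated with a small Python check. This is a sketch only, not the actual Event Analytics algorithm. It assumes the event history has already been cut into time windows, each holding the set of event identities (here ScopeIDs) that alarmed together. The windowing and the function name are assumptions made for illustration.

```python
def always_and_only_together(group, windows):
    """Return True if `group` meets both STRONG conditions in `windows`.

    `windows` is a list of sets of event identities that alarmed in the
    same time window. How history is windowed is an assumption here.
    """
    group = set(group)
    seen_together = 0
    for active in windows:
        present = group & active
        if not present:
            continue  # this window does not involve the group at all
        if present != group:
            # Some members alarmed without the others, so a member was
            # missing from an occurrence and another occurred in isolation.
            return False
        seen_together += 1
    return seen_together > 0  # the group must actually have been observed


history = [
    {"SITE-01", "SITE-02", "SITE-03"},
    {"SITE-01", "SITE-02", "SITE-03", "SITE-09"},
    {"SITE-07"},
]

print(always_and_only_together({"SITE-01", "SITE-02", "SITE-03"}, history))  # True
print(always_and_only_together({"SITE-01", "SITE-07"}, history))  # False
```

Note that `SITE-09` appearing alongside the group does not disqualify it: the conditions constrain the group's own members, not what else alarms at the same time.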
Finally, each scope-based grouping synthetic parent event always has a unique Identifier field value, so Identifier cannot be used as the Event Identity for the analysis. If it were, the analytics would not recognise all the instances of the synthetic parent for a given ScopeID as being the same event, which is what we need. Hence, the Event Identity is changed to SCOPEID instead. This ensures that each synthetic parent event for the same ScopeID is recognised as a different instance of the same event.
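A few lines of Python show why the choice of Event Identity matters. The Identifier values below are made up; real synthetic parent identifiers will look different. The only point is that each parent's Identifier is unique while repeat occurrences share a ScopeID.

```python
from collections import Counter

# Made-up synthetic parents: a fresh Identifier each time, a shared ScopeID
# whenever the same scope has a problem again.
parents = [
    {"Identifier": "SEG-0001", "ScopeID": "SITE-01"},
    {"Identifier": "SEG-0002", "ScopeID": "SITE-01"},
    {"Identifier": "SEG-0003", "ScopeID": "SITE-02"},
    {"Identifier": "SEG-0004", "ScopeID": "SITE-01"},
]


def occurrences_by(field, events):
    """Count how many times each distinct value of `field` occurs."""
    return Counter(e[field] for e in events)


by_identifier = occurrences_by("Identifier", parents)
by_scope = occurrences_by("ScopeID", parents)
print(max(by_identifier.values()))  # 1: every parent looks like a one-off
print(by_scope["SITE-01"])          # 3: three instances of the same event
```

Keyed on Identifier, no event ever repeats, so there is no history pattern to learn. Keyed on ScopeID, recurring scopes become visible.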
The term “super groups” has been coined to describe cases where groups of events are themselves grouped together under a single synthetic parent event. After an Event Analytics grouping has been deployed, Netcool Operations Insights will automatically group the relevant scope-based event grouping synthetic parents together under a top-level synthetic parent event, should they occur again in the future.
Although the two grouping mechanisms both use the ParentIdentifier field to link parent events to child events, they will not clash if configured in the manner described. This is because each scope-based event grouping synthetic parent event is created with a blank ParentIdentifier field. The Event Analytics mechanism is therefore free to set this field value, and so link the scope-based parent to a higher-level synthetic parent event.
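A minimal Python sketch of the linking follows. The rows are hypothetical in-memory dictionaries. The rule that a link is only written when ParentIdentifier is blank is a guard added here to show why the blank field avoids a clash; it is not taken from the product code.

```python
# Hypothetical rows: the raw child already points at its scope-based parent,
# and the scope-based parent starts out with a blank ParentIdentifier.
events = {
    "child-1": {"ParentIdentifier": "scope-parent-A"},
    "scope-parent-A": {"ParentIdentifier": ""},
    "super-parent-X": {"ParentIdentifier": ""},
}


def link_to_super_parent(events, identifier, super_identifier):
    """Set ParentIdentifier only if blank, so no existing link is overwritten."""
    row = events[identifier]
    if row["ParentIdentifier"]:
        return False  # already linked: leave the existing relationship alone
    row["ParentIdentifier"] = super_identifier
    return True


def chain_to_root(events, identifier):
    """Follow ParentIdentifier links from an event up to its top-level parent."""
    path = [identifier]
    while events[path[-1]]["ParentIdentifier"]:
        path.append(events[path[-1]]["ParentIdentifier"])
    return path


link_to_super_parent(events, "scope-parent-A", "super-parent-X")  # True
link_to_super_parent(events, "child-1", "super-parent-X")         # False
print(chain_to_root(events, "child-1"))
# ['child-1', 'scope-parent-A', 'super-parent-X']
```

The raw child keeps its scope-based parent, and only the previously blank link on the scope-based parent is filled in. This produces the two-level tree that makes up a super group.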
How the grouping automations now service super groups
The scope-based event grouping automation code base has been extended in Netcool/OMNIbus 8.1 Fix Pack 17 to support this notion of “super groups”. It has been extended to do the following additional tasks:
- The highest Severity in the entire sub-tree of the super parent event will automatically propagate up to the top-level super parent event.
- The highest ImpactWeight in the entire sub-tree of the super parent event will automatically propagate up to the top-level super parent event.
- The CustomText of the priority child event in the entire sub-tree of the super parent event will automatically propagate up to the top-level super parent event.
- The trouble ticket number will automatically propagate down to all unticketed events in the sub-tree under the super parent event.
- The OwnerGID field value of the super parent event will automatically propagate down to all unticketed events in the sub-tree under the super parent event.
- For each ScopeIDParent event, the automation checks whether it is itself a child event. If it is, the automation gathers the event information under it and rolls it up into journal entries in the “super parent” event. This uses the same mechanism that rolls its child event detail into its own journals. As a result, a trouble ticket can be cut from the top-level super parent event that already contains all the underlying child event information. Two new properties have been introduced to enable this feature:
  - SEGJournalToSuperParent (default: 0) to enable journalling to super-parents, and
  - SEGMaxSuperParentJournals (default: 100) to set the maximum number of child events to journal to the super-parent.
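The propagation rules above can be sketched in Python over an in-memory tree. This is an illustration only, not the ObjectServer automation code. The `Event` class, the journal text format and the sample identifiers are all hypothetical. The `max_journals` parameter mirrors the documented SEGMaxSuperParentJournals default. CustomText, OwnerGID and the SEGJournalToSuperParent switch are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical in-memory stand-in for an event row."""
    identifier: str
    severity: int = 0
    impact_weight: int = 0
    ticket: str = ""
    children: List["Event"] = field(default_factory=list)


def walk(event):
    """Yield every event in the sub-tree below `event`, depth first."""
    for child in event.children:
        yield child
        yield from walk(child)


def propagate(super_parent, max_journals=100):
    """Roll values up to, and the ticket down from, a super parent."""
    subtree = list(walk(super_parent))
    # Upward: the highest Severity and ImpactWeight anywhere in the sub-tree.
    super_parent.severity = max([super_parent.severity] + [e.severity for e in subtree])
    super_parent.impact_weight = max(
        [super_parent.impact_weight] + [e.impact_weight for e in subtree]
    )
    # Downward: copy the ticket number only onto events that have none yet.
    if super_parent.ticket:
        for e in subtree:
            if not e.ticket:
                e.ticket = super_parent.ticket
    # Journal roll-up: one entry per raw (leaf) event, capped at max_journals.
    leaves = [e for e in subtree if not e.children]
    return [f"{e.identifier}: severity {e.severity}" for e in leaves[:max_journals]]


raw_1 = Event("raw-1", severity=3, impact_weight=10)
raw_2 = Event("raw-2", severity=5, impact_weight=4, ticket="INC-7")
raw_3 = Event("raw-3", severity=4, impact_weight=20)
scope_a = Event("scope-A", severity=3, children=[raw_1, raw_2])
scope_b = Event("scope-B", children=[raw_3])
super_x = Event("super-X", ticket="INC-42", children=[scope_a, scope_b])

journals = propagate(super_x)
print(super_x.severity, super_x.impact_weight)  # 5 20
print(raw_1.ticket, raw_2.ticket)                # INC-42 INC-7
print(journals)  # ['raw-1: severity 3', 'raw-2: severity 5', 'raw-3: severity 4']
```

Walking the whole sub-tree, rather than just the direct children, is what lets values cross both levels of a super group in one pass.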
What are the benefits?
As discussed in the previous blog post, it is advantageous to set your ScopeID to be a big enough “net” to catch all events that relate to a single incident. Of course, there is a limit: if you set the ScopeID to be too broadly encompassing, the scope-based event grouping mechanism will start to gather together events that are unrelated. Scope-based event grouping, however valuable in terms of event and ticket reduction, therefore has its limitations.
Further event and ticket reduction
Applying Event Analytics based event grouping to the ScopeIDParent events enables the automatic creation of super groups. This further compresses the events on the Event Viewer, making it easier for operators to work from, and it also potentially reduces the number of tickets being opened. Instead of opening a ticket for each scope-based event group, a single ticket can be opened for the super group, since all the underlying events relate to the same incident.
Exposing unknown relationships
Applying Event Analytics to the ScopeIDParent events provides insights into how ScopeID groupings alarm in relation to each other. This can expose previously unknown relationships between the event groups. Many customers do not have a reliable CMDB or stored representation of event dependencies, and so this information is not typically available for correlation purposes. Event Analytics will discover these patterns of relationship, based on the event history alone.
EXAMPLE: Imagine you are setting ScopeID across the board, based on cell site location. However, there are three smaller cell sites that are very close to each other and rely on the same underlying sub-systems. In this case, when there is a problem felt by one, it is felt by all three, and alarms are generated from all. Event Analytics will detect this relationship based on the event patterns and hence create one group instead of three, each time there is a problem at these sites.
Ease of management
Netcool Operations Insights Event Analytics typically finds a great number of event groupings in the event history, as well as a large number of Seasonal events. Recently a customer discovered over 18,000 Seasonal events within 6 weeks' worth of data. Even though only around a third of them were “highly seasonal”, the comment was understandably, “So now I have to go through all of these?” Applying scope-based event grouping first, and then Event Analytics based event grouping second, means that you only have to work with groups of events rather than individual events. The same analysis on the groups instead yielded around 200 ScopeID groups, and a number of those were also related. Reviewing and creating handling rules for that number of results is far more manageable from a user point of view, because it leverages the work that scope-based event grouping has already done.
At a customer site, this super grouping technique was applied and exposed a relationship between 12 ScopeID groups. Event Analytics found that these 12 groupings always and only ever occurred together in the event history. Event Analytics was run over 6 weeks' worth of data, and it found that this particular scenario occurred 66 times, which equates to more than once per day. Each occurrence produced around 72 raw events and around 36 tickets. Applying scope-based event grouping first reduced this to 12 groupings of events, and hence 12 tickets, for each occurrence. By deploying the discovered Event Analytics grouping with the click of a mouse, it was possible to reduce these 12 groups to just a single group, and hence just a single ticket. That is essentially a ticket reduction from around 36 to 1 for this example.
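To put the figures above in aggregate terms, the arithmetic can be checked in a few lines of Python. The totals are extrapolations, not reported results. They assume every one of the 66 occurrences produced the approximate per-occurrence ticket counts quoted.

```python
occurrences = 66   # times the 12-group pattern appeared
days = 6 * 7       # six weeks of history
tickets_per_occurrence = {
    "no grouping": 36,   # approximate figure quoted above
    "scope-based": 12,   # one ticket per ScopeID group
    "super group": 1,    # one ticket for the whole super group
}

per_day = occurrences / days
totals = {name: occurrences * n for name, n in tickets_per_occurrence.items()}
print(round(per_day, 2))  # 1.57
print(totals)  # {'no grouping': 2376, 'scope-based': 792, 'super group': 66}
```

On these assumptions, the super group accounts for roughly 66 tickets over the six weeks, compared with 792 from scope-based grouping alone.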
This technique of leveraging both event grouping capabilities makes for a very elegant and compelling story. Further, it appears that the process of defining ScopeID for an event estate normalises the event data. Raw event data is notoriously “dirty”, and the process of setting ScopeID for each event seems to clean it, making it well suited for consumption by Event Analytics. Using the two event grouping capabilities together in this way is highly complementary and yields great results. If the work of setting up ScopeID has already been done, then applying Netcool Operations Insights Event Analytics to the results takes minutes. The results are highly insightful, exposing previously unknown relationships. They are also highly valuable, further reducing both the event rows presented to operators and the number of tickets opened. This means direct savings to operations, both financially through reduced tickets and in terms of reduced Mean Time To Repair (MTTR).