Netcool event grouping is the zenith of correlation. Many Netcool users around the world leverage the event grouping capabilities of Netcool to dramatically reduce ticket numbers and Mean Time To Repair (MTTR), thereby achieving significant operational cost reductions.

The clearest application of event grouping to cost-reduction is in the area of ticketing; rather than raise multiple tickets on a per-event basis, instead raise a single ticket for a group of events. This means all the information relating to a single incident is kept together, and consolidated into a single ticket. This makes triage and resolution easier. The goal is: one incident equals one ticket.

Two main types of event grouping

There are two main types of event grouping offered in Netcool: scope-based and analytics-based grouping (also known as “Related Events”). Scope-based grouping leverages tribal and domain knowledge and centres around defining what scope means in your environment, and then grouping events based on that scope. Analytics-based grouping looks at the event history to determine which events historically occur together, and then uses these insights to group those events together if they occur in future. This blog focuses on scope-based event grouping, and how to define scope.

Scope-based event grouping

Scope-based event grouping works on the principle that if I get a collection of events from the same place at the same time, then I should group those events together. In this context, “same place” would be my scope, and “same time” would be a time window that defines how long I receive events for any given incident. These two elements combine in the NOI scope-based grouping mechanism as the basis for doing the grouping.

What does scope mean?

The goal of event grouping is to correlate and group events together that relate to the same incident. Hence the scope of any incident is like a boundary that encircles all the events that relate to the incident. This might be a geographic boundary, or it might be a logical one, or it might be based on a combination of elements, such as Node and Service. Once there is an understanding of what is meant by scope, the definition of what scope should be in a given organisation quickly becomes clear.

Geographic boundary example

Widgetcom is a wireless provider and has cell sites dotted all over the country. If there is an issue at a cell site, for example, a power failure, they might see alerts from different sources streaming into Netcool. For example, some may relate to building management systems like generators and air conditioning systems, and some might be from equipment housed in the building, such as telco switches. If there is such an issue, they will typically see a burst of events from the various systems over a 10 minute period. In this scenario, the “scope” of the incident is the physical cell site location, hence using the location as the scope for grouping makes sense, along with a time window setting of 10 minutes.

[Figure: Netcool Event Viewer with event groups]

Logical grouping example

Businesscorp is a large-scale enterprise and their Netcool solution supports many lines of business within the organisation. Each business unit owns a set of applications that run on a number of dedicated physical and virtual machines. Both the applications and the servers are heavily instrumented in terms of monitoring and tend to generate a lot of events. If there is any sort of issue on one of the machines, this usually manifests itself as alerts coming from either the applications, or the servers, or both, and these alerts tend to flow into Netcool in bursts. Typically all events for a given issue will come in within a five-minute window. In this scenario, the “scope” of the problem would be the line of business, hence using the line-of-business ID as the scope for grouping makes sense, along with a time window setting of 5 minutes.

How does the time window work?

Scope-based grouping offers two time window options: a fixed time window, or a dynamic one. When the first event in a group occurs, the synthetic parent event is created, and the clock starts ticking. If the time window is fixed, then the group will be closed a defined number of seconds after the group is created, regardless of whether more events are still streaming in for that group. Once a group is deemed closed, no further events will be added to it. If more events are received by Netcool after the group has closed, a new group will be created, with a new synthetic parent event. If, however, the time window is dynamic, the expiry time for the group is extended each time a new event is added, effectively keeping the group open. In most cases, a dynamic time window is preferred over a fixed one, as it makes the system more flexible in its receipt of events.

When defining a scope in any scenario, it is important to keep in mind the significance of the time window. In the geographical boundary example above, not all alarms that ever come from the same cell site relate to the same problem, of course, but all the alarms from the same cell site in the same time window probably do. Hence the scope does not necessarily have to be as granular as one might think. Sometimes a bigger “net” with a smaller time window can work best. Depending on the scenario, a little bit of testing will quickly help you arrive at suitable values for both the scope and the time window.

Which fields do I set?

The field where the scope is defined is called: ScopeID VARCHAR(255). Event grouping will occur automatically if the field ScopeID is set to a non-null value. If ScopeID is not set for an event, then the grouping automation will not act on that event, and that event will remain un-grouped.

The field where the time window is defined is called: QuietPeriod INTEGER. The time window is called QuietPeriod because it primarily defines the time window as the number of seconds that must pass without any further events being received; that is, the period after which “it all goes quiet”. This is the dynamic time window scenario. Note that when the fixed time window is used, the same field is used, and the expiry time for the group is also defined as: group creation time + QuietPeriod. Unlike the dynamic time window, however, the expiry time for the group is never extended, and represents a hard cut-off.

How does it work?

When an event is received into Netcool/OMNIbus, if the ScopeID field is set, the grouping automation will create a synthetic parent event, and make the incoming event a child of the parent. The expiry time for the group will be set as: group creation time + the QuietPeriod of the incoming event. If the QuietPeriod is not set in the incoming event (i.e. is zero), then the grouping automation will use the global default property instead: SEGQuietPeriod from the master.properties table. If the ScopeID value starts with the string “FX:”, then a fixed time window is used. In any other case, a dynamic time window is assumed.

How can I try it out?

You can activate scope-based grouping by simply setting the ScopeID field in Netcool to a non-null value. If you have some thoughts about what ScopeID might mean in your environment, and want to try it out, you can follow these steps:

  1. Install scope-based event grouping from the OMNIbus extensions directory ($OMNIHOME/extensions/eventgrouping)
  2. Open the Netcool Administrator tool and create a new insert database trigger on the alerts.status table
  3. Add an if-elseif construct in the trigger to set ScopeID, depending on the event type

NOTE: Instructions on how to install scope-based event grouping can be found in the IBM Knowledge Center.

The following is an example ObjectServer database trigger that sets up ScopeID for incoming events:

CREATE OR REPLACE TRIGGER widgetcom_set_scopeid
GROUP widgetcom_triggers
PRIORITY 1
COMMENT 'Sets ScopeID on incoming events'
BEFORE INSERT ON alerts.status
FOR EACH ROW
WHEN get_prop_value('ActingPrimary') %= 'TRUE' and new.ScopeID = ''
begin

	-- SET ScopeID BASED ON Location FOR Class 100 EVENTS
	if (new.Class = 100) then

		set new.ScopeID = new.Location;

	-- SET ScopeID BASED ON AlertGroup FOR Class 200 EVENTS
	-- ALSO SHORTEN GLOBAL QuietPeriod TO 5 MINUTES
	elseif (new.Class = 200) then

		set new.ScopeID = new.AlertGroup;
		set new.QuietPeriod = 300;

	-- ELSE SET ScopeID BASED ON Node
	else

		set new.ScopeID = new.Node;
	end if;
end;
go

 

NOTE: The above code can be placed into a file and ingested into your Netcool/OMNIbus ObjectServer via the nco_sql command:

$OMNIHOME/bin/nco_sql -server AGG_P -user root -password netcool < set_scopeid.sql
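
As described under “How does it work?”, a ScopeID value that starts with “FX:” selects a fixed time window. The following is a minimal sketch of a trigger that uses this prefix. The trigger name, the Class value of 300 and the 10 minute window are hypothetical values chosen for illustration only. If it were used alongside the trigger above, this logic would need to be merged into that trigger as an additional elseif branch, since the else branch there already sets ScopeID for every remaining event.

```sql
-- ILLUSTRATIVE SKETCH ONLY: TRIGGER NAME, Class VALUE AND QuietPeriod ARE ASSUMPTIONS
CREATE OR REPLACE TRIGGER widgetcom_set_fixed_scopeid
GROUP widgetcom_triggers
PRIORITY 1
COMMENT 'Sets a fixed time window ScopeID on incoming Class 300 events'
BEFORE INSERT ON alerts.status
FOR EACH ROW
WHEN get_prop_value('ActingPrimary') %= 'TRUE' and new.ScopeID = ''
begin

	-- SET ScopeID BASED ON Location FOR Class 300 EVENTS
	-- THE FX: PREFIX REQUESTS A FIXED TIME WINDOW
	-- THE GROUP THEN EXPIRES AT: GROUP CREATION TIME + 600 SECONDS
	if (new.Class = 300) then

		set new.ScopeID = 'FX:' + new.Location;
		set new.QuietPeriod = 600;
	end if;
end;
go
```

Without the prefix, the same QuietPeriod value would instead be treated as a dynamic window that is extended each time a new event joins the group.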

Where should I set my ScopeID?

The ScopeID field can be set anywhere – but the most common places are:

  • ObjectServer trigger (as per the above example) – if the ScopeID value is contained within the incoming event data
  • Probe rules file – again, if the ScopeID value is contained within the incoming event data
  • Netcool/Impact policy – if the ScopeID needs to be looked up in an external system, like a CMDB

Identifying and leveraging priority child event information

Netcool event grouping includes a capability to identify the priority child event in a grouping, and then propagate elements of that child event up to the parent event. There are four built-in options for selecting the priority child event:

  • Choose the event with the highest CauseWeight
  • Choose the event with the highest ImpactWeight
  • Choose the event with the first FirstOccurrence (i.e. the first event entering the group)
  • Choose the event with the last LastOccurrence (i.e. the last event entering the group)

NOTES:

  • The priority option is global and only one can be in use at a time.
  • The details of the highest priority event are remembered, even if that child event clears and is deleted. The stored details are only replaced if a child event enters the group that has a higher priority.
  • If CauseWeight and ImpactWeight are set in any child events, the highest value in each case will automatically propagate to the respective fields in the parent event.
  • Using the highest CauseWeight to identify the priority child event tends to be the most popular choice amongst Netcool practitioners worldwide. It requires the additional step of defining cause weights for the incoming events, in order to be of any benefit.
  • More information on event weighting and standard templates can be found in the IBM Knowledge Center.

CUSTOMTEXT FIELD
One of the fields each event in Netcool/OMNIbus has is the CustomText field. For each “real” event, this field should be populated with any data that needs to be propagated up to the parent event, if this event were to be identified as the priority child event. For example, CustomText contents might include the concatenation of certain key fields or custom fields from the child event.

For example, I might augment my insert trigger further to set the CauseWeight and CustomText, based on a couple of custom fields:


	-- SET ScopeID BASED ON Location FOR Class 100 EVENTS
	if (new.Class = 100) then

		set new.ScopeID = new.Location;
		set new.CauseWeight = 1000;
		set new.CustomText = new.WidgetCharField + ':' + to_char(new.WidgetIntField);
	...

 

Once the priority child event has been identified based on one of the four criteria, the CustomText field from the priority child is automatically copied to the CustomText field of the parent event. The CustomText field in the parent event can then be included in the parent event’s Summary field (enable the property SEGUseScopeIDCustomText – see notes below), or sent over to a ticketing system to provide additional detail around the probable cause of an incident.

Customising scope-based event grouping

Scope-based event grouping is customised by modifying properties in the master.properties table. Each record in the master.properties table has the fields: CharValue VARCHAR(255) and IntValue INTEGER. Only one of the two fields will be used in each case, depending on what the property is for. For example, the SEGQuietPeriod property or a boolean type property will make use of the IntValue value, whereas a property specifying a prefix label for the synthetic event’s Summary field will make use of the CharValue value.

The various properties come pre-set with out-of-the-box values and are documented in the IBM Knowledge Center.
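
As a minimal sketch of what modifying a property might look like, the following statements could be run via nco_sql to review and then change the global default quiet period. This assumes each record is looked up by a Name column in master.properties; the value of 600 seconds is purely illustrative.

```sql
-- REVIEW THE CURRENT VALUE OF THE GLOBAL DEFAULT QUIET PERIOD
-- (ASSUMES THE PROPERTY IS IDENTIFIED BY THE Name COLUMN)
select Name, IntValue, CharValue from master.properties where Name = 'SEGQuietPeriod';
go

-- SET THE GLOBAL DEFAULT QUIET PERIOD TO 10 MINUTES (ILLUSTRATIVE VALUE)
-- SEGQuietPeriod IS AN INTEGER TYPE PROPERTY, SO IntValue IS USED
update master.properties set IntValue = 600 where Name = 'SEGQuietPeriod';
go
```

Any event that arrives with its own non-zero QuietPeriod would still override this global default, as described earlier.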

PRIORITY CHILD EVENT SELECTION
The following properties relate to what defines the priority group child event:

  • SEGPropagateTextToScopeIDParentCause : set IntValue to 1 (default is 0) to specify that the priority child is based on highest CauseWeight value
  • SEGPropagateTextToScopeIDParentImpact : set IntValue to 1 (default is 0) to specify that the priority child is based on highest ImpactWeight value
  • SEGPropagateTextToScopeIDParentFirst : set IntValue to 1 (default is 0) to specify that the priority child is based on first FirstOccurrence value
  • SEGPropagateTextToScopeIDParentLast : set IntValue to 1 (default is 0) to specify that the priority child is based on last LastOccurrence value

NOTE: If more than one of the above options is selected, the grouping automation will default to the above order of precedence, with CauseWeight taking the highest precedence.

If any of these options are selected, the CustomText field of the priority child event will be automatically propagated to the CustomText field of the parent event. The CustomText field value of the parent will not change unless a new child event with a higher priority value enters the group. Additionally, even if the child event with the highest priority subsequently clears and is deleted from Netcool/OMNIbus, the CustomText and priority information about that child event will still be retained in the parent event, and only updated if a new child event with higher priority subsequently enters the group.
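
For illustration, selecting the highest CauseWeight as the priority child criterion might be done as follows. This is a sketch that assumes properties are addressed by a Name column in master.properties; explicitly switching the other three options off is a precaution of this example rather than a documented requirement.

```sql
-- SELECT THE PRIORITY CHILD BASED ON THE HIGHEST CauseWeight
update master.properties set IntValue = 1
	where Name = 'SEGPropagateTextToScopeIDParentCause';
go

-- SWITCH THE OTHER THREE OPTIONS OFF SO THAT ONLY ONE IS IN USE
update master.properties set IntValue = 0
	where Name in ('SEGPropagateTextToScopeIDParentImpact',
		'SEGPropagateTextToScopeIDParentFirst',
		'SEGPropagateTextToScopeIDParentLast');
go
```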

ACTIVATE JOURNALLING OF CHILD EVENTS
A very useful capability of scope-based event grouping is the automatic journalling of child event details to the journal of the parent event. This feature provides a mechanism to capture the forensic history of the events that have passed through the group, which is particularly valuable if the underlying events are transient or are flapping. This capability is very useful for viewing a forensic listing of the child events both from the Event Viewer in Web GUI, as well as from the ticket work log. Automatic journal propagation from a ticketed event is an out-of-the-box feature of Netcool Gateways.

NOTE: The automatic journalling of child events is disabled by default since it will create journals in the ObjectServer, and thus induce an element of loading into the system. It is up to the customer therefore to enable this feature and do due diligence load testing on a non-production system, prior to use.

The following properties relate to the creation of journals in the parent events that contain details about the child events:

  • SEGJournalToScopeIDParent : set IntValue to 1 (default is 0) to activate journalling of child events to parent event
  • SEGJournalServerNameServerSerial : set IntValue to 1 (default is 1) to include each child event’s ServerName and ServerSerial fields in the journal detail
  • SEGJournalNode : set IntValue to 1 (default is 1) to include each child event’s Node field in the journal detail
  • SEGJournalSummary : set IntValue to 1 (default is 1) to include each child event’s Summary field in the journal detail
  • SEGJournalAlertKey : set IntValue to 1 (default is 1) to include each child event’s AlertKey field in the journal detail
  • SEGJournalCustomText : set IntValue to 1 (default is 1) to include each child event’s CustomText field in the journal detail
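
A minimal sketch of activating journalling on a non-production ObjectServer is shown below. It assumes properties are addressed by a Name column in master.properties, and the choice to leave AlertKey out of the journal detail is only an example of trimming the journal content.

```sql
-- ACTIVATE JOURNALLING OF CHILD EVENTS TO THE PARENT EVENT (DEFAULT IS 0)
update master.properties set IntValue = 1 where Name = 'SEGJournalToScopeIDParent';
go

-- EXAMPLE ONLY: OMIT EACH CHILD EVENT'S AlertKey FROM THE JOURNAL DETAIL (DEFAULT IS 1)
update master.properties set IntValue = 0 where Name = 'SEGJournalAlertKey';
go
```

Reducing the number of fields written to each journal entry is one way to limit the additional load that journalling places on the system.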

Can I do sub-grouping?

Scope-based event grouping occurs for those events where a non-null value is set in the ScopeID field. It is possible to cause child events to be sub-grouped under the ScopeID group by setting a non-null value in the SiteName field.

NOTE: The sub-grouping field is called “SiteName” due to legacy reasons. For practicality, it is better to think of it as “SubGroup” instead.

If sub-grouping is required, the SiteName field must be set either at the same time as, or before, ScopeID is set. Because grouping occurs the moment the ScopeID field is set (via a database trigger), SiteName will only be taken into account at the time the grouping is done. If it is not set at the time the grouping automation fires, the automation will assume sub-grouping is not required for this child event. Note that a ScopeID group may contain both direct child events and sub-groups, hence some child events in a ScopeID grouping might have SiteName set and sit under a sub-group, while others might not and be direct child events of the ScopeID parent event.
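
As a sketch of this ordering, the insert trigger shown earlier could set SiteName in the same branch as ScopeID, so that both values are in place before the grouping automation acts on the event. The trigger name and the use of AlertGroup as the sub-group value are hypothetical choices for illustration.

```sql
-- ILLUSTRATIVE SKETCH ONLY: TRIGGER NAME AND SUB-GROUP VALUE ARE ASSUMPTIONS
CREATE OR REPLACE TRIGGER widgetcom_set_subgroup
GROUP widgetcom_triggers
PRIORITY 1
COMMENT 'Sets SiteName and ScopeID together on incoming Class 100 events'
BEFORE INSERT ON alerts.status
FOR EACH ROW
WHEN get_prop_value('ActingPrimary') %= 'TRUE' and new.ScopeID = ''
begin

	-- SET THE SUB-GROUP (SiteName) FIRST, THEN THE SCOPE (ScopeID)
	if (new.Class = 100) then

		set new.SiteName = new.AlertGroup;
		set new.ScopeID = new.Location;
	end if;
end;
go
```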

There are a number of properties in the master.properties table that relate to handling sub-groups. They follow along the same lines as the properties described above and are documented along with them in the same place in the IBM Knowledge Center.

Delay ticketing for event groups

Customers typically want to leverage the priority child propagation feature of scope-based event grouping, so that the parent event reflects elements of the highest priority child event. Often however, it can take some time for the priority child event to arrive into Netcool. Many customers therefore delay the ticket creation off the parent event to allow time for the priority event to arrive, so that they can set up the parent event suitably, prior to ticketing.

If this approach is taken, the ticketing integration can be configured to only act on parent events that are of at least a certain age. Since the parent event FirstOccurrence and LastOccurrence fields always reflect the first and last occurrences respectively of the underlying children instead of the first and last occurrence of the parent event itself, it is not ideal to use either of these fields in the ticketing filter, in order to delay ticketing of the parent event. A more suitable field to use for this purpose instead is: InternalLast. Since the parent event is only ever inserted once and never deduplicated, the InternalLast field reflects the true first and last occurrence of the parent event.

An example ticketing filter that only tickets parent events that are older than 5 minutes therefore is:

AlertGroup = 'ScopeIDParent' and InternalLast < getdate - 300

NOTE: Scope-based event grouping parent events can be identified by: AlertGroup = 'ScopeIDParent'.

Summary

On the face of it, leveraging scope-based event grouping may seem like a lot of work to implement. In practice, the reality is the opposite, and the potential returns are high. Ultimately scope-based event grouping is enabled simply by setting ScopeID. Many customers around the world, once they understand how the grouping mechanism works and what ScopeID means, have preliminary and meaningful grouping working in under an hour. The rest of the configuration, such as event weighting and priority child identification, consists of tuning and customisation tasks. The many properties and tuning options exist simply to allow customers to tune the resulting grouping and the look and feel of the resulting parent event. Each property and option in fact exists because of specific feature requests from customers.

And the relatively small amount of work to set up scope-based grouping is worth it. Most customers see upwards of a 70% row reduction in their Event Viewers, and enjoy the tremendous benefits of having events organised by incident, bringing order to the chaos. From a financial standpoint, the savings are clear. One large North American communications provider recently reduced their ticket numbers by 75% by leveraging scope-based event grouping. This was achieved by applying scope-based event grouping, and then only auto-ticketing off groups, rather than off individual events. This allowed them to have one ticket per incident, and have all of the related event detail recorded in that same ticket, courtesy of the journalling feature. Another wireless telco customer cited a saving of just under 1M USD per annum due to savings in ticket creation, and over 3M USD due to reduced MTTR, by leveraging scope-based event grouping. The case for its use and application therefore is clear.

Coming up…

In my next blog, I will describe how to take event grouping to the next level, by leveraging analytics on top of scope-based event groups. This enables you to gain valuable insights, and achieve further event compression and ticket reduction. This new concept in NOI event grouping is called: super groups.
