In this article, I describe how Netcool Operations Insight helps customers reduce Events and Tickets to improve operations. I also describe the available algorithms (Seasonality and Related Events) and how each one works. I then explain how to compute, using a NOI Excel spreadsheet, the precise gain in terms of Ticket and Event reduction, a figure that an Event or Incident manager can use. I’d also like to thank Ian Manning from the Netcool lab for providing comments on this document.
Netcool Operations Insight (NOI) uses Analytics to help reduce the number of Events and Tickets.
To achieve this goal, NOI uses two Analytics-based algorithms. There is a third mechanism, called Scope Based Event Grouping, that is not based on an Analytics algorithm but on known correlation information and naming conventions. It is always worth taking a close look at that mechanism, as in some cases it achieves very good event reduction. See more information here (https://www.ibm.com/support/knowledgecenter/en/SSSHTQ_8.1.0/com.ibm.netcool_OMNIbus.doc_8.1.0/omnibus/wip/install/concept/omn_con_ext_applyingscopebasedegrp.html).
Event Analytics performs analyses of NOI historical event data (i.e. the reporter_status database). Two algorithms are available:
Seasonality
• It can identify seasonal patterns, such as when and how frequently events occur.
• Seasonal events are either being fixed again and again, in which case it would be better to fix the underlying cause, or they are not interesting and should be filtered out altogether.
• Seasonality analyses are output in reports and graphs so that you can find seasonal patterns. For example, an event that recurs at a specific scheduled time is highlighted. If that recurring event systematically generates a ticket, that is probably something you would like to avoid. You can use the information from the seasonality reports to define suppression rules that reduce the number of events.
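To make the idea of a seasonal pattern concrete, here is a minimal Python sketch that counts how often one event occurs in each hour of the day and flags an hour that dominates. It is not the algorithm NOI uses; the timestamps, the `hour_histogram` and `dominant_hour` helpers and the 75% threshold are all illustrative assumptions.

```python
from collections import Counter
from datetime import datetime

def hour_histogram(timestamps):
    """Count occurrences of an event per hour of day (0-23)."""
    return Counter(ts.hour for ts in timestamps)

def dominant_hour(timestamps, threshold=0.75):
    """Return the hour holding at least `threshold` of the occurrences, else None."""
    if not timestamps:
        return None
    hour, count = hour_histogram(timestamps).most_common(1)[0]
    return hour if count / len(timestamps) >= threshold else None

# Hypothetical occurrences of one event: a nightly job that alerts around 02:00.
occurrences = [
    datetime(2015, 3, 1, 2, 5),
    datetime(2015, 3, 2, 2, 7),
    datetime(2015, 3, 3, 2, 4),
    datetime(2015, 3, 4, 2, 6),
    datetime(2015, 3, 5, 14, 30),  # one unrelated occurrence
]

# 4 of the 5 occurrences fall in hour 2, so hour 2 is reported.
print(dominant_hour(occurrences))
```

An event flagged this way would be a candidate for a suppression rule or for a fix of the underlying cause.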
Related Events (Event Correlation/Grouping based on Analytics)
• It can determine which events have a statistical tendency to occur together and output the results on a scheduled basis as event groups. For events to be grouped, they must be observed occurring together three times in the same time interval. The time between two consecutive events in the group should not exceed 20 minutes. That means you can have 10 minutes between Event A on resource X and Event B on resource Y, and then 20 minutes between Event B on resource Y and Event C on resource Z, leading to a group that spans 30 minutes (10 minutes (A=>B) + 20 minutes (B=>C)). Of course, a group can contain more than three events; it is not uncommon to see groups with 10 or more events.
• As a simplistic example, think of a database used by a J2EE server: if the database goes down, your J2EE server might generate a lot of events, and you might also get some events from your response time tools.
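The 20-minute rule can be illustrated with a short Python sketch. It only shows how events chain together when each one follows the previous one within the maximum gap; it does not reproduce the statistical analysis or the three-occurrence requirement, and the `chain_events` helper and sample events are assumptions made for illustration.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=20)  # maximum time between two consecutive events

def chain_events(events, max_gap=MAX_GAP):
    """Split (timestamp, name) pairs into chains whose consecutive events
    are at most `max_gap` apart."""
    chains = []
    for ts, name in sorted(events):
        if chains and ts - chains[-1][-1][0] <= max_gap:
            chains[-1].append((ts, name))
        else:
            chains.append([(ts, name)])
    return chains

def span(chain):
    """Time between the first and the last event of a chain."""
    return chain[-1][0] - chain[0][0]

t0 = datetime(2015, 2, 8, 10, 0)
events = [
    (t0, "Event A on X"),
    (t0 + timedelta(minutes=10), "Event B on Y"),   # 10 minutes after A
    (t0 + timedelta(minutes=30), "Event C on Z"),   # 20 minutes after B
    (t0 + timedelta(minutes=90), "Event D on W"),   # 60 minutes after C: new chain
]

chains = chain_events(events)
print(len(chains), span(chains[0]))
```

The first chain holds A, B and C and spans 30 minutes even though no single gap exceeds 20 minutes.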
I often get the question: how long should my observation period be to find related events? It all depends on how frequently the problems occur. As a good start, I usually suggest a few months (3 to 6 months) to be sure to find interesting data. But again, if the frequency of your events/problems is high (remember, to create a group, they must occur three times in the observed period), you can probably go with a shorter analysis period (a few months, or even weeks or days, depending on your event flow).
After the analysis, NOI lets you review the discovered groups in the UI. In the picture below, the problem has occurred 6 times between 8 Feb 2015 and 5 Jan 2016. The group is composed of six events (Machine has gone online, ….. => Diskspace alert).
NOI 1.4.01 introduced the notion of “Generalized Patterns”. Basically, if NOI finds Event A on resource X, Event B on resource Y and Event C on resource Z three times in the observation time frame, and also Event A on resource M, Event B on resource N and Event C on resource O, then it will propose to create a pattern that generalizes the group definition. The pattern will contain:
• Event A + Event B + Event C, with some rules on the resource (such as a regex) to group events meaningfully.
Figure 3: Generalization of Event Grouping using Pattern
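As a rough illustration of what such a pattern expresses, the sketch below models it as a set of event types plus a regular expression that resource names must match. This is not NOI's pattern format: the `PATTERN` structure, the `matches_pattern` helper, the resource naming convention and the sample groups are all hypothetical.

```python
import re

# Hypothetical pattern: three event types plus a rule on the resource name.
PATTERN = {
    "event_types": {"Event A", "Event B", "Event C"},
    "resource_regex": re.compile(r"^site\d+-(db|app|probe)$"),
}

def matches_pattern(group, pattern=PATTERN):
    """`group` is a list of (event_type, resource) pairs."""
    types = {event_type for event_type, _ in group}
    resources_ok = all(pattern["resource_regex"].match(res) for _, res in group)
    return types == pattern["event_types"] and resources_ok

# Two groups on different resources that share the same shape.
group_1 = [("Event A", "site1-db"), ("Event B", "site1-app"), ("Event C", "site1-probe")]
group_2 = [("Event A", "site2-db"), ("Event B", "site2-app"), ("Event C", "site2-probe")]
# A group with a missing event type does not match.
group_3 = [("Event A", "site3-db"), ("Event B", "site3-app")]

print(matches_pattern(group_1), matches_pattern(group_2), matches_pattern(group_3))
```

The point is that one pattern covers both groups, so the grouping rule does not have to be defined once per set of resources.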
Event Analysis in Excel
Analyzing the correlations found by NOI through the UI can take some time when you have a lot of groups. That’s why the analytics UI lets you export the results to an Excel spreadsheet. The export also allows NOI Analytics results to be shared with an Event or Incident manager, who can review the data and decide whether or not to deploy the discovered grouping rules.
The Excel spreadsheet has several tabs; let’s review the most important ones.
Group information presents a global view of all discovered groups with two important fields: EventsCount and Instances.
• EventCount: the number of events in a group instance. For example, group « test:15 » generally has 37 unique events in each instance found.
• Instances: the number of times a group, with its sequence of events, was found. In the example, group « test:15 » was found 18 times between June 2013 and Sept 2013.
• Group Time To Live: the time in milliseconds between the first event and the last event in a discovered group. For example, a value of 1604000 ms is 1604 seconds, or roughly 1604/60 = 26.73 minutes.
• In group « test:15 », the same group occurred 18 times between June 2013 and Sept 2013, and each group instance contained 37 unique events. That means we potentially presented 18*37 = 666 events to operators when, with NOI Related Events grouping, we could have presented only 18 events (with all other events presented as child events).
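The arithmetic in the bullets above can be checked with a few lines of Python. The function names are illustrative; the input values are the ones quoted for group « test:15 ».

```python
def ttl_minutes(ttl_ms):
    """Convert a Group Time To Live from milliseconds to minutes."""
    return ttl_ms / 1000 / 60

def events_shown(instances, events_per_instance):
    """Events shown to operators without grouping, and with grouping
    (one parent event per group instance, the rest shown as children)."""
    without_grouping = instances * events_per_instance
    with_grouping = instances
    return without_grouping, with_grouping

print(round(ttl_minutes(1_604_000), 2))  # 26.73
print(events_shown(18, 37))              # (666, 18)
```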
Figure 4: example of event breakdown time frame
This view shows, for a given group (test:15), a summary of each instance of that group.
• Instance ID: a reference number that lets you find, in the « Instance Events » tab, the details of all events for a given group instance. For example, we use Instance Id : 1371112200000 in the following figure (Instance Events).
• Instance Count: shows, for group « test:15:x », the name of each occurrence of that particular group (here we have 18 group instances, x = 0 to 17).
• Unique Event: shows how many unique events are in a given group instance.
Instance Event in Excel
For a given group and an instance of that group, this tab shows the details of each event in the instance.
Figure 5: Instance Event in Excel
• Instance ID: lets you find in the « Instance Events » tab all the details for a particular group instance. Here, Instance Id : 1371112200000 from the previous tab.
• Event Identity: unique event identifier in Netcool (usually the Identifier field).
• Instances: 18/18. First number: over the observation period, the group was seen 18 times. Second number: out of those 18 times, how many times this event was present in the group instance. For example, the group could have been seen 18 times while a particular event was seen only 15 times (i.e. it was missing from 3 group instances).
Figure 6: Instance Event in NOI console
Calculating Event and Ticket reduction
In the following steps, we will use the data exported to the Excel spreadsheet to compute the potential gain in terms of Ticket reduction and Event reduction.
Create a new tab, for example: Event Ticket Reduction
Select Insert Pivot Table, and select all columns and rows in the Instance Events tab
Be sure to select: Add this data to the Data Model
Drag and drop “Group Name” and “Instance Id” into the Rows area, as shown in the picture below:
If Distinct Count does not appear, it means you did not select “Add this data to the Data Model” when creating the Pivot Table.
Do the same for Server Serial:
It is now very easy to compute for each group the gain in terms of Ticket reduction and Event reduction:
We add one column to compute Ticket reduction and apply the following formula to it:
=IF(B3<>"",B3-1,"") (be sure to adapt the Bx cell reference to your own spreadsheet).
Basically, it means that for each Distinct Count of Ticket Number, we take the number of tickets minus one (i.e. if the grouping had been in place, that group instance would have generated only one Ticket, for the Parent Event, and no ticket for the child events).
Then copy the formula to all rows.
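For readers who prefer to check the pivot table logic outside Excel, here is a minimal Python sketch of the same computation using only the standard library. The column meanings (Group Name, Instance Id, Ticket Number, Server Serial) come from the steps above; the sample rows, the ticket numbers and the helper names are invented for illustration, and only the first Instance Id is taken from the example earlier. Unlike the Excel formula, the `reduction` helper returns 0 rather than an empty cell when there is nothing to count.

```python
from collections import defaultdict

# Hypothetical rows: (Group Name, Instance Id, Ticket Number, Server Serial)
rows = [
    ("test:15", 1371112200000, "T100", 1),
    ("test:15", 1371112200000, "T101", 2),
    ("test:15", 1371112200000, "T102", 3),
    ("test:15", 1371200000000, "T200", 4),
    ("test:15", 1371200000000, "", 5),      # event that raised no ticket
    ("test:20", 1371300000000, "T300", 6),
]

def distinct_counts(rows):
    """Return {(group, instance): (distinct tickets, distinct events)}."""
    tickets, events = defaultdict(set), defaultdict(set)
    for group, instance, ticket, serial in rows:
        key = (group, instance)
        events[key].add(serial)
        if ticket:  # blank ticket cells are not counted
            tickets[key].add(ticket)
    return {key: (len(tickets[key]), len(events[key])) for key in events}

def reduction(count):
    """Same idea as the Excel formula: keep one item, the rest is the gain."""
    return count - 1 if count > 0 else 0

counts = distinct_counts(rows)
ticket_gain = sum(reduction(t) for t, _ in counts.values())
event_gain = sum(reduction(e) for _, e in counts.values())
print(ticket_gain, event_gain)
```

With these sample rows, the first instance saves two tickets and two events, the second saves one event, and the third saves nothing.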
You should now have something like this:
We can then compute the overall gain by summing the Ticket Reduction column and the Event Reduction column.
In the following picture (most columns hidden for clarity), we computed that, out of 3438 Tickets opened, NOI Analytics would have brought the number down to 1654 Tickets had it been in place.
For Events, NOI Analytics could have reduced them from 13863 to 9399.
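Expressed as percentages, these figures can be derived with a small Python helper. The function name is illustrative; the inputs are the ticket and event counts quoted above.

```python
def reduction_summary(before, after):
    """Return the absolute saving and the percentage reduction."""
    saved = before - after
    return saved, round(100 * saved / before, 1)

print(reduction_summary(3438, 1654))    # tickets: (1784, 51.9)
print(reduction_summary(13863, 9399))   # events:  (4464, 32.2)
```

In this data set, that is roughly a 52% reduction in Tickets and a 32% reduction in Events.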
Of course, the results will vary in your environment, but at least you now have a good method to compute the Ticket and Event reduction.
The Excel spreadsheet can now easily be extended to determine which kinds of groups allow the largest Event or Ticket reduction.