Netcool Operations Insight has for years led the market when it comes to event reduction. A new analytics experience in NOI 1.6 further reduces events and mean-time-to-know. Here are the capabilities we’ve added in this release. Today I’m going to focus on Reduce Noise and Incidents.
NOI tools that today help your operators focus on the key actionable events include:
- Impact policies
- Scope-based grouping
The Netcool Development team wanted to advance the tools at your disposal. The result is a brand new analytics experience, and NOI 1.6 is the first release that brings this work to market.
Analytics now work out of the box, without configuration, and run in real time. What’s more, these analytics work with your existing Netcool environment. The result? An even further reduced event set and improved mean-time-to-know.
How analytics looks
Our analytics engine decorates events with icons to aid your decision making.
For example, one visual indicator informs you that an event occurs regularly. Hover over the icon to get a short descriptive message that explains the seasonality, such as “This event tends to occur on a Monday between 3pm and 4pm”. That insight affords you some choices:
- Fix the underlying cause so that your team do not fix the same issue over and over again
- Filter out the event if it’s not very interesting
A privileged user could then click on the icon to discover more information about how the analytics arrived at that decision. For Seasonal Events, the Incident Viewer gives a visualisation of when the event tends to occur, along with the other times it has occurred.
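To make the idea of seasonality concrete, here is a minimal sketch of how a message like the one above could be derived from an event's history. It is not NOI's algorithm: the function name `dominant_slot`, the `min_share` threshold and the sample timestamps are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def dominant_slot(timestamps, min_share=0.6):
    """Return (weekday, hour) if a single weekday/hour slot holds at least
    min_share of all occurrences, otherwise None.
    Hypothetical helper, not part of any NOI API."""
    if not timestamps:
        return None
    # Count how often the event fell into each (weekday, hour) slot.
    slots = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
    slot, count = slots.most_common(1)[0]
    return slot if count / len(timestamps) >= min_share else None

# Invented history: three Monday afternoons and one Thursday morning.
history = [
    datetime(2019, 6, 3, 15, 10),
    datetime(2019, 6, 10, 15, 42),
    datetime(2019, 6, 17, 15, 5),
    datetime(2019, 6, 20, 9, 30),
]

slot = dominant_slot(history)
if slot:
    day, hour = slot
    print(f"This event tends to occur on a {day} between {hour}:00 and {hour + 1}:00")
```

A real seasonality detector would weigh far more history and test for statistical significance; the fixed share threshold here only illustrates the kind of pattern the icon reports.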
Another visual indicator helps you understand probable root cause. Hover over the icon to reveal the group’s composition. It may comprise:
- Temporal Correlation
- Scope Based Grouping
- or both!
A privileged user could then click on the icon to see when the events occurred together previously to help build trust in the analytics.
Let’s look at a simple example. A fan failure leads to the overheating of a power supply which leads to a server malfunction. These events would be temporally grouped, as they occurred in quick succession. Sorting the group by first-occurrence confirms the fan failure as the probable root cause.
You can see that grouping events together reduces the number of events that need action.
Either way, you have fewer events! And our visual indicators help you understand why.
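The fan-failure example above can be sketched in a few lines. This is a deliberate simplification: it groups events purely by how close together they occur, whereas the temporal correlation described in this post also draws on when events occurred together previously. The `Event` class, the `group_by_time` function and the 60-second gap are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Event:
    summary: str
    first_occurrence: float  # seconds since an arbitrary start time

def group_by_time(events, max_gap=60.0):
    """Group events whose first occurrence is within max_gap seconds
    of the previous event. Each group is sorted by first occurrence."""
    groups = []
    for ev in sorted(events, key=lambda e: e.first_occurrence):
        if groups and ev.first_occurrence - groups[-1][-1].first_occurrence <= max_gap:
            groups[-1].append(ev)   # close in time: same group
        else:
            groups.append([ev])     # too far apart: start a new group
    return groups

# Invented sample data matching the example in the text.
events = [
    Event("Server malfunction", 130.0),
    Event("Fan failure", 100.0),
    Event("Power supply overheating", 115.0),
    Event("Disk full", 900.0),
]

groups = group_by_time(events)
# The earliest event in a group is the probable root cause.
probable_root = groups[0][0].summary
print(f"{len(events)} events reduced to {len(groups)} groups")
print(f"Probable root cause of first group: {probable_root}")
```

Because each group is already ordered by first occurrence, the first entry is the candidate an operator would look at first.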
How analytics works
We built brand new cloud-scalable algorithms that work across your entire event data. That is a bold statement to make when you have some of the largest customers in the world! We combined a distributed architecture with these advanced algorithms to make this possible.
It works as follows:
- Our software takes large tasks and breaks them up into many smaller tasks.
- The tasks run distributed across a cluster of servers.
- Once complete, the software recomposes the smaller tasks into a final result.
NOI 1.6 was released in June 2019.
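The split, distribute and recompose pattern described in the steps above can be illustrated with a small sketch. It assumes a local thread pool as a stand-in for a cluster of servers, and a trivial per-node event count as a stand-in for the real analytics; the names `split`, `count_by_node`, `recompose` and `analyse` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def split(items, chunk_size):
    """Break one large task into many smaller tasks."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

def count_by_node(chunk):
    """Small task: count the events per node within one chunk."""
    return Counter(node for node, _summary in chunk)

def recompose(partials):
    """Combine the partial results into a final result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

def analyse(events, chunk_size=2, workers=4):
    chunks = split(events, chunk_size)
    # The thread pool stands in for a cluster of servers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_node, chunks))
    return recompose(partials)

# Invented sample data: (node, summary) pairs.
events = [
    ("db01", "high cpu"),
    ("web01", "slow response"),
    ("db01", "high memory"),
    ("web02", "slow response"),
    ("db01", "disk latency"),
]

totals = analyse(events)
print(totals.most_common())
```

The pattern only works this simply when partial results can be merged independently of the order in which the chunks finish, which is why a count was chosen for the example.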
Reducing the number of events
It is vital to maximise your team’s efficiency. You try to avoid having many operators working on the same problem at the same time.
We define an “Incident” as a set of events grouped such that a single person can take action. An Incident can comprise many groups, derived from AI or scope-based learning – or both. Take for example one of our multivariate algorithms. It discovers temporal correlations between events and automatically groups the correlated events together.
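One way to picture an Incident built from several groups is to merge any groups that share an event. The sketch below does this with a small union-find structure. The merge rule (groups join when they overlap) is an assumption made for illustration, not a description of how NOI composes Incidents, and `build_incidents` and the event IDs are hypothetical.

```python
def build_incidents(groups):
    """Merge groups that share at least one event ID into incidents.
    Returns a sorted list of sorted event-ID lists."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for group in groups:
        members = list(group)
        if not members:
            continue
        find(members[0])  # register single-event groups too
        for other in members[1:]:
            union(members[0], other)

    incidents = {}
    for event_id in parent:
        incidents.setdefault(find(event_id), set()).add(event_id)
    return sorted(sorted(members) for members in incidents.values())

# Invented sample data: one temporal group and one scope-based group
# overlap on event E3, so they end up in the same incident.
temporal_group = ["E1", "E2", "E3"]
scope_group = ["E3", "E4"]
unrelated_group = ["E5"]

incidents = build_incidents([temporal_group, scope_group, unrelated_group])
print(incidents)
```

Under this assumption, five events in three groups collapse into two incidents, each of which could be handed to a single operator.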
Improving the Mean Time To Know
Incidents not only help reduce the number of events; they can also help reduce the Mean Time To Know. Presenting events together gives operators the context to understand the extent of the problem.
Imagine the following scenario. You have an online business called “Sock Shop”. Ensuring your business is always available is your number one priority. Different types of monitoring tell you when problems arise. For example, you have a synthetic test to measure how fast pages are being served to your users. This will send you an event when the Response time is “high”. That is good information – now you can investigate why the issue is happening. Is it because ads have driven more traffic to your site? Is your database under pressure? Is it your application server? Something else?
Unbeknownst to you, someone else has already started to investigate high memory usage on a container. They are wondering what that means. Is one of your services impacted? Are customers affected?
This is where analytics can help! Consolidating the information into one spot makes it clear what the impact is and how to begin resolving it. A single person can then take the necessary action. In the example above, the memory issue on the container was causing the Response Time to increase.
Find out more
In this blog post I discussed how NOI reduces the number of your events. I showed you how our brand-new, out-of-the-box Analytics can reduce events even further. I also highlighted how “Incidents” can reduce Mean Time To Know and maximise your team’s efficiency.
If you would like to know more about how NOI can help you reduce noise and incidents, and help you work more efficiently, please feel free to contact us!