General

Q: What is Operations Analytics – Predictive Insights?

A: Operations Analytics – Predictive Insights provides early warning of abnormal behavior that may indicate a potential outage, service degradation, or unexpected change. It dynamically builds thresholds and baselines without the need for configuration.
YouTube – SmartCloud Analytics Predictive Insights.

Q: How does Operations Analytics Predictive Insights generate an alarm?

A: Operations Analytics Predictive Insights learns the normal behavior of the data and detects anomalous behavior on KPIs. Normal behavior is learned through an initial training period, which builds an analytics model, and through constant model retraining as new data is loaded thereafter. If a KPI is deemed anomalous, an alarm is raised (in the OMNIbus Active Event List).

System and Support

Q: What platform does this run on? Is it stand-alone or integrated with Jazz for SM?

A: It’s stand-alone and runs on Linux. There’s a link coming up to the free trial download here. The platform support is listed there.

Q: Can Predictive Insights gather metrics from SolarWinds? Do we have a customer that has done this yet?

A: Yes. We have taken data from many non-IBM sources, and SolarWinds was one of them.

Q: Does Predictive Insights use an in-resident memory database? How long does it keep the data for?

A: Predictive Insights itself does not use an in-resident memory database.

Data Format

Q: If I were to use CSV as a datasource what would the ideal format be?

A: The rules for using CSV as a datasource are described in the documentation. Search for Rules for using CSV data sources in the Predictive Insights documentation. The tutorial data also forms the basis of a good example. The Operations Analytics – Predictive Insights tutorial and tutorial data can be found here.

Q: How should we incorporate data with different sampling periods? I am referring to configuration attributes such as the aggregation interval.

A: All the data, from multiple sources with different sampling intervals, needs to be brought together on a common time interval (e.g. 5 min or 15 min). Just feed the data into SCAPI at the intervals you find it at, and it will automatically align and aggregate it (using the system.aggregation.interval specified). The general approach is one of least common multiple: 5 min and 15 min data should go to 15 min; 4 min and 5 min data will go to 20 min.
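
The least-common-multiple rule can be checked with a few lines of Python. This is only an illustrative sketch of the arithmetic, not part of the product; the function name common_interval is made up for this example.

```python
from functools import reduce
from math import gcd


def common_interval(intervals_min):
    """Return the least common multiple of the sampling intervals (in minutes)."""
    if not intervals_min:
        raise ValueError("at least one interval is required")
    # lcm(a, b) = a * b / gcd(a, b), folded over the whole list
    return reduce(lambda a, b: a * b // gcd(a, b), intervals_min)


print(common_interval([5, 15]))  # 15
print(common_interval([4, 5]))   # 20
```

The result is the value you would consider for system.aggregation.interval when several sources feed the same model.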

Q: How should we incorporate data from different sources? For example, if the customer has monitoring data from both CloudWatch (say we dump into CSV files) and TDW (DB2 database).

A: You simply configure the two sources appropriately, and Mediation will bring the data together from both sources automatically. The same applies to CSV plus DB, two DB instances, and so on; there is nothing special to do here.

Mediation and Training

Q: If we have a set of historical data from the customer, say 6 weeks – what is the best approach to getting Predictive Insights to generate anomaly events? Set the training period to 4 weeks and feed all 6 weeks of data in? If we have a longer period of data, will Predictive Insights generate anomaly events (and keep them in the system)? Say I have 3 months of data from Sep – current, and some events are generated in October and November. Will these events persist as the data feed continues to the current point in time?

A: 4 weeks training followed by 2 weeks scoring is fine. In the case of the 6 week situation, unless you had a priori insight that something very interesting existed in week 3 or 4, then you might turn training down to 2 weeks and score the subsequent weeks. Current best practice is that 4 weeks is a good starting point.

The model is automatically updated over time; so in the Sep-Current situation, if the anomalous situation continues, there’s a good chance that it will be incorporated into the model as it becomes ‘the new normal’. On the other hand, if the anomaly occurs for a short time, and then the situation returns to normal, Predictive Insights will reset the anomalies.

Once it has a model Predictive Insights will assess incoming data to decide if it is anomalous or not. If identified as anomalous, events are generated; if not, no events (and clearing will occur as necessary).
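
As a rough illustration of the 4 weeks training plus 2 weeks scoring split described above, the windows can be worked out as follows. The dates and the helper name split_history are hypothetical and exist only for this sketch; in the product the training period is a configuration setting rather than code.

```python
from datetime import datetime, timedelta


def split_history(start, end, training_weeks):
    """Split [start, end) into a training window followed by a scoring window."""
    training_end = start + timedelta(weeks=training_weeks)
    if training_end > end:
        raise ValueError("history is shorter than the training period")
    return (start, training_end), (training_end, end)


# Hypothetical 6-week history
start = datetime(2024, 1, 1)
end = start + timedelta(weeks=6)

training, scoring = split_history(start, end, training_weeks=4)
print("train:", training[0].date(), "to", training[1].date())
print("score:", scoring[0].date(), "to", scoring[1].date())
```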

Q: What is training?

A: Training is the period, expressed as a number of weeks (default 2), over which Operations Analytics Predictive Insights builds its analytics model and learns the behaviour of the metric groups, metrics and resources defined in the mediation model.

Q: What is the aggregation interval?

A: The aggregation interval is the time period into which metric data is grouped for aggregation. Data is normalized to the same interval so it can be processed by the algorithms. Usually, the aggregation interval needs to be set to the data collection interval, or to the least common multiple of the data collection intervals if several data sources are fed to a single algorithm. Typical values are 5 minutes, 15 minutes, or 1 hour.
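
To make the idea of grouping data by interval concrete, here is a minimal sketch that places raw samples into fixed-width buckets. It is not how the product is implemented. Averaging the values in each bucket is an assumption made for this example, and the sample data and the name aggregate are invented.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def aggregate(samples, interval_min):
    """Average (timestamp, value) samples into buckets of interval_min minutes."""
    width = interval_min * 60  # bucket width in seconds
    buckets = defaultdict(list)
    for ts, value in samples:
        # Floor each timestamp to the start of its bucket
        seconds = int((ts - EPOCH).total_seconds())
        bucket_start = EPOCH + timedelta(seconds=seconds - seconds % width)
        buckets[bucket_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Hypothetical 5-minute samples, aggregated to 15 minutes
t0 = datetime(2024, 1, 1, 10, 0)
samples = [(t0 + timedelta(minutes=5 * i), v) for i, v in enumerate([10, 20, 30, 40])]
for start, mean in aggregate(samples, 15).items():
    print(start.time(), mean)
```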

Q: I have x resources and y metrics. How many KPIs do I have?

A: The maximum number of KPIs for a metric group is its number of resources multiplied by its number of metrics. For example:
Datasource 1, with 2 metric groups:
DS1metricGroup1 with 2 resources and 5 metrics (2 x 5 = 10 KPIs max)
DS1metricGroup2 with 3 resources and 2 metrics (3 x 2 = 6 KPIs max)
Datasource 2, with 1 metric group:
DS2metricGroup1 with 1 resource and 10 metrics (1 x 10 = 10 KPIs max)
The above model, created from two datasources, will have a maximum of 26 KPIs.

Note: If, later during extraction, another resource is added to DS1metricGroup1, that metric group will have a maximum of 15 KPIs (3 x 5 = 15) for that and subsequent intervals. Also, resources may be missing for some intervals.
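
The same arithmetic, written as a small Python sketch. The dictionary layout and the function name max_kpis are just for illustration; the metric group names and counts come from the example above.

```python
def max_kpis(metric_groups):
    """Sum resources x metrics over all metric groups."""
    return sum(resources * metrics for resources, metrics in metric_groups.values())


# metric group name -> (number of resources, number of metrics)
model = {
    "DS1metricGroup1": (2, 5),
    "DS1metricGroup2": (3, 2),
    "DS2metricGroup1": (1, 10),
}
print(max_kpis(model))  # 26

# A third resource appears in DS1metricGroup1 during extraction
model["DS1metricGroup1"] = (3, 5)
print(max_kpis(model))  # 31
```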

Q: What is the difference between the mediation model and the analytics model?

A: The mediation model is the model created using the mediation tool. It is an XML file with a .pamodel extension that is deployed to the Operations Analytics Predictive Insights database. It tells the analytics component which metric groups, metrics and resources to extract, and from which datasource. These models must be deployed to a topic.
YouTube – Mediation Tool Model Design
The analytics model, by contrast, learns the behavior of metrics and resources and automatically detects trends, using custom algorithms developed in conjunction with research from the Watson project.

Q: What is a topic?

A: A topic is a method by which Operations Analytics Predictive Insights can separate data logically by geographic location, application or any other grouping which makes sense to the user. The max number of KPIs per topic is 100k.

Limitations

Q: Is there a limit to the number of different metrics (or the number of instances of the same metric) that can be measured for a specific target? Are limitations, if any, documented? If the number of metrics is unlimited, except by the hardware, how is any slowdown managed?

A: As with all applications there is a limit. Trials have been run on over 100K metrics without huge hardware.

Q: How fast can I detect an anomaly? For your WRT/user count example, do we check the current metrics in real time?

A: Predictive Insights is a near real time application. The user defines an interval rate at which to check the data, usually the same as the rate at which you monitor the data. This means that you are checking in near real time.

Anomalies

Q: Anomaly vs Alarm?

A: An anomaly is when a KPI deviates from its normal behavior. Predictive Insights learns, defines and refines normal behavior during training. An anomaly may be temporary.
An alarm is raised when Predictive Insights sees that a KPI (or multiple KPIs) has deviated to a level where it is a problem and must be investigated.

Q: What is a baseline?

A: A baseline is a guide to display the upper and lower values within which a KPI can appear without it being anomalous. A baseline is learned during training. It is shown in the Operations Analytics Predictive Insights UI with a green shaded area.

Q: For some KPIs the baseline changes with the KPI value; for others, the baseline remains a flat line even though the KPI values fluctuate. Why?

A: There are 2 different types of baseline. A seasonal baseline changes with the KPI value. A non-seasonal baseline remains a flat line even though the KPI values fluctuate.
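
The difference between the two baseline types can be pictured with a small sketch. This is not how Predictive Insights computes baselines. The band values, the hour-of-day keying and the name is_anomalous are all assumptions chosen to show the idea of a flat band versus one that varies over time.

```python
def is_anomalous(value, band):
    """Return True if value falls outside the (lower, upper) baseline band."""
    lower, upper = band
    return not (lower <= value <= upper)


# Non-seasonal: one flat band for every interval (hypothetical values)
flat_band = (20.0, 80.0)

# Seasonal: a band per hour of day, higher during business hours (hypothetical values)
seasonal_bands = {
    hour: (50.0, 90.0) if 9 <= hour < 17 else (5.0, 30.0)
    for hour in range(24)
}

print(is_anomalous(60.0, flat_band))           # False
print(is_anomalous(60.0, seasonal_bands[3]))   # True: 60 is high for 03:00
print(is_anomalous(60.0, seasonal_bands[12]))  # False: 60 is normal at noon
```

The same KPI value can therefore be normal against one band and anomalous against another, which is why a seasonal baseline follows the shape of the data.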

Q: Why does the baseline look different for the same KPI after opening it a second time?

A: This happens because the model has been retrained between the first opening of the alarm and the second opening of the alarm. In the intervening time Predictive Insights has calculated an updated baseline for this KPI.

The baseline is calculated based on the current model. When opening the Predictive Insights UI we display the baseline reflecting the current model and not previously trained models. This means, for historical alarms, the baseline may not reflect the baseline exactly as it was at the time of the alarm.

Q: How is the baseline calculated? How often does Operations Analytics Predictive Insights recalculate the baseline?

A: The baseline is constantly being recalculated. The baseline is defined using the analytics model, and a new baseline is created each time an analytics model is created. The first analytics model is, by default, based on 4 weeks of data. After the first model is created, retraining of the analytics model begins immediately and occurs by default with each new day of data.

Q: Why are there gaps in my KPI plotline in the UI?

A: The coloured lines in the UI are the values for the KPI. If a plot line displays gaps, this can be for a number of reasons. Most likely there are gaps in the input data for the KPI. If, after checking your input data, you find no gaps, there are other questions to ask, e.g.: Are you loading data in backlog? What is the latency? Is the granularity of the data in the UI set to the same value as your aggregation interval?

Q: What is steady state?

A: Steady state is when you are loading data for the latest interval in near real time. You are in steady state if you are loading your latest data, close to the current time, through mediation and are scoring. Training may happen in steady state, but alarms will not appear until the end of the training period.

Q: Predictive Insights displays what look like several unrelated KPIs together. How does Predictive Insights see this relationship?

A: Predictive Insights sees that there is a mathematical relationship between these KPIs.
Here is a good time to give an example. In a customer POC, Predictive Insights picked up a mathematical relationship between certain metrics. The customer stated that there could not possibly be a relationship between these metrics. This led them to investigate. It turned out that these metrics were spiking at the same time on certain nights. They were hosted on separate VMs on the same hardware. On further investigation it turned out that the cleaner would unplug the server to plug in a vacuum cleaner, thus causing anomalies.

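
One simple way to picture a "mathematical relationship" between two KPIs is a correlation coefficient. The sketch below uses Pearson correlation on two made-up series that spike at the same interval. It is a stand-in for illustration only, since the source does not describe the algorithms Predictive Insights actually uses; the series names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical KPIs from two VMs on the same hardware, both spiking at interval 3
cpu_vm_a = [10, 11, 10, 95, 12, 10]
latency_vm_b = [200, 210, 205, 900, 198, 202]

r = pearson(cpu_vm_a, latency_vm_b)
print(round(r, 3))  # close to 1.0 because the spikes coincide
```
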
Q: What is a causal group?

A: A causal group is the group of related metrics (max of 6) that are deemed to be related to an anomalous KPI. When you launch an alarm, the causal group is displayed in the related metrics tab.

Q: What is a consolidated alarm?

A: Multiple alarms can be grouped together into one consolidated alarm in the AEL. Each alarm can have a causal group of up to 6 metrics, so a consolidated alarm may have many related metrics.

Q: The Modify Selected Metrics screen shows all the mathematically related KPIs. How are these ordered?

A: These KPIs are in the same causal group. In 1201 they were ordered by value, that is, by how anomalous Predictive Insights sees them to be. In 1202 the ordering is random.

Q: What is the value (as displayed in related metrics)?

A: The value, applied to a KPI, is a score calculated by Predictive Insights’ algorithms that indicates how anomalous the KPI is deemed to be.

Dashboard Application Services Hub

For frequently asked questions about Dashboard Application Services Hub, see Dashboard Application Services Hub FAQ
