A: Operations Analytics – Predictive Insights provides early warning of abnormal behavior that might indicate a potential outage, service degradation, or unexpected change. It dynamically builds thresholds and baselines without the need for configuration.
YouTube – SmartCloud Analytics Predictive Insights.
A: Operations Analytics Predictive Insights evaluates data against its normal behavior and can detect anomalous behavior in KPIs. Normal behavior is learned through an initial training period, which builds an analytics model, and through constant model retraining as new data is loaded thereafter. If a KPI is deemed anomalous, an alarm is raised (in the OMNIbus Active Event List).
System and Support
A: It’s standalone and runs on Linux. There’s a link coming up to the free trial download here; the platform support is listed there.
A: Yes. We have taken data from many non-IBM sources; SolarWinds was one of these.
A: Predictive Insights itself does not use a memory-resident database.
A: The rules for using CSV as a datasource are described in the documentation. Search for Rules for using CSV data sources in the Predictive Insights documentation. The tutorial data also forms the basis of a good example. The Operations Analytics – Predictive Insights tutorial and tutorial data can be found here.
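As a rough, hypothetical illustration only (the authoritative column rules are in the "Rules for using CSV data sources" documentation referenced above), a CSV data source is essentially tabular KPI data with a timestamp, a resource identifier, and one column per metric; a few lines of Python show the general shape:

```python
import csv
import io

# Hypothetical layout: one timestamp column, one resource column, and one
# column per metric. Check "Rules for using CSV data sources" in the
# Predictive Insights documentation for the real requirements.
sample = """timestamp,resource,cpu_util,mem_util
2016-01-04 09:00,serverA,42.5,61.0
2016-01-04 09:05,serverA,44.1,61.2
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    print(row["timestamp"], row["resource"], row["cpu_util"])
```

Here each row is one sample of every metric for one resource at one timestamp; the tutorial data linked above follows the documented rules and is the best concrete reference.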
A: All the data, from multiple different sources with different sampling intervals, needs to be brought together on a common time interval (e.g. 5 min or 15 min). Just feed the data into SCAPI at the intervals you find it at, and it will automatically align and aggregate it (using the system.aggregation.interval specified). The general approach is the least common multiple: 5 min and 15 min data should go to 15 min; 4 min and 5 min data will go to 20 min.
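The least-common-multiple rule can be sketched in a few lines (SCAPI handles this internally; the snippet is only an illustration of the arithmetic):

```python
from math import lcm  # available in Python 3.9+

# Collection intervals of the different data sources, in minutes.
source_intervals = [5, 15]

# The common aggregation interval is the least common multiple of the
# collection intervals, so every source's samples land on a shared boundary.
aggregation_interval = lcm(*source_intervals)
print(aggregation_interval)  # 15

print(lcm(4, 5))  # 20 -- matches the 4 min + 5 min example above
```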
A: You simply configure the two sources appropriately, and Mediation will bring data together from both sources automatically, whether CSV plus database or two database instances; there is nothing special to do here.
Mediation and Training
A: 4 weeks of training followed by 2 weeks of scoring is fine. In the case of the 6-week situation, unless you had a priori insight that something very interesting existed in week 3 or 4, you might turn training down to 2 weeks and score the subsequent weeks. Current best practice is that 4 weeks is a good starting point.
The model is automatically updated over time, so in the Sep-to-current situation, if the anomalous situation continues, there is a good chance it will be incorporated into the model as it becomes ‘the new normal’. On the other hand, if the anomaly occurs for a short time and the situation then returns to normal, Predictive Insights will reset the anomalies.
Once it has a model, Predictive Insights will assess incoming data to decide whether it is anomalous. If identified as anomalous, events are generated; if not, no events are generated (and clearing will occur as necessary).
A: Training is the number of weeks (default 2) over which Operations Analytics Predictive Insights builds its analytics model and learns the behavior of the metric groups, metrics and resources defined in the mediation model.
A: The aggregation interval is the time period by which metric data is grouped to be aggregated. Data is normalized to the same interval so it can be processed by the algorithms. Usually, the aggregation interval needs to be set to the data collection interval, or to the least common multiple of the data collection intervals if several data sources are fed to a single algorithm. Typical values are 5 minutes, 15 minutes, or 1 hour.
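As a minimal sketch of the normalization step (the bucketing logic here is an illustration, not the product's implementation), 5-minute samples for a KPI can be averaged onto 15-minute boundaries like this:

```python
from collections import defaultdict
from datetime import datetime

AGG_MINUTES = 15  # chosen aggregation interval

def bucket(ts: datetime) -> datetime:
    """Snap a timestamp down to the start of its aggregation interval."""
    minutes = (ts.minute // AGG_MINUTES) * AGG_MINUTES
    return ts.replace(minute=minutes, second=0, microsecond=0)

# 5-minute samples for one KPI, aggregated (averaged) onto 15-minute buckets.
samples = [
    (datetime(2016, 1, 4, 9, 0), 10.0),
    (datetime(2016, 1, 4, 9, 5), 20.0),
    (datetime(2016, 1, 4, 9, 10), 30.0),
    (datetime(2016, 1, 4, 9, 15), 40.0),
]

grouped = defaultdict(list)
for ts, value in samples:
    grouped[bucket(ts)].append(value)

aggregated = {ts: sum(v) / len(v) for ts, v in grouped.items()}
# 09:00 bucket -> (10 + 20 + 30) / 3 = 20.0 ; 09:15 bucket -> 40.0
```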
A: For example, take one datasource with 2 metric groups:
DS1metricGroup1 with 2 resources and 5 metrics (2 x 5 = 10 KPIs max)
DS1metricGroup2 with 3 resources and 2 metrics (3 x 2 = 6 KPIs max)
and a second datasource with 1 metric group:
DS2metricGroup1 with 1 resource and 10 metrics (1 x 10 = 10 KPIs max)
The above model, created from two datasources, will have a maximum of 26 KPIs.
Note: If, later during extraction, DS1metricGroup1 sees another resource added, then that interval and subsequent intervals will have a maximum of 15 KPIs for that group (3 x 5 = 15). Also, resources may be missing for some intervals.
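The arithmetic above amounts to summing resources times metrics over every metric group; as a small sketch (the group names come from the example, the data structure is hypothetical):

```python
# Maximum KPI count for a mediation model: for each metric group,
# KPIs = resources x metrics, summed across all groups and datasources.
metric_groups = {
    "DS1metricGroup1": {"resources": 2, "metrics": 5},   # 10 KPIs
    "DS1metricGroup2": {"resources": 3, "metrics": 2},   #  6 KPIs
    "DS2metricGroup1": {"resources": 1, "metrics": 10},  # 10 KPIs
}

max_kpis = sum(g["resources"] * g["metrics"] for g in metric_groups.values())
print(max_kpis)  # 26
```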
A: The mediation model is the model created using the Mediation Tool. It is an XML file with a .pamodel extension that is deployed to the Operations Analytics Predictive Insights database. It describes to the analytics component the metric groups, metrics and resources to be extracted, and from which datasource. These models must be deployed to a topic.
YouTube – Mediation Tool Model Design
The analytics model learns behavior and automatically detects trends and behavior of metrics and resources using custom algorithms developed in conjunction with research from the Watson project.
A: A topic is a method by which Operations Analytics Predictive Insights can separate data logically by geographic location, application, or any other grouping that makes sense to the user. The maximum number of KPIs per topic is 100k.
A: As with all applications there is a limit. Trials have been run on over 100K metrics without huge hardware.
A: Predictive Insights is a near-real-time application. The user defines an interval rate at which to check the data, usually the same as the rate at which you monitor the data. This means that you are checking in near real time.
A: An anomaly is when a KPI deviates from its normal behavior. Predictive Insights learns, defines and refines normal behavior during training. An anomaly may be temporary.
An alarm is when Predictive Insights sees that a KPI (or multiple KPIs) has deviated to a level where it is a problem and must be investigated.
A: A baseline is a guide that displays the upper and lower values within which a KPI can appear without being anomalous. A baseline is learned during training. It is shown in the Operations Analytics Predictive Insights UI as a green shaded area.
A: We have two different types of baseline: one that changes with the KPI value, which is seasonal; and one that remains a flat line even though the KPI values fluctuate, which is non-seasonal.
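Predictive Insights derives its baselines from the trained analytics model; purely as an illustration of the two shapes, a naive sketch might contrast a single flat band over all history with a per-hour-of-day band that follows a daily cycle (the mean plus/minus three standard deviations rule here is an assumption for the sketch, not the product's algorithm):

```python
import statistics
from collections import defaultdict
from datetime import datetime

# Illustrative only: real baselines come from the trained analytics model.
# This sketch uses a naive mean +/- 3*stddev band.

def flat_baseline(values, k=3.0):
    """Non-seasonal: a single flat band across all intervals."""
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return (mu - k * sd, mu + k * sd)

def seasonal_baseline(samples, k=3.0):
    """Seasonal: one band per hour of day, so the band tracks the KPI."""
    by_hour = defaultdict(list)
    for ts, value in samples:
        by_hour[ts.hour].append(value)
    return {hour: flat_baseline(values, k) for hour, values in by_hour.items()}

samples = [
    (datetime(2016, 1, 4, 9, 0), 10.0), (datetime(2016, 1, 5, 9, 0), 12.0),
    (datetime(2016, 1, 4, 14, 0), 50.0), (datetime(2016, 1, 5, 14, 0), 54.0),
]
bands = seasonal_baseline(samples)  # separate bands for 09:00 and 14:00
```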
A: This happens because the model has been retrained between the first opening of the alarm and the second. In the intervening time Predictive Insights has calculated an updated baseline for this KPI.
The baseline is calculated from the current model. When you open the Predictive Insights UI, we display the baseline reflecting the current model, not previously trained models. This means that, for historical alarms, the baseline may not reflect the baseline exactly as it was at the time of the alarm.
A: The baseline is constantly being recalculated. The baseline is defined by the analytics model, and the analytics model, once trained, begins retraining immediately. A baseline is created each time we create an analytics model. The first analytics model is, by default, based on 4 weeks of data. After the first model is created, retraining of the analytics model occurs by default with each new day of data.
A: The coloured lines in the UI are the values for the KPI. If these plot lines, which are the data for your KPI, display gaps, this can be for a number of reasons. Most likely there are gaps in the data for the KPI you are inputting. If, after checking your input data, there are no gaps, there are other questions to ask, e.g.: Are you loading data in backlog? What is the latency? Is the granularity of the data in the UI set to the same value as your aggregation interval?
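One quick way to check your input data is to scan a KPI's timestamps for intervals missing at the aggregation granularity. This sketch assumes your input can be reduced to a sorted list of per-interval timestamps (the find_gaps helper is hypothetical, not part of the product):

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, interval_minutes):
    """Return the interval start times missing from a sorted timestamp list."""
    step = timedelta(minutes=interval_minutes)
    missing = []
    expected = timestamps[0]
    for ts in timestamps:
        while expected < ts:       # every skipped boundary is a gap
            missing.append(expected)
            expected += step
        expected = ts + step
    return missing

# 5-minute KPI data with two intervals absent.
ts = [datetime(2016, 1, 4, 9, m) for m in (0, 5, 20, 25)]
print(find_gaps(ts, 5))  # the 09:10 and 09:15 intervals are missing
```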
A: Steady state is when you are loading the latest interval in near real time. You are in steady state if you are loading your latest data close to now-time through Mediation and are scoring. Training may happen in steady state, but it will take until the end of the training period before alarms appear.
Here is a good place for an example. In a customer POC, Predictive Insights picked up a mathematical relationship between certain metrics. The customer stated that there could not possibly be a relationship between these metrics, which led them to investigate. It turned out that these metrics were spiking at the same time on certain nights; they were hosted on separate VMs on the same hardware. On further investigation, it turned out that the cleaner would unplug the server to plug in a vacuum cleaner, thus causing the anomalies.
A: A causal group is a group of related metrics (maximum of 6) that are deemed to be related to an anomalous KPI. When you launch an alarm, the causal group is displayed in the related metrics tab.
A: Multiple alarms can be grouped together into one consolidated alarm in the AEL. Each can have a causal group of at most 6 metrics, which may lead to many related metrics.
A: These KPIs are in the same causal group. In 1201, they are ordered based on value, that is, on how anomalous Predictive Insights deems them to be. In 1202, the ordering is random.
A: The value applied to a KPI is a score, calculated by Predictive Insights’s algorithms, of how anomalous the KPI is deemed to be.
Dashboard Application Services Hub
For frequently asked questions about Dashboard Application Services Hub, see the Dashboard Application Services Hub FAQ.