## Introduction

Time series forecasting is a very broad subject. The ability to forecast future values is applicable in areas such as sales forecasting, stock market analysis and utilities forecasting (e.g. energy consumption). Forecasting can be a complicated subject because there are many different forecasting algorithms, and each algorithm has properties that make it useful only in specific circumstances. Furthermore, tuning a specific algorithm to provide accurate forecasting results requires a deep understanding of that algorithm. These challenges can be off-putting and can prevent forecasting analysis from being added to applications. To help developers introduce forecasting into their applications, the Streams’ time series toolkit provides the AutoForecaster operator.

The AutoForecaster operator has been designed to allow developers to easily add forecasting analysis to their applications. The most useful feature of the AutoForecaster operator is that it does not require the user to select or even understand the forecasting algorithms being used. Instead, the AutoForecaster will automatically select the best algorithm (from a pre-defined set of algorithms) based on the incoming data. In addition to selecting the best algorithm, the operator also has the ability to switch forecasting algorithms during run-time without interrupting the analysis. This is useful if the incoming data changes and the previously selected algorithm is no longer suited for this new data pattern. This feature will be discussed in more detail later on.

The following graph demonstrates the forecasting capabilities of the AutoForecaster. The blue line represents the actual network load of a system. The orange line represents the forecasted values produced by the AutoForecaster. Note that the initial network load was relatively flat; however, after some time it spiked and began to oscillate. This demonstrates the AutoForecaster's ability to adapt to changing data patterns.

## Analyzing Time Series

### Ingesting Data

The AutoForecaster operator is capable of analyzing either a single time series or multiple, independent time series. Both of these cases will be examined below.

To analyze a single time series, each tuple consumed by the operator must contain an attribute of type *float64*, which represents a single point in the time series. For example, the following diagram shows a single sensor that periodically records a value and sends it to the AutoForecaster. Each time the sensor records a value, a single data point is sent to the AutoForecaster.

Here is an SPL snippet that demonstrates the operator analyzing a single time series:

```spl
// Assumes each line of netload.out holds a uint64 timestamp followed by a float64 value.
(stream<uint64 timestampData, float64 inputData> NetloadData) as NetloadSource = FileSource() {
    param
        file : "netload.out" ;
}

(stream<float64 inputData, uint64 forecastedTimestamp, float64 forecastedResult>
    ForecastedResults) as ForecastingOperator = AutoForecaster2(NetloadData) {
    param
        inputTimeSeries : inputData ;
        inputTimestamp : timestampData ;  // required by the timestamp output functions
        initSamples : 100u ;
        stepAhead : 20u ;
        algorithm : Dynamic ;
    output
        ForecastedResults : forecastedResult = forecastedTimeSeriesStep(),
            forecastedTimestamp = forecastedTimestampStep() ;
}
```

To analyze multiple, independent time series, each tuple consumed by the operator must contain an attribute of type *list<float64>*. Each element in the list holds the data point recorded by one of the sensors at a single point in time. For example, the following diagram shows 3 independent sensors that periodically record values. At a given point in time, each sensor records a value. Each of those values, recorded at that point in time, is added to a list. The list is then sent to the AutoForecaster operator, which forecasts a future value for each of the 3 sensors.

Here is an SPL snippet showing the operator analyzing multiple time series:

```spl
// Assumes each line of multi_netload.out holds a uint64 timestamp followed by a list of values.
(stream<uint64 timestampData, list<float64> listInputData> NetloadData) as NetloadSource = FileSource() {
    param
        file : "multi_netload.out" ;
}

(stream<list<float64> listInputData, uint64 forecastedTimestamp, list<float64> forecastedResult>
    ForecastedResults) as ForecastingOperator = AutoForecaster2(NetloadData) {
    param
        inputTimeSeries : listInputData ;
        inputTimestamp : timestampData ;  // required by the timestamp output functions
        initSamples : 100u ;
        stepAhead : 20u ;
        algorithm : Dynamic ;
    output
        ForecastedResults : forecastedResult = forecastedTimeSeriesStep(),
            forecastedTimestamp = forecastedTimestampStep() ;
}
```
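Conceptually, the list input behaves like a set of independent single time series, one per list position. The following short Python sketch illustrates that idea. The drift forecaster is a hypothetical placeholder, not one of the operator's algorithms, and the function names are invented for this example.

```python
from typing import List


def split_series(tuples: List[List[float]]) -> List[List[float]]:
    """Turn a stream of list<float64>-style tuples (one value per sensor at
    each point in time) into one independent time series per sensor."""
    return [list(column) for column in zip(*tuples)]


def drift_forecast(series: List[float]) -> float:
    """Hypothetical placeholder forecaster: extend the average change per step."""
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return series[-1] + slope


def forecast_each(tuples: List[List[float]]) -> List[float]:
    """Forecast every sensor on its own. The result has one entry per sensor,
    in the same order as the input list."""
    return [drift_forecast(s) for s in split_series(tuples)]


# Three sensors, four points in time (one inner list per point in time).
readings = [
    [1.0, 10.0, 100.0],
    [2.0, 11.0, 90.0],
    [3.0, 12.0, 80.0],
    [4.0, 13.0, 70.0],
]
print(split_series(readings)[1])  # [10.0, 11.0, 12.0, 13.0]
print(forecast_each(readings))    # [5.0, 14.0, 60.0]
```

The third sensor is trending down while the others trend up. Its forecast is unaffected by them, which is what "independent" means here.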

### Results

The AutoForecaster operator is not only capable of forecasting the next value in the time series, but can actually forecast several steps into the future. The number of future values (steps) that the AutoForecaster forecasts is controlled by the **stepAhead** parameter. When submitting results, the operator can output all of the values up to the specified number of steps.

There are two output functions used to return forecasted values:

**forecastedTimeSeriesStep()** – returns the forecasted time series value at step n, where n is the same as the **stepAhead** parameter value.

**forecastedAllTimeSeriesSteps()** – returns a list of forecasted time series values from step 1 to step n, where n is the same as the **stepAhead** parameter value.
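The relationship between these two functions can be shown with a small Python sketch. The drift model is a hypothetical stand-in for whichever algorithm the operator has selected. The point is only that the step-1..n list corresponds to **forecastedAllTimeSeriesSteps()**, and its last element is the single value that **forecastedTimeSeriesStep()** would return.

```python
from typing import List


def drift_forecast_all(series: List[float], step_ahead: int) -> List[float]:
    """Forecast steps 1..step_ahead with a simple drift model
    (a hypothetical stand-in, not one of the operator's algorithms)."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * h for h in range(1, step_ahead + 1)]


def forecast_step(series: List[float], step_ahead: int) -> float:
    """Only the value at step n, i.e. the last element of the full list."""
    return drift_forecast_all(series, step_ahead)[-1]


history = [10.0, 12.0, 14.0, 16.0]
print(drift_forecast_all(history, 3))  # [18.0, 20.0, 22.0]
print(forecast_step(history, 3))       # 22.0
```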

**Note:** the return type of these output functions depends on whether the operator is analyzing a single time series or multiple time series. The following table shows the return type of these output functions based on the type of input data.

### Timestamps

The operator is also capable of calculating future timestamp values along with the forecasted values. In order to calculate the future timestamp values, the incoming data must contain an attribute of type either *timestamp* or *uint64*. The **inputTimestamp** parameter value must refer to this attribute.

The operator contains two output functions used to return the calculated future timestamp values:

**forecastedTimestampStep()** – returns the calculated future timestamp value at step n, where n is the same as the **stepAhead** parameter value.

**forecastedAllTimestampSteps()** – returns a list of calculated future timestamp values from step 1 to step n, where n is the same as the **stepAhead** parameter value.

**Note:** the return type of these output functions depends on the type of the incoming timestamp attribute. The following table shows the return type of these output functions based on the type of the incoming attribute:
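The article does not say how the operator derives future timestamps. The following Python sketch is a hypothetical illustration only. It assumes roughly evenly spaced *uint64* timestamps and extrapolates from their average spacing. The returned list plays the role of **forecastedAllTimestampSteps()**, and its last element the role of **forecastedTimestampStep()**. The function name and the averaging rule are assumptions.

```python
from typing import List


def future_timestamps(timestamps: List[int], step_ahead: int) -> List[int]:
    """Hypothetical: extrapolate timestamps for steps 1..step_ahead using the
    average (integer) spacing between the observed timestamps."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    interval = (timestamps[-1] - timestamps[0]) // (len(timestamps) - 1)
    return [timestamps[-1] + interval * h for h in range(1, step_ahead + 1)]


observed = [1000, 1010, 1020, 1030]  # e.g. one sample every 10 seconds
all_steps = future_timestamps(observed, 3)
print(all_steps)      # [1040, 1050, 1060]
print(all_steps[-1])  # 1060, the step-n timestamp
```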

### Parameters

The AutoForecaster operator comes with a number of parameters. Information regarding all of the available parameters can be found on the AutoForecaster Knowledge Center page. However, here are some of the important parameters:

*inputTimeSeries* – This parameter specifies the attribute on the input port that contains the time series data. The specified attribute must have a type of either *float64* or *list<float64>*. This is a **required** parameter.

*inputTimestamp* – This parameter specifies the attribute on the input port that contains the timestamp information. While this parameter is optional, it must be specified in order to use the **forecastedTimestampStep()** and **forecastedAllTimestampSteps()** output functions.

*initSamples* – This parameter specifies the number of initial tuples used to train the underlying algorithms. The number of samples to select depends on the type of data. For example, if the goal is to forecast results 1 day into the future, then this parameter should be set to a value that represents 1 day's worth of data. It is important to note that some of the underlying algorithms may use additional input data before initializing.

*stepAhead* – This parameter specifies how far into the future the operator should forecast. By default, this parameter is set to a value of 1.

*algorithm* – This parameter specifies whether the operator should continuously attempt to find a better algorithm (*dynamic* mode), or whether it should pick the best algorithm based on the initial set of training data and use that algorithm indefinitely (*static* mode). Since this choice is very important to the output of the operator, an entire section is dedicated to explaining the differences between *dynamic* and *static* mode.
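To make the **initSamples** guidance concrete, assume data arrives at a fixed sampling interval. One day's worth of samples is then the number of seconds in a day divided by that interval. The intervals below are illustrative assumptions. For a 5-minute interval the result, 288, would correspond to `initSamples : 288u ;` in SPL.

```python
def samples_for_window(window_seconds: int, sample_interval_seconds: int) -> int:
    """Number of tuples needed to cover a time window at a fixed sampling rate."""
    if sample_interval_seconds <= 0:
        raise ValueError("sample interval must be positive")
    # Ceiling division, so the training window is never shorter than requested.
    return -(-window_seconds // sample_interval_seconds)


ONE_DAY = 24 * 60 * 60
print(samples_for_window(ONE_DAY, 300))  # 288  (one sample every 5 minutes)
print(samples_for_window(ONE_DAY, 60))   # 1440 (one sample per minute)
```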

## Static vs Dynamic

In the introduction I mentioned that the AutoForecaster operator is capable of dynamically switching algorithms in order to produce the best possible forecasts. To be more specific, the operator comes with two modes: static mode and dynamic mode. Both of these modes will be discussed in detail below.

### Static

When the operator is set to *static*, it will use the initial set of input data to determine which algorithm provides the best forecasted results. Once the forecasting algorithm has been determined, that algorithm will be used for the remainder of the operator’s life. The advantage to running the operator in static mode is that it will be able to process tuples faster. This is due to the fact that the operator will only be forecasting future values and will not be continuously analyzing the incoming time series to find a better performing algorithm.

However, the disadvantage to using static over dynamic is that if the time series pattern changes dramatically over time, then the selected algorithm may no longer provide acceptable forecasted values. For example, assume the initial training data was mostly linear. In this case, it is likely that the AutoForecaster will select a linear regression algorithm to do the forecasting. Later on, if the time series data becomes non-linear, the forecasted values returned by the operator may not be accurate since a linear regression algorithm is unsuitable for this type of time series.
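The static strategy can be illustrated with a small Python sketch. The three candidate models, the mean-absolute-error backtest and all names are hypothetical choices made for this illustration. The AutoForecaster's actual algorithm set and selection criterion are not described in this article. The sketch selects a model once from the initial samples and keeps that choice even after the data starts to oscillate, mirroring the linear regression example above.

```python
from typing import Callable, Dict, List

Forecaster = Callable[[List[float]], float]

# Hypothetical stand-ins for the operator's pre-defined algorithm set.
CANDIDATES: Dict[str, Forecaster] = {
    "last_value": lambda h: h[-1],
    "linear_trend": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),
    "seasonal_2": lambda h: h[-2],  # repeats the value from two steps back
}


def backtest_error(model: Forecaster, series: List[float]) -> float:
    """Mean absolute one-step-ahead error, starting once two points exist."""
    errors = [abs(model(series[:i]) - series[i]) for i in range(2, len(series))]
    return sum(errors) / len(errors)


class StaticForecaster:
    def __init__(self, init_samples: List[float]) -> None:
        # Static mode: choose the best candidate once, using training data only.
        self.name = min(
            CANDIDATES, key=lambda n: backtest_error(CANDIDATES[n], init_samples)
        )
        self.history = list(init_samples)

    def process(self, value: float) -> float:
        """Forecast the next value with the fixed algorithm; no re-evaluation."""
        self.history.append(value)
        return CANDIDATES[self.name](self.history)


f = StaticForecaster([float(x) for x in range(10)])  # linear training data
print(f.name)             # linear_trend
print(f.process(10.0))    # 11.0
for x in [20.0, 40.0] * 5:  # the pattern changes and starts to oscillate
    f.process(x)
print(f.name)             # still linear_trend
```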

The following graph demonstrates this behavior. The blue line represents the actual network load that was captured and the orange line represents the forecasted values produced by the AutoForecaster operator. For the first 800 seconds, the network load is mostly flat (linear). However, after 800 seconds, the load begins to fluctuate dramatically. The operator was able to accurately forecast the load for the first 800 seconds; however, once the time series pattern changed, the algorithm selected by the operator was no longer able to accurately forecast the values.

### Dynamic

When the **algorithm** parameter value is set to *dynamic*, the AutoForecaster operator will continuously analyze the incoming time series values to determine the best forecasting algorithm. Each time a new tuple arrives, the operator will do the following:

- Forecast the future time series values and submit the results to the operator’s output port (similar to *static* mode).
- Analyze the current time series to determine whether the current forecasting algorithm is still the best algorithm or whether another algorithm should be used (only done in *dynamic* mode).

These steps are repeated for every tuple that is sent to the AutoForecaster. This allows the operator to adapt to changing time series patterns, thus enabling it to provide more accurate forecasted results. On the flip side, the consequence of setting the parameter value to *dynamic* is that the operator will spend more time processing a tuple (as compared to setting the value to *static*), ultimately decreasing the overall flow rate.
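The two per-tuple steps can be sketched in Python. As before, the candidate models, the fixed-size evaluation window and the mean-absolute-error criterion are assumptions for illustration. The article does not specify how the operator scores algorithms or how much recent data it considers. Fed a linear series followed by an oscillating one, the sketch switches from `linear_trend` to `seasonal_2`.

```python
from typing import Callable, Dict, List, Optional

Forecaster = Callable[[List[float]], float]

# Hypothetical stand-ins for the operator's pre-defined algorithm set.
CANDIDATES: Dict[str, Forecaster] = {
    "last_value": lambda h: h[-1],
    "linear_trend": lambda h: h[-1] + (h[-1] - h[0]) / (len(h) - 1),
    "seasonal_2": lambda h: h[-2],  # repeats the value from two steps back
}


def backtest_error(model: Forecaster, series: List[float]) -> float:
    """Mean absolute one-step-ahead error, starting once two points exist."""
    errors = [abs(model(series[:i]) - series[i]) for i in range(2, len(series))]
    return sum(errors) / len(errors)


def best_model(series: List[float]) -> str:
    return min(CANDIDATES, key=lambda n: backtest_error(CANDIDATES[n], series))


class DynamicForecaster:
    def __init__(self, init_samples: int, window: int) -> None:
        self.init_samples = init_samples  # assumed >= 3 so backtests have data
        self.window = window              # hypothetical re-evaluation window
        self.history: List[float] = []
        self.current: Optional[str] = None

    def process(self, value: float) -> Optional[float]:
        """Handle one tuple: forecast with the current model, then re-evaluate."""
        self.history.append(value)
        if len(self.history) < self.init_samples:
            return None  # still collecting the initial training samples
        if self.current is None:
            self.current = best_model(self.history)
        # Step 1: forecast the next value with the algorithm chosen so far.
        forecast = CANDIDATES[self.current](self.history)
        # Step 2 (dynamic mode only): re-check which algorithm fits recent data.
        self.current = best_model(self.history[-self.window:])
        return forecast


f = DynamicForecaster(init_samples=5, window=6)
for x in range(10):           # linear phase: 0, 1, ..., 9
    f.process(float(x))
print(f.current)              # linear_trend
for x in [20.0, 40.0] * 5:    # the pattern changes and starts to oscillate
    f.process(x)
print(f.current)              # seasonal_2
```

Step 2 runs a backtest for every candidate on every tuple. This extra work is the per-tuple processing cost described above.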

Revisiting the network load example above, we can clearly see the results of changing the algorithm parameter value to *dynamic*. Once again, the first 800 seconds are accurately predicted as being mostly linear in nature. However, when the input network load begins to fluctuate, the AutoForecaster is capable of automatically detecting this change and switching to a more accurate forecasting algorithm.

## Conclusion

The AutoForecaster operator can be a powerful tool when building a Streaming application. It abstracts many of the complicated details of forecasting algorithms while still providing accurate and relevant results. The ability to dynamically switch algorithms at run-time is a key component of this operator and can be invaluable when analyzing unpredictable real-time data.

## Samples

The network load example discussed earlier can be downloaded from GitHub here: https://github.com/IBMStreams/samples/tree/master/timeseries/AutoForecasterSamples.

## Additional Links

AutoForecaster2 Operator – Knowledge Center