InfoSphere Streams V3.2 includes a number of new features that make writing streaming applications easier and provide more integration and analytic capabilities out of the box.  Streams V3.2 was released on October 25th.  This post gives an overview of some of the highlights of the release and will be followed by more detailed articles and posts in the future.

Streams Studio on Windows

InfoSphere Streams is a highly scalable distributed system that runs on Linux servers, but most developers have Windows machines for their day-to-day work.  Starting in V3.2, Streams Studio supports remote development, so you can have the full functionality of Streams Studio on Windows.

Streams Studio allows you to configure a remote workspace on the Streams Linux servers which shadows the local projects on your Windows machine.  You can develop applications and toolkits on your Windows machine, then build and run them on the remote Streams Linux servers.  You can also manage and monitor the Streams instances and applications running on the Streams Linux servers from Streams Studio on Windows.  You can find out more about remote development with Streams Studio in the Streams V3.2 Info Center.

Business Rules Integration

The new Rules toolkit works with WebSphere Operational Decision Manager (ODM) to execute business rules as part of a Streams application flow.  ODM provides a rules language for easily describing business rules and a rich set of tooling to create and manage the rules.  Streams provides a scalable distributed platform for rich analytics on streams of big data, which can have massive volumes, lots of variety and high velocity.  The new Rules toolkit allows business rules to be applied to big data, combining the rich language and tooling of ODM with the scaling, the full context of many data sources, and the rich analytics of Streams.  The diagram below illustrates the rules being authored, managed and then deployed with ODM and executed in a Streams operator.

The toolkit provides a rules execution operator which applies rules to tuples on the input stream and submits tuples to output streams based on the result of the rules.  It supports file and database deployment of rules, and the rules can be refreshed dynamically without having to restart the Streams application.  You can find out more about the new Rules Toolkit in the V3.2 Info Center.
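
To make the idea concrete, here is a minimal sketch in plain Java of routing tuples to output ports based on rule results, with a rule set that can be swapped while tuples keep flowing. It does not use the Rules toolkit or ODM; the class RuleRouter, its methods, and the port names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

// Hypothetical sketch: none of these names come from the Rules toolkit or ODM.
final class RuleRouter {
    // Current rule set: output port name -> condition a tuple must satisfy.
    private final AtomicReference<Map<String, Predicate<Map<String, Object>>>> rules;

    RuleRouter(Map<String, Predicate<Map<String, Object>>> initial) {
        rules = new AtomicReference<>(new LinkedHashMap<>(initial));
    }

    // Swap in a new rule set atomically; tuples routed afterwards see the new rules.
    void refresh(Map<String, Predicate<Map<String, Object>>> updated) {
        rules.set(new LinkedHashMap<>(updated));
    }

    // Return the names of the output ports whose rule matches the tuple.
    List<String> route(Map<String, Object> tuple) {
        List<String> ports = new ArrayList<>();
        for (Map.Entry<String, Predicate<Map<String, Object>>> e : rules.get().entrySet()) {
            if (e.getValue().test(tuple)) {
                ports.add(e.getKey());
            }
        }
        return ports;
    }
}
```

A caller would build a map such as "alerts" to a speed check, call route for each incoming tuple, and call refresh whenever a new rule set is deployed. Holding the rule set behind a single atomic reference is one simple way to get a refresh without a restart; the toolkit itself may work quite differently.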

User Defined Parallelism

The Streams programming model allows highly scalable parallel processing.  A common application pattern has been to replicate a region of an application graph, distribute a stream across those replicated regions to process it in parallel, and then rejoin the stream.  This could be done either manually with a Split operator or by using mixed-mode Perl to generate the replicated regions.  Unfortunately, with these techniques the number of parallel channels was fixed at compile time, and the Streams Studio graphical editor could not be used with mixed-mode Perl.

In Streams V3.2 you can specify parallel regions in an application using a simple annotation and choose the number of parallel channels at submission time or compile time.

This “@parallel” annotation allows you to specify the number of parallel channels (width) at compile time or submission time, how input streams should be partitioned across the parallel channels, and the placement constraints for each parallel channel.

A Streams application developer can focus first on defining the logical application flow.  Then parallel regions can be added using the simple annotation.  The actual number of channels to run in parallel can be specified when the application is launched, based on the available resources.  At job submission time the logical topology is expanded to a physical topology, with input and output streams automatically split and merged as needed.
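
The split, process and merge pattern described above can be sketched in plain Java. This is only an illustration of the idea, not the Streams runtime: the class ParallelRegionSketch and its methods are hypothetical, partitioning by hash code is an assumption, and the channels are processed one after another here rather than truly in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of split -> N channels -> merge; not the Streams runtime.
final class ParallelRegionSketch {
    // Pick a channel for a partition key; equal keys always map to the same channel.
    // Assumes the key is not null.
    static int channelFor(Object key, int width) {
        return Math.floorMod(key.hashCode(), width);
    }

    // Split the input across `width` channels using the key extracted from each tuple.
    static <T> List<List<T>> split(List<T> input, Function<T, ?> keyOf, int width) {
        List<List<T>> channels = new ArrayList<>();
        for (int i = 0; i < width; i++) {
            channels.add(new ArrayList<>());
        }
        for (T tuple : input) {
            channels.get(channelFor(keyOf.apply(tuple), width)).add(tuple);
        }
        return channels;
    }

    // Apply the replicated logic to every channel and merge the results.
    // Channels are visited sequentially here to keep the sketch simple.
    static <T, R> List<R> run(List<T> input, Function<T, ?> keyOf, int width, Function<T, R> op) {
        List<R> merged = new ArrayList<>();
        for (List<T> channel : split(input, keyOf, width)) {
            for (T tuple : channel) {
                merged.add(op.apply(tuple));
            }
        }
        return merged;
    }
}
```

Because the width is just an argument, the same logical flow can run with 2 channels or 20, which mirrors the point of choosing the number of channels at submission time.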

An application can contain multiple parallel regions, and a parallel region can include sources and sinks, imports and exports, even a whole application.  Operators can determine which parallel channel they are in at runtime.  Metrics, views and charts can all be viewed for the aggregated logical flow or for each separate channel, and the Streams Studio instance graphs can show the logical single flow or the physical parallel channels.  You can find out more about developing applications using User Defined Parallelism in the V3.2 Info Center.

Enhanced Java Support

You have been able to write primitive operators for Streams in Java for some time.  This allows you to expose specialized functionality written in Java as an operator that can be used by SPL developers in their Streams applications.  Until now this involved writing the Java code for the operator and creating the operator model in a separate XML file.  In InfoSphere Streams V3.2 we have added the ability to create the operator model using an annotation in your Java code.

@PrimitiveOperator
public class TestOperator extends AbstractOperator {}

There are annotations for input ports, output ports, parameters and metrics.  The operator model is generated from the annotation and allows developers to create an SPL primitive operator completely in Java. You can find out more about annotations to define an SPL Java primitive operator in the V3.2 Info Center.
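
To show roughly how a model can be derived from annotations, here is a small self-contained sketch that reads annotations through reflection and builds a text description of an operator. The annotations SketchOperator and SketchParameter, and the classes that use them, are hypothetical stand-ins defined only so the example compiles on its own; they are not the Streams annotations, and the real operator model is an XML document, not this text format.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical annotations, defined here only for this sketch (not the Streams API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SketchOperator {
    String name();
    int inputPorts() default 1;
    int outputPorts() default 1;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SketchParameter {
    boolean optional() default false;
}

// An example operator class described entirely by its annotations.
@SketchOperator(name = "Threshold", inputPorts = 1, outputPorts = 2)
final class ThresholdOperator {
    @SketchParameter
    double limit;

    @SketchParameter(optional = true)
    String label;
}

final class OperatorModelSketch {
    // Build a one-line description of the operator from its annotations.
    // Optional parameters are marked with a trailing '?'.
    static String describe(Class<?> type) {
        SketchOperator op = type.getAnnotation(SketchOperator.class);
        if (op == null) {
            throw new IllegalArgumentException(type.getName() + " is not annotated");
        }
        StringBuilder model = new StringBuilder();
        model.append("operator ").append(op.name())
             .append(" in=").append(op.inputPorts())
             .append(" out=").append(op.outputPorts());
        // Field order from reflection is not guaranteed, so parameter order may vary.
        for (Field field : type.getDeclaredFields()) {
            SketchParameter param = field.getAnnotation(SketchParameter.class);
            if (param != null) {
                model.append(" param ").append(field.getName())
                     .append(param.optional() ? "?" : "");
            }
        }
        return model.toString();
    }
}
```

Calling OperatorModelSketch.describe(ThresholdOperator.class) yields a description that starts with the operator name and port counts, followed by its parameters. The benefit is the same one the annotations in V3.2 aim for: the description lives next to the code it describes, so the two cannot drift apart.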

We have also added the ability for SPL native functions to be implemented in Java.  Again these can be created using a simple annotation in the Java code.

@Function
public static int add(int a, int b) {return a+b;}

This allows you to quickly build up a library of useful Java functions that can be called directly from SPL. You can find out more about creating Java native functions in the V3.2 Info Center.

Support for MQTT Messaging

MQTT is a machine-to-machine (M2M) and Internet of Things (IoT) connectivity protocol designed as an extremely lightweight publish and subscribe messaging transport. It is useful for connections with remote locations where a small code footprint is required or network bandwidth is at a premium, such as mobile devices and sensors in automotive telematics applications.

In V3.2 we have added MQTTSource and MQTTSink operators to the Messaging Toolkit to allow Streams applications to receive or send messages with MQTT.   You can find out more about the Messaging Toolkit in the V3.2 Info Center.

Pre-product versions of these operators posted on Streams Exchange were used to create the IBM MobileFirst Connected Car demo, which can be seen on YouTube.  This demonstrates how Streams and MQTT can be used together to allow huge numbers of cars to give constant updates on their status and local conditions, analyze that information, and send real-time notifications of changing conditions or alerts.  While this demo is based around research work with auto manufacturers, it can apply to a variety of scenarios where sensors and devices are producing a huge amount of information that Streams can analyze and take action on in real time.

Visualisation Improvements

Streams V3.0 added support for visualisation with Views and Charts.  These are useful, but it could be hard to create charts in the Streams Console or to access views from other systems.  In V3.2 we have made creating charts easier in the Streams Console and added REST APIs to list available views and to get stream data from views.  You can now create a chart on an operator's output port by right-clicking on it in the Streams Console application graph and configuring the chart.

The chart configuration now includes a preview of the chart based on the current settings, so it's easier to see how the chart will look as it is configured. You can find out more about viewing streaming data in charts and tables by using the Streams Console in the V3.2 Info Center. You can find out more about the Streams REST API in the V3.2 Info Center.

Support for InfoSphere BigInsights Big SQL

The Streams Database toolkit now supports Big SQL, IBM’s SQL interface to its Hadoop-based platform, InfoSphere BigInsights. Big SQL is designed to provide SQL developers with an easy on-ramp for querying data managed by Hadoop. It enables data administrators to create new tables for data stored in Hive, HBase, or their BigInsights distributed file system. The Streams Database toolkit ODBCSource, ODBCAppend, ODBCRun and ODBCEnrich operators support Big SQL to read and write this distributed data. You can find out more about the Database Toolkit in the V3.2 Info Center.

Enhanced TimeSeries Toolkit

The TimeSeries Toolkit has been enhanced in V3.2 with new operators (AutoForecaster, PSAX and Generate) and the addition of a control port to the GAMLearner and HoltWinters operators.  The AutoForecaster operator automatically detects the best forecasting algorithm (ARIMA or Holt-Winters) for the current input data and applies it in real time. The Generate operator can generate data of various types (sine wave, triangular wave, and more) to use as input for time series operators and applications. The PSAX operator can compress input time series using Piecewise and Symbolic Aggregate Approximation (PSAX) algorithms.  You can find out more about the TimeSeries Toolkit in the V3.2 Info Center.
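
For readers new to these ideas, the following plain Java sketch generates sine and triangular test signals and compresses a series with piecewise aggregate approximation, which replaces each equal-length segment with its mean. It is not the toolkit's code: the class TimeSeriesSketch and its methods are hypothetical, the symbolic step of PSAX is left out, and the series length is assumed to be a multiple of the segment count.

```java
// Hypothetical sketch; not the Generate or PSAX operators from the TimeSeries Toolkit.
final class TimeSeriesSketch {
    // Sine wave samples: one full cycle every `period` samples.
    static double[] sine(int count, int period) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            out[i] = Math.sin(2 * Math.PI * i / period);
        }
        return out;
    }

    // Triangular wave in [-1, 1]: starts at -1 and peaks at +1 halfway through each cycle.
    static double[] triangle(int count, int period) {
        double[] out = new double[count];
        for (int i = 0; i < count; i++) {
            double phase = (double) (i % period) / period; // position in cycle, 0 to 1
            out[i] = 1.0 - 4.0 * Math.abs(phase - 0.5);
        }
        return out;
    }

    // Piecewise aggregate approximation: replace each equal-length segment by its mean.
    // Assumes series.length is a positive multiple of `segments`.
    static double[] paa(double[] series, int segments) {
        if (segments <= 0 || series.length == 0 || series.length % segments != 0) {
            throw new IllegalArgumentException("series length must be a positive multiple of segments");
        }
        int size = series.length / segments;
        double[] out = new double[segments];
        for (int s = 0; s < segments; s++) {
            double sum = 0;
            for (int i = 0; i < size; i++) {
                sum += series[s * size + i];
            }
            out[s] = sum / size;
        }
        return out;
    }
}
```

For example, paa applied to the six values 1 through 6 with three segments returns the three means 1.5, 3.5 and 5.5, so the series is represented with half as many values.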

