This document summarizes the toolkits that ship with the Streams product. Streams v4.1 includes the following toolkits:

HBase Toolkit (

The HBase toolkit provides support for interacting with Apache HBase systems from InfoSphere Streams. It supports both reading from and writing to HBase.

This toolkit is developed as part of the GitHub open-source project:

HDFS Toolkit (

The HDFS Toolkit provides operators that read data from and write data to the Hadoop Distributed File System (HDFS).

This toolkit is developed as part of the GitHub open-source project:

Inet Toolkit (

The Inet toolkit provides support for common internet protocols. This toolkit contains a subset of the functions from the Inet Toolkit project on GitHub.

See this link:

Messaging Toolkit (

The Messaging Toolkit provides operators that allow your Streams application to read messages from and send messages to popular messaging systems, such as Kafka, MQTT, WebSphere MQ and Apache ActiveMQ.

This toolkit is developed as part of the GitHub open-source project:

Complex Event Processing Toolkit (

This toolkit provides the ‘MatchRegex’ operator to perform complex event processing in Streams. It allows you to use patterns, expressed as regular expressions, to detect composite events in a stream of tuples.
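To illustrate the idea of matching a regular expression over a stream of tuples, here is a minimal Python sketch. It is not the MatchRegex operator or its syntax. The function names, the one-letter symbols and the sample prices are all hypothetical. Each pair of consecutive tuples is reduced to a symbol, and an ordinary regular expression is then run over the resulting symbol string.

```python
import re


def classify(prev, curr):
    """Map a pair of consecutive prices to a one-letter event symbol (hypothetical encoding)."""
    if curr > prev:
        return "U"  # price went up
    if curr < prev:
        return "D"  # price went down
    return "S"      # price stayed the same


def find_composite_events(prices, pattern):
    """Return inclusive (first, last) price-index ranges where the symbol pattern matches."""
    # Symbol i describes the move from prices[i] to prices[i + 1].
    symbols = "".join(classify(p, c) for p, c in zip(prices, prices[1:]))
    # A match over symbols [start, end) covers prices[start] .. prices[end].
    return [(m.start(), m.end()) for m in re.finditer(pattern, symbols)]


if __name__ == "__main__":
    prices = [10, 11, 12, 13, 9, 9, 10, 11, 8]
    # Composite event: at least two consecutive rises followed by a drop.
    print(find_composite_events(prices, r"U{2,}D"))
```

In this sketch the composite event "two or more rises followed by a drop" is found twice in the sample data, once over the first five prices and once over the last four.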

Data Explorer Toolkit (

This toolkit provides an operator that enables stream processing applications to insert data into IBM InfoSphere Data Explorer.

Database Toolkit (

This toolkit provides a set of operators that enable your Streams application to integrate with a wide range of external data systems. Supported data systems include IBM DB2, IBM InfoSphere BigInsights, IBM Netezza, Oracle Database and Teradata Database, among others. The toolkit provides adapters that read from and write to the supported data systems. You can also run SQL queries against these systems.

Financial Services Toolkit (

The Financial Services Toolkit provides pre-built solution models, finance-specific edge adapters, and operators to reduce the time and effort needed to develop a wide variety of financial applications.

Geospatial Toolkit (

The Geospatial Toolkit includes operators and functions that facilitate efficient processing and indexing of location data. The toolkit provides a set of operators that help you perform tasks such as geofencing and hangout detection. It also provides a rich set of functions for manipulating location data, determining distances, checking whether geometries intersect, and so on.
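As a rough illustration of the kind of computation behind distance functions and geofencing, the following Python sketch computes a great-circle distance with the haversine formula and tests a point against a circular fence. It does not use the Geospatial Toolkit API. The function names, the spherical Earth radius and the circular fence shape are assumptions made for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def inside_fence(lat, lon, fence_lat, fence_lon, radius_m):
    """Return True if the point lies within a circular fence (hypothetical fence model)."""
    return haversine_m(lat, lon, fence_lat, fence_lon) <= radius_m


if __name__ == "__main__":
    # One degree of longitude along the equator, roughly 111 km.
    print(round(haversine_m(0.0, 0.0, 0.0, 1.0)))
    # A point about 111 m east of the fence centre, tested against a 200 m radius.
    print(inside_fence(0.0, 0.001, 0.0, 0.0, 200.0))
```

Real geofences are often arbitrary polygons rather than circles. The circular case is used here only because it keeps the sketch short.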

Mining Toolkit (

The Mining Toolkit includes operators that you can use to mine data streams by applying models. Stream mining applies models that were learned from historical data to streaming data in order to detect patterns of interest in real time. The Mining Toolkit supports scoring complex models against real-time data using the Predictive Model Markup Language (PMML) standard.

R-Project Toolkit (

The R-project Toolkit includes an RScript operator that you can use to run R commands and apply complex data mining algorithms to detect patterns of interest in data streams. The R scripts can also be updated dynamically on demand, without restarting your Streams application.

Rules Toolkit (

The Rules toolkit integrates with IBM Operational Decision Manager (ODM). It allows you to execute business rules against streaming data, so that you can make critical business decisions in real time.

For more information about use cases and how Streams and ODM can work together, refer to this technote:  Patterns for IBM Operational Decision Manager in Big Data Streams

Telecommunications Event Data Analytics Toolkit (

The Telecommunications Event Data Analytics toolkit provides a set of generic operators that are used in telecommunications applications, and it also provides an application framework that enables you to set up new file-to-file applications. These applications are based on code templates and support customization, configurable parallel processing, graceful application shutdown, and reliable file processing.

Text Toolkit (

The Text Toolkit provides an operator that helps you extract and analyse information from text data. The Text Toolkit integrates the Text Analytics component from IBM InfoSphere BigInsights. By using the Text Toolkit, a stream processing application can read text data and derive structured information that is based on various rules. These rules are defined in extractors, which are programs that extract information from within a text field. Extractors are written in Annotation Query Language (AQL).

Time Series Toolkit (

This toolkit provides a rich set of operators and functions for processing and analysing time series data. A time series is a sequence of numerical data that represents the value of one or more objects over time. With the Time Series Toolkit, you can read, repair or condition a time series in real time. You can also perform statistical analysis, correlation, decomposition and transformation of your time series data. Finally, the toolkit provides a set of data modeling operators that allow you to create statistical models for prediction and regression analysis.
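To make "repair or condition a time series" concrete, here is a small Python sketch that fills missing samples by linear interpolation and then smooths the result with a simple moving average. It is an illustration only and does not reflect the Time Series Toolkit operators. The function names, the use of None for missing samples and the choice of interpolation are assumptions.

```python
def fill_gaps(values):
    """Replace None entries using linear interpolation between known neighbours.

    Leading and trailing gaps are filled with the nearest known value.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


def moving_average(values, window):
    """Return the mean of each full window of consecutive samples."""
    averages = []
    total = 0.0
    for i, v in enumerate(values):
        total += v
        if i >= window:
            total -= values[i - window]  # drop the sample that left the window
        if i >= window - 1:
            averages.append(total / window)
    return averages


if __name__ == "__main__":
    raw = [1.0, None, 3.0, None, None, 6.0]
    repaired = fill_gaps(raw)
    print(repaired)
    print(moving_average(repaired, 3))
```

In a streaming setting the repair step would have to work incrementally, buffering samples until the next known value arrives. The batch version above is kept deliberately simple.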

Cybersecurity Toolkit (

The Cybersecurity toolkit can detect active threats occurring within a network in real time. The toolkit uses machine learning models to analyze and score network traffic. The toolkit contains analytics that can build profiles of either domains or hosts within a network and report on suspicious behaviour in real time. Furthermore, the toolkit can predict whether a domain should be added to a blacklist. The analytics provided as part of the toolkit can be used in tandem with existing forensic security software to enable both online and offline cybersecurity analysis.

Spark MLLib Toolkit (

The Spark MLLib toolkit can load machine learning models saved using Apache Spark’s MLlib library and use them for scoring real-time data in IBM Streams. The toolkit supports a number of machine learning algorithms, including collaborative filtering, K-means clustering, decision trees, and regression models such as isotonic, logistic and linear regression.

This toolkit is developed as part of the GitHub open-source project:

DPS Toolkit (

The Distributed Process Store (DPS) toolkit provides a collection of APIs that allow non-fused SPL, Java and C++ operators to share data by using an external NoSQL key-value store such as Redis.

This toolkit is developed as part of the GitHub open-source project:
