The Streams Quick Start Guide is intended to help you get up and running with InfoSphere Streams quickly.  We will first introduce the basic concepts and building blocks, then write a very simple Streams application and demonstrate how you can run and monitor it in the Streams distributed runtime environment.

Overview

IBM® InfoSphere® Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. Streams can handle very high data throughput rates, up to millions of events or messages per second.

Here are some of the key highlights of Streams:

  • SPEED – Streams can process data and provide actionable insight with sub-second latency.  Data is processed in memory as it arrives, without having to be stored first.
  • EFFICIENT – Not only is Streams fast, it is also efficient.  It can handle extremely high volumes of data with much lower hardware requirements.
  • INTEGRATE – Streams can easily integrate with many external systems.  It supports common storage systems like HDFS, HBase, databases, and files.  It also supports some of the most popular messaging systems, like Kafka, MQTT, and ActiveMQ.
  • FLEXIBLE – Streams can analyze any kind of data.  It can analyze structured data like CSV, CDRs, XML, and JSON, and it can also process unstructured data like audio, video, and text from social media sites.
  • ADVANCED ANALYTICS – Streams provides a rich set of advanced analytics like geospatial and time series analysis.  It can also be integrated with popular data analytic tools like R and SPSS.
  • EXTENSIBLE – Streams comes with a programming framework that allows you to implement your own adapters to external systems or analytic functions in Java or C++.
  • RESILIENT – Streams distributed runtime is designed to be extremely resilient.  In the case of any system failure, the runtime can recover automatically without any human intervention.  Streams also has an application framework that allows Streams applications to be resilient, ensuring that no data is lost in the case of application or system failures. This is all done with minimal impact to application performance.
  • TOOLING – Streams is shipped with an integrated development environment – Streams Studio.  Streams Studio allows you to write, run, and debug your Streams application easily, all with a graphical interface.  The Streams Console is a web-based dashboard that allows Streams administrators to easily manage a Streams cluster.

Basic Building Blocks

To write a Streams application, you need to first understand the basic building blocks.

[Figure: basic building blocks of a Streams application]

The fundamental building block of a Streams application is an operator.  A stream of continuous records (messages, events, tuples) flows into an input port of an operator.  An input port is a port on which an operator consumes data; an operator can have one or more input ports.  The operator processes the records in memory and produces a new stream of records as output.  The new stream of data is emitted from an output port of the operator.  An output port is a port on which the operator produces a stream of data; an operator can have one or more output ports.

A Streams application is a directed flow graph of operators.  The diagram below is an example of a Streams application.

[Figure: example Streams application graph]

  • Source Adapter – At the beginning of the application is a source adapter.   A source adapter is an operator that reads data from external systems and produces a stream as its output.
  • Stream  – A stream is an infinite sequence of records to be processed.
  • Tuple – A tuple represents individual records of the streaming data.  It is a structured list of attributes and their data types.
  • Schema – A schema is a specification of the attributes in a tuple and their data types (see the sketch after this list).
  • Operator – An operator processes the stream from its upstream operator and produces a new output stream. The operator can process data on a tuple-by-tuple basis.  It can also operate on a window of data.  When processing is complete, the operator submits the result as a new tuple to its downstream operator. This process continues until it reaches the sink adapter.  An operator can perform any kind of processing on the data.  For example, it can perform simple processing like filtering unwanted data or transforming data from one format to another.  It can also perform advanced analytics like predictive forecasting, data mining, text analytics, and geofencing.
  • Window – A finite sequence of tuples that the operator keeps in memory. A window is useful for processing real-time data in batches.  For example, a telecommunications company may be interested in monitoring the number of dropped calls every 10 minutes for early detection of severe network failures.  A single dropped call may not indicate a problem with the network, but a large number of dropped calls in a 10-minute window may signal that something is seriously wrong.  In its Streams application, the company can set up a window of dropped-call records to be processed every 10 minutes.  When the window expires, the dropped-call records are analyzed, and if the result indicates a significant problem, the company can act in a timely manner.
  • Sink Adapter – At the end of the application is a sink adapter.  A sink adapter is an operator that writes data to external systems and does not produce a new stream.
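To make the stream, tuple, and schema concepts concrete, here is a minimal SPL sketch (SPL itself is introduced in the next section).  The composite, type, and stream names are illustrative placeholders; Beacon is an operator from the SPL Standard Toolkit used here only as a stand-in source that generates a few tuples with default attribute values.

composite SchemaExample
{
    type
        // A schema: the list of attributes in a tuple and their SPL data types
        TradeQuoteT = tuple<rstring ticker, rstring date, rstring time, float64 askprice> ;
    graph
        // A stream named Trades whose tuples use this schema
        (stream<TradeQuoteT> Trades) as TradeSrc = Beacon()
        {
            param
                iterations : 5u ;   // generate five tuples and stop
        }
}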

TIP:   At this point, you may wonder where you can find the operators to write your Streams application and whether you need to write them from scratch.  Streams ships with many operators out of the box.  The SPL Standard Toolkit provides basic data processing functions like filtering, aggregation, and sorting.  In addition, Streams ships with many specialized toolkits for working with external systems and for advanced real-time analytics.

SPL Basics

The Streams Processing Language (SPL) is a distributed data flow composition language for writing a Streams application.  It is designed to help you describe the directed flow graph; define the schema on each of the streams; and allow you to easily customize the behavior of each of the operators in the graph.  This section is intended to help you get a high-level understanding of the SPL language.

TIP:  Streams Studio ships with an SPL Graphical Editor for constructing Streams applications.  You can simply drag and drop operators and connect them in the SPL Graphical Editor.  The editor supports the full SPL language specification, so you do not have to write SPL code manually.  In addition, Streams Studio also provides an SPL Editor.  The SPL Editor provides syntax highlighting, content assist and code completion.  It is a good tool for learning about the various SPL language features and functions.

Main Composite

The entry point of a Streams application is a main composite.  A composite operator is a graph of operators.  A main composite is a special composite operator that has no input ports and no output ports.  Think of it as the main method of a Java or C++ program.  A main composite is represented in SPL as follows:

// Main composite: SampleMain
// This special composite has no input port and no output port.
composite SampleMain
{
}

To write a directed flow graph in the main composite, we add a graph clause into the main composite.  A graph clause is a section in the composite that describes a list of operators in the graph, how the operators are connected, and what data is sent between the operators.

[Figure: a graph clause in a main composite]

The diagram above is an example of a graph clause in the main composite.  Each of the operators is represented by an operator invocation block in SPL.

[Figure: an operator invocation in SPL]
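Because the figures are not reproduced here, the following is a minimal sketch of the graph clause and operator invocation described in the text below, reconstructed from the surrounding prose.  The assumption that FileSrcOutput is produced by a FileSource, and the input file name, are illustrative only.

composite SampleMain
{
    graph
        // A source operator that produces a stream named FileSrcOutput
        // with a single rstring attribute called name
        (stream<rstring name> FileSrcOutput) as FileSrc = FileSource()
        {
            param
                file : "input.csv" ;   // hypothetical input file
        }

        // Operator1 invokes the Functor operator from the SPL Standard Toolkit;
        // it consumes FileSrcOutput and produces Operator1_output
        (stream<rstring name> Operator1_output) as Operator1 = Functor(FileSrcOutput)
        {
        }
}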

An operator invocation starts by defining the schema and the name of the output stream that the operator is going to produce.  In the example above, the name of the output stream for Operator1 is Operator1_output.  The stream is defined with a schema of <rstring name>.

Next, the operator invocation block defines the operator to call.  In the example above, we are invoking an operator named Functor from the SPL Standard Toolkit.

Finally, an operator invocation block defines an input stream feeding into the operator.  This is done by specifying the name of the output stream of another operator.  In the example above, Operator1 consumes data from a stream named FileSrcOutput.

You may customize the behavior of an operator by writing additional clauses in an operator invocation:

 (stream<rstring name> Operator2_output) as Operator2 = Operator(Operator1_output)
 {
    logic
       onTuple Operator1_output :
       {
          printStringLn(Operator1_output.name) ;
       }

    window
       Operator1_output : tumbling, count(5) ;
    param
       param1 : "param1Val" ;
       param2 : "param2Val" ;
    output
       Operator2_output : name = name + " newName" ;
    config
       placement : partitionColocation("pe1") ;
 }

The clauses must be specified in the order as shown.

  • logic –  The logic clause is called each time the operator receives a tuple.  You may write custom SPL code here to process the input tuple.
  • window – The window clause allows you to specify the kind of window the operator should use, and its trigger and eviction policies.
  • param – Each operator defines a set of parameters to help you customize its behavior.  The param clause allows you to specify the values of those parameters.
  • output – The output clause allows you to customize attribute assignments on the output tuple.  By default, if the input stream and the output stream contain an attribute of the same name and type, the attribute value from the input stream is automatically assigned to the same attribute on the output stream.  In addition, you may customize output attribute assignments by specifying custom assignment expressions in the output clause.  If the output stream contains an attribute that is not present in the input stream, you must specify an assignment for that attribute in the output clause.
  • config – The config clause allows you to specify various operator and process configurations supported by Streams.  For example, you may use the config clause to control whether two operators run in the same process.  You may also use the config clause to control which resource (host) the process runs on.

This is a very high-level overview of the SPL language.  For more details, refer to the SPL Programming Reference from the Knowledge Center.

 

Simple Streams Application

Now that we have gone through the basics, we are going to write a simple application that covers these concepts.

Our first application will process stock trades from a CSV file.  The application will filter out some of the stocks based on ticker names.  It will then calculate the average, maximum and minimum ask price for each of the stocks.  The results will be written to a file.

You may find this example in the Samples repository on GitHub: TradeApp Sample on GitHub

To start writing a Streams application, you must have an understanding of the data that you are trying to process.  The CSV file that we are going to process looks like this:

#ticker, date, time, askprice
"GLD","27-DEC-2005","14:06:09.854",50.7
"IYF","27-DEC-2005","14:12:38.019",103.69
"IOO","27-DEC-2005","14:13:20.873",64.02
"AU","27-DEC-2005","14:13:32.877",49
"CAJ","27-DEC-2005","14:14:17.938",60
"EWZ","27-DEC-2005","14:14:46.039",33.25

The CSV file has the following format:

  • Ticker – a string that represents the name of the stock ticker
  • Date – a string that represents the date of the stock quote
  • Time – a string that represents the time of the stock quote
  • Ask Price – the ask price of the stock at the specified time

To create this application, we will first create a main composite named TradesAppMain:

/** Main composite for stock processing application
 * 
 */
composite TradesAppMain {
}

To read data from a file, you need to add the FileSource operator into the graph.  FileSource is an operator from the spl.adapter namespace in the SPL Standard Toolkit.

[Figure: application graph with the FileSource operator]

/** Main composite for stock processing application
 * 
 */
composite TradesAppMain
{
    graph
         (stream<rstring ticker, rstring date, rstring time, float64 askprice> TradeQuotes) as TradeQuoteSrc = FileSource()
         {
             param
                 file: "/home/myuserid/data/trades.csv";
                 format: csv;
         }
}
  • In this FileSource invocation, we have defined the name of the output stream as TradeQuotes.
  • The schema on the output stream contains 4 attributes:  ticker, date, time and askprice.  Please note that this schema must match the format as specified in the CSV file for the operator to be able to parse the content of the file.
  • In the param clause, specify the location of the file using the file parameter.
  • To parse the input file as comma-separated-values, assign csv as the value for the format parameter.

This is all it takes to read a CSV file for processing.

Next, we are going to filter out some of the stocks based on their tickers by adding a Filter operator into the graph.

[Figure: application graph with the Filter operator added]

/** Main composite for stock processing application
 * 
 */
composite TradesAppMain
{
    graph
         (stream<rstring ticker, rstring date, rstring time, float64 askprice> TradeQuotes) as TradeQuoteSrc = FileSource()
         {
             param
                 file: "/home/myuserid/data/trades.csv";
                 format: csv;
         }

         (stream<rstring ticker, rstring date, rstring time, float64 askprice> FilteredTradeQuotes) 
         as FilteredTrade = Filter(TradeQuotes)
         {
             param
                 filter: ticker != "GLD";
         }
}
  • The Filter operator consumes the TradeQuotes stream from the FileSource operator.
  • The Filter operator also produces a stream named FilteredTradeQuotes.  
  • The output stream schema is the same as the input stream.  Therefore, we do not need an output clause to perform manual output attribute assignment.  If a tuple is allowed to pass through in the Filter operator, attributes from the input stream will automatically be assigned to the same attributes on the output stream.
  • The Filter operator defines a parameter named filter.  This parameter specifies the condition to allow a tuple to pass through the filter.  In this case, if the ticker name is not GLD, then the tuple will be allowed to get through the Filter.

Next, we will add an Aggregate operator to calculate the min, max and average askprice of the stocks.

[Figure: application graph with the Aggregate operator added]

 (stream<rstring ticker, float64 min, float64 max, float64 average>
 TradesSummary) as AggregateTrades = Aggregate(FilteredTradeQuotes as inPort0Alias)
 {
      window
          inPort0Alias : tumbling, count(5), partitioned ;
      param
          partitionBy : ticker ;
      output
           TradesSummary : min = Min(askprice), max = Max(askprice), average = Average(askprice) ;
 }

The Aggregate operator consumes data from the FilteredTradeQuotes stream and produces a stream named TradesSummary. 

The Aggregate operator operates on a window of data.  We have set up a tumbling window with a count of 5.  The window is partitioned by ticker.  With a partitioned window, the operator maintains a separate window for each of the stocks it encounters.  For example, the operator maintains a window of 5 tuples for IBM stock.  At the same time, a separate window is maintained for the last 5 stock prices from Apple.

When the window is filled with 5 tuples (i.e. the trigger policy is met), the operator looks at the output functions specified in the output clause.  In this case, the user wants the operator to calculate the min, max and average askprice of the tuples in the window.  The operator runs the output functions (Min, Max, and Average) and assigns the results to the output attributes as specified.  Since this is a tumbling window, the window is emptied once the trigger policy is met.

The SPL language has two kinds of windows, tumbling and sliding. They both store tuples while they preserve the order of arrival, but differ in how they handle tuple evictions. Rather than keeping all the tuples ever inserted, windows are configured to evict expired tuples. In this respect, tumbling windows operate in batches. When a tumbling window fills up, all the tuples in the window are evicted. This process is called a window flush. Conversely, sliding windows operate in an incremental fashion. When a sliding window fills up, the future tuple insertions result in evicting the oldest tuples in the window. The details of tuple eviction are defined by the eviction policy.
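For comparison with the tumbling window used above, the following sketch shows what a similar Aggregate invocation might look like with a sliding window.  The eviction and trigger counts are arbitrary illustrative values, and this window is not partitioned, so it aggregates across all tickers in the window.

 (stream<rstring ticker, float64 average> SlidingTradesSummary) as SlidingAggregateTrades
     = Aggregate(FilteredTradeQuotes)
 {
      window
          // Keep the last 10 tuples (eviction policy) and re-compute the
          // aggregate every 2 incoming tuples (trigger policy)
          FilteredTradeQuotes : sliding, count(10), count(2) ;
      output
          SlidingTradesSummary : average = Average(askprice) ;
 }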

Next, we will add a Custom operator to do some special processing with the TradesSummary data.  The Custom operator is a special logic-related operator that can receive and send to any number of streams and does not do anything by itself. Thus, it offers a blank slate for customization in SPL.

[Figure: application graph with the Custom operator added]

(stream<rstring ticker, float64 min, float64 max, float64 average>
 CheckedTradesSummary) as CustomProcess = Custom(TradesSummary)
{
     logic
         onTuple TradesSummary: {
             if (average == 0.0l)
             {
                  printStringLn("ERROR: " + ticker);
             }
             else 
            {
                  submit(TradesSummary, CheckedTradesSummary);
            }
        }
 }

To write custom SPL logic, we add a logic clause in the Custom operator.  The logic clause is executed on each tuple received by the operator.  In our logic clause, we check the average ask price for the stock.  We flag an error with the data if the average ask price is zero and do not submit the tuple to the output port.  If the average ask price is greater than zero, we submit the tuple to the CheckedTradesSummary output stream.

Finally, we add a FileSink operator to write our analysis results to a file.

[Figure: complete application graph with the FileSink operator added]

 () as CheckedTradesSummaryFile = FileSink(CheckedTradesSummary)
 {
     param
          file : "/homes/myuserid/data/tradesSummary.csv" ;
          flush : 1u ;
          format : csv ;
 }

The FileSink operator consumes the output stream CheckedTradesSummary from the Custom operator.  It writes output to the file "/homes/myuserid/data/tradesSummary.csv".  The flush parameter controls how often the operator flushes content to the file system; in this case, the operator flushes after each tuple is received.  The format parameter tells the operator the format in which to write the data; in our example, the output file is written in CSV format.

Here’s the source code for this application:

 /** Main composite for stock processing application
  *  
  */
composite TradesAppMain
{
    graph
         (stream<rstring ticker, rstring date, rstring time, float64 askprice> TradeQuotes) 
         as TradeQuoteSrc = FileSource()
         {
             param
                 file: "/home/myuserid/data/trades.csv";
                 format: csv;
         }

         (stream<rstring ticker, rstring date, rstring time, float64 askprice> FilteredTradeQuotes) 
         as FilteredTrade = Filter(TradeQuotes)
         {
             param
                 filter: ticker != "GLD";
         }


        (stream<rstring ticker, float64 min, float64 max, float64 average>
         TradesSummary) as AggregateTrades = Aggregate(FilteredTradeQuotes as inPort0Alias)
        {
            window
                inPort0Alias : tumbling, count(5), partitioned ;
            param
                partitionBy : ticker ;
            output
                TradesSummary : average = Average(askprice), min = Min(askprice), max = Max(askprice) ;
        }

        (stream<rstring ticker, float64 min, float64 max, float64 average>
         CheckedTradesSummary) as CustomProcess = Custom(TradesSummary)
        {
            logic
                onTuple TradesSummary: {
                    if (average == 0.0l)
                    {
                       printStringLn("ERROR: " + ticker);
                    }
                    else 
                    {
                       submit(TradesSummary, CheckedTradesSummary);
                    }
                }
         }

         () as CheckedTradesSummaryFile = FileSink(CheckedTradesSummary)
         {
            param
               file : "/homes/myuserid/data/tradesSummary.csv" ;
               flush : 1u ;
               format : csv ;
         }
}

Streams Application Pattern

The sample Streams application demonstrates a common Streams application pattern.  Most applications follow the pattern described below.

Data is ingested from various data sources.  As data flows through the application, it is prepared, analyzed and processed in memory. You can optionally store the data into a data store for record keeping or deeper analysis at any stage of the application. With the exception of the Ingest stage, all stages in this application pattern are optional.  A minimal SPL sketch of the full pattern follows the list below.

[Figure: the Streams application pattern – Ingest, Prepare, Detect and Predict, Decide, Act]

  • Ingest: At the beginning of all Streams applications is the Ingest stage.  In this stage, the application consumes continuous live data from disparate data sources.  A data source can be a machine sensor, live feeds from social media sites, databases, file system, HDFS, etc.  Streams provides a set of adapters to help ingest data from different kinds of data sources.  Streams can handle any kind of data, both structured and unstructured.  Examples of structured data include XML, JSON, CDRs, etc.  Examples of unstructured data include unstructured text, voice, videos, signals, etc.  As soon as data is ingested into the application, the data can be manipulated and analyzed in memory as it is flowing through the application.
  • Prepare:  In this stage, the application can parse, transform, filter, clean, aggregate, or enrich the data in memory, preparing the data for real-time analytics.  For example, if your application is ingesting videos from security cameras, you may need to process and parse the videos, and then convert them into a form that can be analyzed in the following stages.  Another example is that if you are reading data from a message server in JSON format, you may need to parse the JSON text.  At this stage, you may also correlate data from the different data sources, and enrich the data for analysis.
  • Detect and Predict:  This is the stage where the application performs real-time analysis of the data and gains insight into the data that is coming through.  Streams provides a set of toolkits for data analysis.  For example, Streams provides a timeseries toolkit for modeling, anomaly detection, and short-term and long-term forecasting.  The SPSS toolkit allows a Streams application to perform real-time scoring against a pre-built SPSS model.  The R toolkit allows you to analyze your data using R with your existing R scripts.  There are many more toolkits available for analysis.  If you need to run any specialized analysis that is not already available in one of the toolkits, you may write your own toolkits and operators to analyze the data.
  • Decide:  In this stage, you use the insight gathered in the previous stage, and create logic to decide how to act on that insight.  In addition, Streams can integrate with Business Rules software, like Operational Decision Manager (ODM).  You may run the business rules against the flowing data, allowing you to make critical business decisions in real-time.
  • Act:  In this stage, we act on the decision made in the previous stage.  You may send the analysis result to a data visualization server.  You may decide to send an alert to someone about an anomaly detected in the data.  You may publish the results to a list of subscribers.
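As a rough illustration of how these stages can map onto SPL, the following skeleton strings together one operator per stage.  The operator choices, stream names, and file paths are hypothetical placeholders, not a prescribed design.

composite AppPatternSkeleton
{
    graph
        // Ingest: consume raw events from an external source
        (stream<rstring rawEvent> RawEvents) as Ingest = FileSource()
        {
            param
                file   : "events.csv" ;   // hypothetical input
                format : csv ;
        }

        // Prepare: drop records that cannot be analyzed
        (stream<rstring rawEvent> CleanEvents) as Prepare = Filter(RawEvents)
        {
            param
                filter : length(rawEvent) > 0 ;
        }

        // Detect/Predict and Decide: apply analysis and decision logic
        (stream<rstring rawEvent, boolean alert> Decisions) as DetectAndDecide = Custom(CleanEvents)
        {
            logic
                onTuple CleanEvents :
                {
                    // Placeholder decision rule: never raise an alert
                    submit({ rawEvent = rawEvent, alert = false }, Decisions) ;
                }
        }

        // Act: write the decisions to an external system
        () as Act = FileSink(Decisions)
        {
            param
                file   : "decisions.csv" ;   // hypothetical output
                format : csv ;
        }
}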

Building Streams Applications

In the following sections, we are going to discuss how you can build your Streams application, and submit it to the Streams distributed runtime.

Before you begin:  Set up the necessary environment variables by running the following command:

source <Streams_Install>/bin/streamsprofile.sh

There are two ways to build a Streams application: standalone or distributed.

Standalone

Standalone mode is good for testing and debugging.  In this mode, the application is built into a single program.  You may run this program without submitting the application to a Streams Instance (the Streams distributed runtime).

To build the sample application in standalone mode, use the command below:

/opt/ibm/InfoSphere_Streams/4.0.0.0/bin/sc -M application::TradesAppMain --output-directory=output/application.TradesAppMain/Standalone --data-directory=data -T -a
  • The -M option specifies the name of the main composite to build.
  • The --output-directory option specifies where to write the build output.
  • The --data-directory option specifies the data directory of the application.  A data directory is an optional directory where data may be read or written for your application.
  • The -T option indicates that the application should be compiled into standalone mode.
  • The -a option specifies that the application should be compiled in optimized mode.

To run this application in standalone mode, run this command:

<ApplicationOutputDir>/bin/standalone

Distributed

To build the sample application in distributed mode, simply remove the -T option as follows:

/opt/ibm/InfoSphere_Streams/4.0.0.0/bin/sc -M application::TradesAppMain --output-directory=output/application.TradesAppMain/Distributed --data-directory=/homes/myuserid/data -a

In distributed mode, the application is built into an application bundle (*.sab) in the output directory.  The application bundle contains all file resources, libraries and dependencies required for running the application in distributed mode.  To run the application in distributed mode, you need to submit this application bundle to a Streams Instance.

Streams Domain and Instance

To run your application in distributed mode, you need a Streams Domain and a Streams Instance.  A domain is a logical grouping of resources (or containers) in a network for common management and administration.  It can contain one or more instances that share a security model, and a set of domain services.

An instance is the Streams distributed runtime environment.  It is composed of a set of interacting services running across one or multiple resources.  The Streams instance is responsible for running Streams applications.  When an application is submitted to a Streams instance, the instance distributes the application code onto each of the resources and coordinates with the instance services to execute the processing elements.

Setting up a Development Domain and Instance

To set up a development domain and instance, follow these steps.  A development domain and instance run on a single host.  You can dynamically add additional hosts to the domain later.

First, if you have not already done so, set up the necessary environment variables by running the streamsprofile.sh.

source <Streams_Install>/bin/streamsprofile.sh

Next, start streamtool by typing the following command:

streamtool

When prompted to provide a ZooKeeper ensemble, enter the ZooKeeper ensemble string if you have a ZooKeeper server set up.  Otherwise, press enter to use the embedded ZooKeeper.

streamtool is an interactive tool.  To get content assist and auto-complete,  press <Tab>.

Domain

To make a new domain, enter this command in the streamtool interactive command session:

mkdomain -d <domainName>

Generate public and private key for Streams, so you do not have to keep logging in:

genkey

Start the domain:

startdomain

Tip:  If the domain fails to start because a port is in use, you may change the port number by using setdomainproperty.  For example, if JMX and SWS ports are in use:

setdomainproperty jmx.port=<jmxPort> sws.port=<swsPort>

Instance

To make a new instance, enter this command:

mkinstance -i <instance-id>

Start the instance:

startinstance

For more information about domain and instance setup, refer to the Streams product documentation in the Knowledge Center.

Running Streams Applications in Distributed Mode

Now that you have a domain and instance started, you can run your application in distributed mode.  To submit a job, find the application bundle file (*.sab), and run the following command:

streamtool submitjob appBundleName.sab

To submit our sample application, we will change into the output directory of the application and submit the application bundle:

cd output/application.TradesAppMain/Distributed
streamtool submitjob application.TradesAppMain.sab

Querying for Job Status

You may query job status from your Streams Instance using streamtool commands.

If using embedded ZooKeeper:

streamtool lsjobs -d <streamsDomainName> -i <instanceName> --embeddedzk

If using external ZooKeeper ensemble:

streamtool lsjobs -d <streamsDomainName> -i <instanceName> --zkconnect <zooKeeperHost>:<zooKeeperPort>

You will see job status similar to this:

[streamsadmin@streamsqse Distributed]$ streamtool lsjobs -d StreamsDomain -i StreamsInstance --embeddedzk
Instance: StreamsInstance
 Id State   Healthy User         Date                     Name                         Group
 6  Running yes     streamsadmin 2015-04-30T18:32:48-0400 application::TradesAppMain_6 default

Sample Application Output

To see the result of the sample application, open the file as specified in the file parameter of the FileSink operator:

() as CheckedTradesSummaryFile = FileSink(CheckedTradesSummary)
{
    param
        file : "/homes/myuserid/data/tradesSummary.csv" ;
        flush : 1u ;
        format : csv ;
}

You will find entries like this in the file.  The first field is the ticker name of the stock being analyzed.  The first number is the minimum ask price of the stock, the second number is the maximum ask price, and the last number is the average ask price.  Each summary is based on the last 5 records that the application has received for that stock.

"TWX",0,17.7,10.62
"FSL",0,26.19,15.62
"FSLb",0,26.29,20.928
"BK",0,32.53,26.024
"NFJ",0,21.59,17.26
"RIO",0,41.15,32.872
"NLY",0,11.23,8.97
"BGG",0,39.81,31.848

Streams Console

In addition to querying for job status using streamtool, you may also look at your job status and monitor the health of your Streams cluster using the Streams Console.  The Streams Console is a web-based administration console that allows you to monitor and administer your Streams Domain.

To find the URL of the Streams Console, run the following command:

[streamsadmin@streamsqse Distributed]$ streamtool geturl
https://streamsqse.localdomain:8443/streams/domain/console

Open this URL in a browser, and you will see a log-in screen. If you are using the Streams Quick Start Edition VM, enter the following user ID and password:  streamsadmin:passw0rd

You will then see a dashboard like this:

[Figure: Streams Console dashboard]

For more information about the Streams Console, refer to the product documentation.

Streams Studio

Streams provides an Eclipse-based integrated development environment (IDE) for you to develop your Streams applications.  You may view and monitor your running applications in the Streams Studio Instance Graph:

[Figure: Instance Graph view in Streams Studio]

In addition, Streams Studio provides the following features:

  • Streams Explorer View – to help you manage your Streams Development Environment
  • Project Explorer build and launch support – to help you build and launch your Streams application
  • SPL Graphical Editor – to allow you to create your Streams application using a graphical interface, without having to know the details of the SPL language
  • SPL Editor – to help you write SPL code, providing support like syntax highlighting, content assist, and refactoring.
  • Instance Graph – to help you visualize, debug and monitor your running applications.
  • Metrics View – to help you debug and monitor your running applications, by looking at the metrics of your operators and processing elements

To get started with Streams Studio, try out the Streams Studio Quick Start Guide.

For more information about Streams Studio, refer to the following:  Streams Studio Overview

What’s Next?

Learn by doing:

Learn more about Streams:

7 comments on "Streams Quick Start Guide"

  1. I’m trying to setup a development domain and instance on vmware-streamsV4.1.0-qse-v1:

    /opt/ibm/InfoSphere_Streams/4.1.0.0/bin/streamsprofile.sh
    [streamsadmin@streamsqse ~]$ source /opt/ibm/InfoSphere_Streams/4.1.0.0/bin/streamsprofile.sh
    InfoSphere Streams environment variables have been set.
    [streamsadmin@streamsqse ~]$ streamtool
    [streamtool ] mkdomain -d mydomain
    CDISA5252E Invalid value for bootstrap property ‘streams.zookeeper.quorum’. For embedded ZooKeeper the value must specify the hostname for the local host. Found: streamsqse.localdomain. Use ‘streamtool setbootproperty’ to update the property value.
    [streamtool ] setbootproperty streams.zookeeper.quorum=streamsqse.localdomain
    CDISA5193E Cannot set Streams bootstrap configuration properties while embedded ZooKeeper is running.
    [streamtool ]

    • Walt Madden January 04, 2016

      Have you perhaps changed the hostname of the VM to something other than streamsqse.localdomain? Does hostname --fqdn still return "streamsqse.localdomain"? Also, can you confirm you do not have environment variable STREAMS_ZKCONNECT set?

      Most problems we see with the VM involve the IP address of the VM changing after the VM was booted due to a change in the host IP network. So, you could also confirm the VM IP address returned by "ifconfig" matches the entry in /etc/hosts for streamsqse.localdomain. If there is a mismatch, rebooting the VM should correct it; a script runs at boot time to update /etc/hosts to match the current IP of the VM.

      Hope this helps – let us know!

  2. This was very helpful, thank you!

  3. Thank you very much Samantha and the guide is very useful!

  4. Great intro, thank you very much!
