READ ME FIRST

This development guide is only meant to be used with Streaming Analytics instances in IBM Cloud created using a V2 price plan. If the instance that you are working with was created under another price plan, please consult our V1 development guide.

If you’re not sure which type of price plan you are using, see V1 and V2 Service Plans.

This development guide applies to Streams applications that are compiled in the Streams Studio IDE or compiled by you using the sc command. It does not apply to applications where the compilation is performed for you in the cloud by other development tools, such as:

  • Streams Flows (in Watson Studio)
  • IBM Streams Runner for Apache Beam
  • Python/DSX notebooks (without a local Streams install)

If you are using Streaming Analytics via one of the three development tools listed above, please refer to the documentation associated with those tools as your development guide.

Introduction

Streaming Analytics is an IBM Cloud service built upon the IBM Streams technology. Streams is an advanced analytic platform allowing user-developed applications to quickly ingest, analyze, and correlate information as it arrives from a wide variety of real-time sources. The Streaming Analytics service gives you the ability to deploy Streams applications to run in the IBM Cloud.

This guide will help you through the process of building, submitting, and monitoring a streaming analytics application using the Streaming Analytics service on IBM Cloud. The guide assumes that you are already familiar with Streams application development and are now ready to start developing for the cloud. If you are not, you should first check out this brief introduction to IBM Streams.

This guide will help you download and set up the IBM Streams Quick Start Edition (QSE) for Docker to use as your development environment for Streams applications. In addition, this guide provides a sample application that you can build in QSE and run in the cloud.

If you have questions or difficulty following the steps in this guide, please add a comment at the bottom of the article so we can address your concerns. Also, if there are topics related to development that are not covered here, comments are welcome.

Setting Up Your Streams Development Environment

The Streaming Analytics V2 price plans in IBM Cloud are running IBM Streams version 4.3 on CentOS 7.4.

In order to compile Streams applications to deploy to an instance in the cloud, you need IBM Streams version 4.0.x, 4.1.x, 4.2.x or 4.3.x installed on a RHEL or CentOS 7.4 x86_64 system. If you have a compatible installation of IBM Streams available to you, you can use that for your development environment and skip to the “Introducing Our Sample Application” section further down in this document.

If you do not have a compatible installation of IBM Streams, you can set up the IBM Streams Quick Start Edition (QSE) and use it as your development environment for the cloud.

IBM Streams Quick Start Edition (QSE) for Docker Setup

Streams QSE is now available through Docker Hub. To run the QSE in Docker, follow the IBM Streams QSE setup instructions for the OS you are using for your development environment.

(Alternatively, if you want to run the Streams QSE natively on Linux (without Docker), visit the IBM Streams Quick Start Edition download site for instructions and the set of available downloads.)

Introducing Our Sample Application

The sample application used for illustration in this guide accesses Twitter to get a live sample feed of tweets.

The sample application actually consists of two SPL applications. The first application, TwitterStream, accesses the live Twitter feed and exports a stream of tweets for use by other SPL applications. In the following Streams application graph, the first operator reads from an HTTP stream, the second operator filters out any messages that are not Twitter statuses, and the third operator adds a count to each tuple. The final operator exports the stream for use by other jobs in the Streams instance.

TwitterStream application graph

The second application, Smackdown, produces a score for each word in a list of words. At job submission time you specify a list of words participating in the smackdown. For each “opponent” word in the smackdown, the application calculates the number of Twitter statuses containing the search word. Every minute, the application produces a running score for the previous five minutes. In this application, the first operator imports the stream of tweets that is exported by the TwitterStream application. The second operator calculates a match score for each opponent in the smackdown. The third operator keeps a running aggregation of the number of matches for each opponent, producing a score every minute. The last operator prints the results to its standard output, also known as the process console messages, which you will view later.

Smackdown application graph

Structuring this sample into two separate streams jobs using export and import allows us to use a single connection to the Twitter source while being able to run multiple smackdowns using the same stream of tweets. You will import the sample source into Streams Studio in order to compile the two applications.
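
For reference, this pattern is built with the SPL Export and Import operators. Below is a minimal sketch of property-based export and import; the property name, schema, and input stream name are illustrative assumptions, not the sample's actual source.

    // In TwitterStream: publish the tweet stream under a named
    // property so other jobs in the instance can subscribe to it.
    () as TweetsExport = Export(TweetsWithCount)
    {
        param
            properties : { feed = "sample-tweets" };
    }

    // In Smackdown: subscribe to any exported stream whose
    // properties satisfy the subscription expression.
    stream<rstring tweet> TweetsImport = Import()
    {
        param
            subscription : feed == "sample-tweets";
    }

Because the runtime resolves the connection dynamically, additional Smackdown jobs can attach to the running TwitterStream job without restarting it.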

Downloading the Sample Application Source

Please note: The instructions assume you will be using Firefox within the IBM Quick Start Edition for Docker to download the zip file. If you download the source zip file directly to your host computer (e.g. Windows), you will need to copy it to the IBM Quick Start Edition for Docker.

Download Sample Source Files

Unzip the downloaded file, which will create a directory called Smackdown.

Compiling the Sample in Streams Studio

IBM Streams applications are written in Streams Processing Language (SPL). SPL applications are compiled into a Streams Application Bundle (SAB), which can be submitted as a job to the Streaming Analytics service in IBM Cloud. You can use Streams Studio, an Eclipse-based development environment, to edit and compile your applications.
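
(If you prefer the command line, the same bundle can typically be produced with the SPL compiler, for example by running sc -M sample::Smackdown from the project directory; the exact invocation depends on your toolkit path and output settings.)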

Following the steps in this section, you will first compile the sample application and run it in IBM Cloud. Later, you will use Streams Studio to modify the application, re-compile it, and run it again.

Within your running IBM Quick Start Edition for Docker, start Streams Studio from the menu: Applications->Favorites->Streams Studio (Eclipse).

  1. Click OK to accept the default workspace name, /home/streamsadmin/workspace. When Studio starts up there are no projects defined yet.
  2. To import the sample click File->Import.
  3. In the import dialog expand IBM Streams Studio, select SPL Project, and click Next.
  4. For the Source use the Browse button to navigate to the unzipped Smackdown folder. Select the Smackdown project in the list and click Finish.

Studio will import the project into the workspace. By default, Studio rebuilds the workspace when files are created or modified. It might take a couple of minutes for the applications to compile. You can see the status of the build in the lower-right status bar of Studio. When the build finishes, the Project Explorer should look like this.
Project Explorer after import

Expanding the Resources folder and subfolders as shown in the figure below reveals the compiled Streams Application Bundle (SAB) files. These bundles are now ready to submit to Streaming Analytics on IBM Cloud.
Compiled applications in project explorer

For future reference the full path names for the two bundles are:

  • /home/streamsadmin/workspace/Smackdown/output/sample.Smackdown/Distributed/sample.Smackdown.sab
  • /home/streamsadmin/workspace/Smackdown/output/sample.TwitterStream/Distributed/sample.TwitterStream.sab

You’ll use those paths later to submit the jobs on IBM Cloud.

Creating Twitter Application Credentials

The TwitterStream SPL application uses a Twitter streaming API to get a live sample stream of twitter updates. You need to create authorization credentials for the application to use to connect to the Twitter API.

  1. Go to apps.twitter.com.
  2. Log in to your Twitter account (or sign up for Twitter).
  3. On the Application Management page, click the Create New App button.
  4. Enter a Name and Description for your application.
  5. Enter a Website for your application. Twitter requires you to enter a valid HTTP URL, for example, https://developer.ibm.com/streamsdev/docs/streaming-analytics-dev-guide/
    Twitter application details
  6. Click the Yes, I agree checkbox to accept the Twitter Developer Agreement.
  7. Click the Create your Twitter application button. After the application is created, the application details page is displayed.
  8. Switch to the Keys and Access Tokens tab.
  9. Click the Create my access token button.
    Twitter application access tokens
  10. Copy the following values and save them for later.
    • Consumer Key (API Key)
    • Consumer Secret (API Secret)
    • Access Token
    • Access Token Secret

These four values will be used as submission-time parameters when submitting the TwitterStream job to Streaming Analytics on IBM Cloud.
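
Within the SPL source, submission-time parameters such as these are typically declared with the getSubmissionTimeValue function. Here is a minimal skeleton of that pattern; the parameter names and the placeholder graph are assumptions for illustration, not the sample's actual code.

    composite TwitterStream
    {
        param
            // With no default value supplied, the Streams Console marks
            // each parameter with an asterisk and requires a value at
            // job submission time (names here are assumed).
            expression<rstring> $consumerKey : getSubmissionTimeValue("consumerKey");
            expression<rstring> $consumerSecret : getSubmissionTimeValue("consumerSecret");
            expression<rstring> $accessToken : getSubmissionTimeValue("accessToken");
            expression<rstring> $accessTokenSecret : getSubmissionTimeValue("accessTokenSecret");
        graph
            // Placeholder operator; the real application passes the
            // credentials to its Twitter source operator instead.
            stream<rstring tweet> Tweets = Beacon()
            {
                param
                    iterations : 1u;
                output
                    Tweets : tweet = $consumerKey;
            }
    }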

Creating a Streaming Analytics Service Instance on IBM Cloud

If you have already created an instance of the Streaming Analytics service using one of the V2 plans, you can skip this section. If you still need to create one, follow the steps below:

  1. Go to www.bluemix.net.
  2. Log in to your IBM Cloud account (or create an account).
  3. Open the CATALOG link.
  4. Browse for and select the Streaming Analytics service.
  5. The Streaming Analytics Catalog page will be displayed.
  6. Enter a Service name, or use the default name provided.
  7. Select one of the V2 service plans.
    Price plan section of the Streaming Analytics catalog displaying a partial list of the V2 price plans.
  8. Click Create to create an instance of the service. This provides you with your own Streams instance, started and ready to run Streams applications.
  9. The Streaming Analytics service dashboard will be displayed.

You can use the START and STOP buttons on the service dashboard to start and stop the service. While started, you can submit jobs to the service using the Streams Console.

Submitting Your Jobs to Streaming Analytics on IBM Cloud

On the Streaming Analytics service dashboard use the LAUNCH button to start the Streams Console, which will open in a new window or tab. The console will open displaying an Application Dashboard which allows you to submit, monitor and cancel your jobs in the Streaming Analytics service.

Recall that the compiled application bundles were at these paths in the IBM Quick Start Edition for Docker:

/home/streamsadmin/workspace/Smackdown/output/sample.Smackdown/Distributed/sample.Smackdown.sab
/home/streamsadmin/workspace/Smackdown/output/sample.TwitterStream/Distributed/sample.TwitterStream.sab

You will be submitting jobs using these bundles. The instructions here assume you are using Firefox within the QSE to submit the jobs. If you are using a browser on your host computer (e.g. Windows), you will need to transfer these bundle files from the QSE to your host computer before you continue.

  1. Click the Submit Job button in the console tool bar.
    Submit Job icon
  2. In the Submit Job dialog the Instance is pre-selected for you. For the Application bundle file, use the Browse button to navigate to and select the sample.TwitterStream.sab file.
  3. Click the Configure button.
  4. The bundle file will be uploaded and you will be prompted for additional options as shown in the image below.
  5. Take the default values for the options near the top of the dialog.
  6. For the Submission-time parameters, enter the four values for Twitter application credentials that you created earlier. The asterisk preceding the name of a submission-time parameter indicates there is no default value, so you are required to enter a value to continue.
  7. Click the Submit button.

The job is submitted to the Streaming Analytics instance. You should see a couple of pop-up messages appear briefly showing the job submission status. The Streams Console refreshes automatically in the background. After a short period of time, the job will appear in the Summary, Streams Tree, and Streams Graph views along the top of the Application Dashboard. These views will stay updated to show the current status of the jobs running in your streams instance.

At this point there is just one job running, TwitterStream, as shown in the Streams Graph view. The graph shows the operators and connections between them. When a new job is starting up some operators might be decorated with yellow triangles and the connections might be dashed lines, indicating that the job is not yet completely started and healthy. When the job is fully up and running the operators are decorated with green circles and the connection lines are solid, indicating that the job is fully healthy.

Tip: If you want to focus on a particular view, hover over the card title and a tool bar will appear. You can click the Max icon to maximize that card within the dashboard. When maximized, click the icon again to restore the tiled layout.

Maximize card icon in Streams Console

Monitoring Your Job

Monitoring Tuple Flows

The number appearing in the connection line is the most recent number of tuples per second flowing between the two operators. If there are no numbers on the connections there have been no tuples flowing recently between the operators. If you hover the mouse over an object in the Streams Graph view, a pop-up will show details for that object and, in some cases, a menu of actions for that object.

Monitor tuples flowing between operators

In this case there were 62 tuples per second the last time metrics were refreshed. Twitter’s sample stream is a small subset of all Twitter statuses. The flow rate can vary but usually seems to be several dozen per second. The metrics shown at the bottom of the pop-up are the cumulative amounts.

Submitting the Smackdown Job

At this point you just have the one job producing a stream of tuples. When the tuples get to the TweetsExport operator they are discarded because there is no job connected to that stream. (If for some reason your job does not have any data flowing, the most likely problem is incorrectly specified Twitter authorization credentials in the submission-time parameters. The section on viewing trace messages below explains how to investigate.)

It’s time to submit the second job, Smackdown, to consume the stream of tuples produced by the first job. Follow the job submission steps above again to submit the sample.Smackdown.sab file. When prompted for the opponents submission-time value, enter red,green,blue for three words competing in the smackdown. Alternatively, you can use any three words: celebrities, band names, team names, and so on.
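
As an aside, a comma-separated submission-time value like this is typically split into individual search words inside the application using the standard tokenize function; a one-line sketch, with assumed names:

    // "red,green,blue" -> ["red", "green", "blue"]
    list<rstring> opponents = tokenize(getSubmissionTimeValue("opponents"), ",", false);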


In a few moments the second job will be started and the Streams Graph view will refresh to show the TweetsExport operator in the TwitterStream job connected to the TweetsImport operator in the Smackdown application.

Streams graph showing two jobs

Viewing Sample Data in a Dynamic View

You can create a view on the tuples flowing across any of the connections in the streams graph to monitor a sample of the data that is flowing between two operators.

  1. Hover over the connection between the MatchAggregate and AggConsole operators.
  2. Click Create Dashboard View.
    Create view in Streams Console
  3. Switch to the Buffer tab.
  4. Change Tuples/sec Throttle to 3. The MatchAggregate operator produces one tuple for each smackdown opponent every sixty seconds. Changing this throttle from 1 to 3 will include the most recent score for each entry in our word list.
  5. Click OK to create the new view.
    Buffer settings when creating view

A new data visualization view will be added to the Application Dashboard. This view definition will remain after you log out and log back in to the console.

Data visualization view

The image above indicates that, at this point, “blue” is winning the smackdown. The matches column shows the number of tweets in the last five minutes that contain each of the smackdown opponent words.

Displaying a Line Chart of the Data

You can also create a line chart from a data visualization view.

  1. In the data visualization view, click the Create Time Series Chart icon.
    Create time series chart in Streams Console
  2. On the Chart tab, increase the Number of snapshots to 30, so the chart shows an interesting number of points.
  3. Switch to the Categories tab.
  4. For Choose line categories from, select Multiple attributes values.
  5. For Lines measured against this attribute, select the matches attribute.
  6. For Plot lines for each unique value of, select the smackdownWords attribute.
  7. Click OK to create the chart.
    Category settings when creating time series chart

A new line chart view will be added to the Application Dashboard. This view definition will remain after you log out and log back in to the console.

If you hover over one of the lines in the chart, a status bar will display with the maximum, minimum and median values for that series.

Time series chart displayed as bar graph

Canceling Your Job

In the next section you will enhance the Smackdown application and resubmit the job. But first cancel the existing Smackdown job.

Cancel Job icon

  1. Click the Cancel Jobs button in the console tool bar.
  2. In the Cancel Jobs dialog select the Smackdown application.
    Cancel Job dialog
  3. Click the Cancel Jobs button and then click Yes when prompted to confirm the job cancellation.
Tip: If you have any open data visualization or chart views for a job, those views remain when the job is canceled. You probably want to close those views because no new data will be produced in those views after the job is canceled.

Enhancing the Sample Application

Adding Additional Function

So far the sample application produces a simple aggregate of the number of tweets that contain our search words. Let’s modify the Smackdown application to calculate the percentage of the tweets containing the search words. To do this you need to:

  • Add a new field to the Aggregate operator output schema.
  • Set the new field in the operator’s output assignments.
  • Add a downstream operator to calculate the percentage of tuples that match the search word.

You will use the SPL Editor, a text-based editor that provides language syntax highlighting and context-sensitive assistance.

Returning to Streams Studio in the QSE, in the Project Explorer view:

  1. Drill down to the sample::Smackdown main composite operator.
  2. Right click on the operator and select Open With, and then SPL Editor.
    Open with SPL Editor in Streams Studio
  3. Scroll down to the Aggregate operator.
  4. Add the new field to the output stream schema: int32 tuples
  5. Add the new output assignment: tuples = Count()
    Count() is an output assignment function provided by the Aggregate operator which returns the number of tuples currently in the window. A sketch of the resulting invocation appears after this list.
    Modify Aggregate operator invocation in SPL source
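
After these edits, the Aggregate invocation should look roughly like the sketch below. The window, partitioning, and the existing matches assignment are illustrative stand-ins for the sample's actual source; only the int32 tuples attribute and the tuples = Count() assignment are the new additions.

    stream<rstring smackdownWords, int32 matches, int32 tuples> MatchAggregate = Aggregate(Matches)
    {
        window
            // Five-minute sliding window that triggers every minute,
            // partitioned by smackdown word (details assumed).
            Matches : sliding, time(300.0), time(60.0), partitioned;
        param
            partitionBy : smackdownWords;
        output
            MatchAggregate : matches = Sum(matchCount), // existing assignment (assumed)
                             tuples = Count();          // new: tuple count in the window
    }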

Next, add the following two operator invocations at the end of the composite operator, right before the config clause.

        /** Calculate percent matched */
        stream<rstring smackdownWords, int32 matches, int32 tuples, float64 percent> Results = Functor(MatchAggregate)
        {
            output
                Results : percent = roundedPercent(matches, tuples);
        }

        /** Print results for viewing in console log */
        () as ResultsConsole = Custom(Results)
        {
            logic
                onTuple Results :
                {
                    printStringLn((rstring) Results) ;
                }
        }

In the output clause of the above Functor invocation is a call to the function roundedPercent. You need to define that function to calculate the percent. It will round the result to four decimal places. You can define reusable functions in SPL very simply. Add the following function definition to the bottom of the Smackdown.spl source file, after the closing brace so it is outside the scope of the main composite operator.

/** Calculate percentage and round to four decimal places */
float64 roundedPercent(int32 x, int32 y) {
    return y > 0 ? round(((float64)x * 100.0 / (float64)y ) * 10000.0) / 10000.0 : 0.0;
}
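
As a quick sanity check of the arithmetic: roundedPercent(3, 7) computes 3 * 100.0 / 7, which is approximately 42.857142; the scale-and-round step rounds this to four decimal places, returning 42.8571. When y is 0, the guard returns 0.0 instead of dividing by zero.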

Save the changes using File -> Save or Ctrl+S. Streams Studio will automatically rebuild the application based on the changes to the source file.

With these changes made and compiled, submit the Smackdown application again following the same steps used to submit the job before.

Viewing the New Results

With the new Smackdown job running you can see the two new operators, Results and ResultsConsole, in the Streams Graph view. Now open a data visualization view on the results with the calculated percentage.

Combined streams graph with new operators

  1. Hover over the connection between the Results and ResultsConsole operators.
  2. Click Create Dashboard View.
  3. Switch to the Attributes tab. This tab allows you to select which tuple attributes to be included in the view.
  4. Deselect the matches and tuples attributes so only the smackdownWords and percent are selected.
    Modifying attributes when creating data visualization view
  5. Switch to the Buffer tab.
  6. Change Tuples/sec Throttle to 3.
  7. Click OK to create the new view.

The new data visualization view displayed will show you the percent of tweets containing each of the smackdown words.

Data visualization view of new results in Streams Console

If you want, you can now create a line chart with these results by following the previous steps for creating a chart.

Troubleshooting

Viewing Trace Messages

For troubleshooting, you can check whether the operators in your application are logging any messages that indicate failures. For example, I submitted the TwitterStream job and I can see that no tuples are flowing through the graph. The most likely explanation is that a source adapter operator is not connecting to its source.

  1. Navigate to the Log Viewer in the console.
  2. Expand the tree view for the TwitterStream job to find the PE that contains the TwitterSource operator.
  3. Switch to the Application Trace tab in the log viewer.
  4. Click Load application traces.
    Load application trace messages in Streams Console

A current snapshot of the trace messages for the operator is loaded with the newest messages shown first. You can see that the operator is receiving an HTTP status code of 401, which means “Unauthorized”.

Application trace messages displayed in Streams Console

By default only error messages are included in the trace logs. If the error messages don’t seem to have enough detail you can investigate further by adjusting the level of detail being logged. There is a trace level setting for every PE in the job, so you can change the trace level for individual PEs as needed.

  1. Return to the tree view in the Log Viewer.
  2. Hover over the “i” information icon for the operator (or PE).
  3. Click the Set Application Trace Level action.
    Set Application Trace action in Streams Console
  4. In the Set Application Trace Level dialog change the Trace Output Level to Information.
    Set Application Trace Level dialog in Streams Console

The log view does not automatically refresh its contents like the Application Dashboard does. Wait a couple of minutes for the operator to retry connecting to the server and click the Reload link above the log messages.

Detailed application trace messages displayed in Streams Console

In addition to the HTTP 401 status code, the operator logged a warning message indicating the error was due to an authentication error. In this case, canceling the job and resubmitting it making sure to correctly copy and paste the Twitter credentials fixed the problem.

Tip: If you want to change the trace level for all PEs in the job you can use the Set Application Trace action on the menu for the job.
Tip: If you need to see detailed trace messages when the PEs are initially starting, you can set the trace level on the job submission dialog.

Viewing Console Log Messages

In addition to trace messages, operators can also write messages to the console log, which is the standard output for the process. Unlike tracing, console logging has no levels of messages and the output is completely free-form. The Smackdown application uses a Custom operator to write result messages to the console.

  1. In the Streams Console navigate to the Log Viewer.
  2. Expand the tree view for the Smackdown job to find the PE that contains the AggConsole operator.
  3. Switch to the Console tab in the log viewer.
  4. Click Load console messages.

A current snapshot of the console messages for the operator is loaded with the newest messages shown first. The console will contain the result messages that are printed every minute, one for each search term.

Console messages displayed in Streams Console
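
Each printed line is the result tuple cast to an rstring, so you should see output along the lines of {smackdownWords="blue",matches=57} (a made-up example; your words and counts will differ).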

Downloading All Logs For A Job

You can capture a snapshot of all logs for a job to download to your computer. This can be handy for searching across the logs for a large number of operators in a job.

  1. Navigate to the Streams Tree view in the Application Dashboard.
  2. Expand the tree view to the job for which you want to download logs.
  3. Hover over the “i” information icon for the job.
  4. Click the Download Job Logs action.
    Download job logs action in Streams Console

A pop-up window should appear showing that a request was made to collect the logs for the job. When the request is complete, a gzipped tar file will be downloaded through the browser.

Additional Resources


3 comments on "Streaming Analytics Development Guide for IBM Cloud"

  1. Running ./streamsdockerInstall.sh on Mac aborted with the following message:

    TASK [install Streams prereq packages with yum] ********************************
    changed: [streamsqse.localdomain] => (item=readline)
    failed: [streamsqse.localdomain] (item=requests) => {"changed": false, "cmd": "/usr/bin/pip2 install -U requests", "item": "requests", "msg": "stdout: Collecting requests\n Downloading https://files.pythonhosted.org/packages/49/df/50aa1999ab9bde74656c2919d9c0c085fd2b3775fd3eca826012bef76d8c/requests-2.18.4-py2.py3-none-any.whl (88kB)\nCollecting certifi>=2017.4.17 (from requests)\n Downloading https://files.pythonhosted.org/packages/7c/e6/92ad559b7192d846975fc916b65f667c7b8c3a32bea7372340bfe9a15fa5/certifi-2018.4.16-py2.py3-none-any.whl (150kB)\nCollecting chardet<3.1.0,>=3.0.2 (from requests)\n Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)\nCollecting idna<2.7,>=2.5 (from requests)\n Downloading https://files.pythonhosted.org/packages/27/cc/6dd9a3869f15c2edfab863b992838277279ce92663d334df9ecf5106f5c6/idna-2.6-py2.py3-none-any.whl (56kB)\nCollecting urllib3<1.23,>=1.21.1 (from requests)\n Downloading https://files.pythonhosted.org/packages/63/cb/6965947c13a94236f6d4b8223e21beb4d576dc72e8130bd7880f600839b8/urllib3-1.22-py2.py3-none-any.whl (132kB)\nInstalling collected packages: certifi, chardet, idna, urllib3, requests\n Found existing installation: chardet 2.2.1\n Uninstalling chardet-2.2.1:\n Successfully uninstalled chardet-2.2.1\n Found existing installation: idna 2.4\n Uninstalling idna-2.4:\n Successfully uninstalled idna-2.4\n Found existing installation: urllib3 1.10.2\n Uninstalling urllib3-1.10.2:\n Successfully uninstalled urllib3-1.10.2\n Found existing installation: requests 2.6.0\n\n:stderr: ipapython 4.5.0 requires pyldap>=2.4.15, which is not installed.\nrtslib-fb 2.1.63 has requirement pyudev>=0.16.1, but you'll have pyudev 0.15 which is incompatible.\nipapython 4.5.0 has requirement dnspython>=1.15, but you'll have dnspython 1.12.0 which is incompatible.\nCannot uninstall 'requests'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.\n"}
    changed: [streamsqse.localdomain] => (item=future)
    changed: [streamsqse.localdomain] => (item=dill)
    to retry, use: --limit @/root/ansible/streamsdockerCentos7.retry

    PLAY RECAP *********************************************************************
    streamsqse.localdomain : ok=40 changed=31 unreachable=0 failed=1

  2. I believe this is due to a change in Python repositories.

    to fix this:
    1 – go to the QSE install directory and locate the Ansible/streamsdockerCentos7.yaml file
    2 – open the streamsdockerCentos7.yaml file in your favorite editor
    3 – locate the following section:

    ######### Python Module installation ####################################

    - name: install Streams prereq packages with yum
      pip: name={{item}} state=latest
      with_items:
        - readline
        - requests
        - future
        - dill

    4 – change 'requests' to 'request'
    5 – save the file
    6 – remove the old container
    docker stop streamsdocker4240
    docker rm streamsdocker4240
    7 – re-run the installation.

  3. Thank you Dave. That fixed the issue:

    PLAY RECAP *********************************************************************
    streamsqse.localdomain : ok=63 changed=50 unreachable=0 failed=0
