Streaming Analytics is a Bluemix service built upon the IBM Streams technology.  Streams is an advanced analytic platform allowing user-developed applications to quickly ingest, analyze, and correlate information as it arrives from a wide variety of real-time sources.  The Streaming Analytics service gives you the ability to deploy Streams applications to run in the Bluemix cloud.

Introduction

This guide will help you through the processes for building, submitting and monitoring a streaming analytics application using the Streaming Analytics service on IBM Bluemix.

This guide assumes that you are already familiar with Streams application development and are now ready to start developing for the cloud. If you are not already familiar with Streams application development, you should first check out the Streams Quick Start Guide as well as the Roadmap for Streaming Analytics Service on Bluemix.

In this guide you will learn how to download and setup the IBM Streams Quick Start Edition VM to use as a development environment. A sample application is provided to download, build and run in the cloud.

If you have questions or difficulty following the steps in this guide, please add a comment at the bottom of the article so we can address your concerns. Also, if there are topics related to development that are not covered here, comments are welcome.

Setting Up IBM Streams Quick Start Edition VM

The Streaming Analytics service in Bluemix is running IBM InfoSphere Streams version 4.2.0.1 or later on CentOS 6.7 servers.

In order to compile Streams applications for deployment in the cloud, you need IBM Streams version 4.0.x, 4.1.x, or 4.2.x installed on a RHEL or CentOS 6.7 x86_64 system. If you have a compatible installation of IBM Streams available to you, you can use that for your development environment and skip to “Introducing Our Sample Application.”

The Streams Quick Start Edition VM image can help you get started with Streams without having to install and manage a Linux cluster environment. You can run this virtual machine image using Oracle VM VirtualBox or VMware Player on a 64-bit host OS such as Windows, Mac OS X or Linux. On some host systems you might need to configure the BIOS to support 64-bit virtual machines.

System Requirements

  • Operating system: a 64-bit operating system that supports VMware. VMware is supported on Apple Mac OS X, Linux, and Microsoft Windows.
  • Memory: 8 GB minimum. The amount of memory that is required by IBM Streams depends on the applications that are developed and deployed. This minimum is based on the memory requirements of the Commodity Purchasing sample application and other samples that are provided with the product.
  • Disk space: 20 GB minimum.

Downloading the Quick Start Edition VM

Follow these instructions to download the Quick Start Edition VM image. The setup instructions in this guide use Oracle VM VirtualBox on Windows because it’s freely available for commercial use. VMware provides VMware Player, which is available for non-commercial use.

  1. Go to IBM Stream Computing.
  2. Scroll down to IBM Streams Quick Start.
  3. Select the link to download the VMware image.
  4. Sign in with your IBM ID (or click Get an IBM ID to sign up for a new IBM ID).
  5. Answer the demographic questions, accept the License agreement and click I confirm.
  6. Select IBM Streams 4.2 Quick Start VMware image.
  7. Click Download now. This download is approximately 4 GB, so depending on your internet connection speed it may take several minutes.
  8. When the download completes, extract the ZIP file to a directory.

Downloading Oracle VM VirtualBox

  1. Download Oracle VM VirtualBox.
  2. Install VirtualBox.

Setting Up Quick Start Edition VM Using Oracle VM VirtualBox

  1. Start Oracle VM VirtualBox.
  2. Click the New button to create a new virtual machine.
  3. Click the Expert Mode button at the bottom of the dialog.

    1. Enter a name for the virtual machine.
    2. Select Linux for Type.
    3. Select Red Hat (64-bit) for Version.
      Tip: If there are no 64-bit versions listed, your host OS (Windows, Mac OS X or Linux) does not support 64-bit virtual machines. You might need to modify your system BIOS to enable 64-bit virtual machines; the exact steps vary depending on your hardware.
    4. Enter 4096 (4 GB) for Memory size.
    5. For Hard disk, click Use an existing virtual hard disk file.
    6. Use the folder browse button on the right to navigate to the directory where you unzipped the Quick Start Edition VM image ZIP file. Select the Virtual Disk.vmdk file.
  4. Click the Create button.
  5. Click Settings to adjust some of the settings for the VM.
  6. Select the General tab on the left and then the Advanced tab. Set Shared Clipboard to Bidirectional. This will allow you to copy and paste text between your host OS and the VM.
  7. Select the System tab on the left and then the Processor tab. Set Processor(s) to 2.
  8. Select the Display tab on the left. Change Video Memory to 128 MB.
  9. Select the Network tab on the left. Adapter 1 is already defined and attached to NAT to give your VM access to the internet.
  10. Select Adapter 2, click Enable Network Adapter, and for Attached to, select Host-only Adapter. This provides your VM with an IP address on adapter eth1 so you can connect to your VM from your host OS, which is helpful if you need to transfer files between your host OS and the Quick Start Edition VM.
  11. Click OK to apply the settings.
  12. Click the Start button to start the VM. VirtualBox will start the new virtual machine and begin booting the image. After a few moments you will be prompted several times to accept license agreements from Red Hat, VMware and IBM. Press Enter to accept each license.
  13. The initial setup will configure InfoSphere Streams with a streams domain and one streams instance. When prompted for a user and password to create the streams instance, enter streamsadmin for the user and passw0rd for the password.
  14. When initial setup completes, the Linux desktop is displayed.
    Virtual machine desktop

The IBM Streams Quick Start Edition VM setup is now complete. Note that there is a running IBM Streams domain and instance within the virtual machine. You can submit streams jobs to this environment just like in an “on-premises” installation of IBM Streams. However, this development guide only describes submitting jobs to Streaming Analytics on Bluemix.

When you want to shut down the VM, use System -> Shut Down and, when prompted, select Shut Down. When you want to start the VM again later, open VirtualBox, select your VM, and click the Start button.

Transferring Files Between the Quick Start VM and Your Host OS

If you need to transfer files between your Quick Start VM and your host OS (e.g. Windows), you can use WinSCP or another file transfer facility such as scp.

  1. In your Quick Start VM, open a terminal session and run ifconfig eth1 to display the IP address of the VM.
    Virtual machine network configuration
  2. On your Windows OS download and install WinSCP.
  3. Start WinSCP
  4. For Host name enter the IP address of your VM.
  5. For User name and Password, enter streamsadmin and passw0rd, and click Login.
    Connect to virtual machine using WinSCP
  6. Navigate to the source and target folders on the host OS and VM.
  7. Drag and drop files to copy them between systems.
    Drag and drop files using WinSCP

Introducing Our Sample Application

The sample application used for illustration in this guide accesses Twitter to get a live sample feed of tweets.

The sample application actually consists of two SPL applications. The first application, TwitterStream, accesses the live Twitter feed and exports a stream of tweets for use by other SPL applications. In the following streams application graph, the first operator reads from an HTTP stream, the second operator filters out any messages that are not Twitter statuses, and the third operator adds a count to each tuple. The final operator exports the stream for use by other jobs in the streams instance.

TwitterStream application graph
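
For reference, the export at the end of this pipeline is an ordinary SPL Export operator invocation, along the lines of the following sketch (the input stream name and export properties here are assumptions, not the sample’s exact source):

/** Sketch (properties assumed): publish the tweet stream so that other
    jobs in the instance can subscribe to it */
() as TweetsExport = Export(CountedTweets)
{
    param
        properties : { feed = "twitter-sample" } ;
}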

The second application, Smackdown, produces a score for each word in a list of words. At job submission time you specify a list of words participating in the smackdown. For each “opponent” word in the smackdown, the application calculates the number of Twitter statuses containing the search word. Every minute, the application produces a running score for the previous five minutes. In this application, the first operator imports the stream of tweets that is exported by the TwitterStream application. The second operator calculates a match score for each opponent in the smackdown. The third operator keeps a running aggregation of the number of matches for each opponent, producing a score every minute. The last operator prints the results to its standard output, also known as the process console messages, which you will view later.

Smackdown application graph

Structuring this sample into two separate streams jobs using export and import allows us to use a single connection to the Twitter source while being able to run multiple smackdowns using the same stream of tweets. You will import the sample source into Streams Studio in order to compile the two applications.
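
For reference, the matching Import on the Smackdown side is what stitches the two jobs together at runtime. A sketch, where the tuple schema and subscription expression are assumptions that would need to match the Export shown earlier:

/** Sketch (schema and subscription assumed): subscribe to the tweet
    stream exported by the TwitterStream job */
stream<rstring tweet, int64 tweetCount> TweetsImport = Import()
{
    param
        subscription : feed == "twitter-sample" ;
}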

Downloading the Sample Application Source

The instructions here assume you will be using Firefox within the Quick Start Edition VM to download the zip file.

If you download the source zip file on your host computer (e.g. Windows), you will need to copy it to the Quick Start Edition VM using the file copying instructions above.

Download Sample Source Files

Unzip the downloaded file, which will create a directory called Smackdown.

Compiling the Sample in Streams Studio

IBM Streams applications are written in Streams Processing Language (SPL). SPL applications are compiled into a Streams Application Bundle (SAB), which can be submitted as a job using the Streaming Analytics service in Bluemix. You can use Streams Studio to edit and compile your applications. Streams Studio is an Eclipse-based development environment.

First you will compile and run the sample application in Bluemix. Later you will use Streams Studio to modify the application, re-compile and run the application again.

With your Quick Start Edition VM running, start Streams Studio by using the Streams Studio (Eclipse) icon on the Linux desktop.

  1. Click OK to accept the default workspace name, /home/streamsadmin/workspace. When Studio starts up there are no projects defined yet.
  2. To import the sample, click File -> Import.
  3. In the import dialog expand InfoSphere Streams Studio, select SPL Project, and click Next.
  4. For the Source use the Browse button to navigate to the unzipped Smackdown folder. Select the Smackdown project in the list and click Finish.

Studio will import the project into the workspace. By default, Studio rebuilds the workspace when files are created or modified. It might take a couple of minutes for the applications to compile. You can see the status of the build in the lower right status bar of Studio. When the build finishes, the Project Explorer should look like this.
Project Explorer after import

Expanding the Resources folder shows the compiled Streams Application Bundle (SAB) files. These bundles are now ready to submit to Streaming Analytics on Bluemix.
Compiled applications in project explorer

For future reference, the full path names for the two bundles are:

/home/streamsadmin/workspace/Smackdown/output/sample.Smackdown/Distributed/sample.Smackdown.sab
/home/streamsadmin/workspace/Smackdown/output/sample.TwitterStream/Distributed/sample.TwitterStream.sab

You’ll use those paths later to submit the jobs on Bluemix.

Creating Twitter Application Credentials

The TwitterStream SPL application uses a Twitter streaming API to get a live sample stream of Twitter updates. You need to create authorization credentials that the application will use to connect to the Twitter API.

  1. Go to apps.twitter.com.
  2. Log in to your Twitter account (or sign up for Twitter).
  3. On the Application Management page, click the Create New App button.
  4. Enter a Name and Description for your application.
  5. Enter a Website for your application. Twitter requires you to enter a valid HTTP URL, for example, https://developer.ibm.com/streamsdev/docs/bluemix-streaming-analytics-development-guide/
    Twitter application details
  6. Click the Yes, I agree checkbox to accept the Twitter Developer Agreement.
  7. Click the Create your Twitter application button. After the application is created, the application details page is displayed.
  8. Switch to the Keys and Access Tokens tab.
  9. Click the Create my access token button.
    Twitter application access tokens
  10. Copy and save the following values for later:
    • Consumer Key (API Key)
    • Consumer Secret (API Secret)
    • Access Token
    • Access Token Secret

These four values will be used as submission-time parameters when submitting the TwitterStream job to Streaming Analytics on Bluemix.
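
Although the sample source already declares these for you, it may help to see how submission-time parameters are typically declared in SPL, using the standard getSubmissionTimeValue() function. This is only a sketch; the parameter names are assumptions and may differ from the sample’s actual names:

composite TwitterStream
{
    param
        // Required submission-time values (no defaults); names are illustrative
        expression<rstring> $consumerKey : getSubmissionTimeValue("consumerKey") ;
        expression<rstring> $consumerSecret : getSubmissionTimeValue("consumerSecret") ;
        expression<rstring> $accessToken : getSubmissionTimeValue("accessToken") ;
        expression<rstring> $accessTokenSecret : getSubmissionTimeValue("accessTokenSecret") ;
    graph
        // Placeholder source; the real application passes the four values
        // to its HTTP/OAuth source operator
        stream<rstring key> Credentials = Beacon()
        {
            param
                iterations : 1u ;
            output
                Credentials : key = $consumerKey ;
        }
}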

Creating a Streaming Analytics Service on Bluemix

The Streaming Analytics service gives you the ability to deploy Streams applications to run in the Bluemix cloud.

  1. Go to www.bluemix.net.
  2. Log in to your Bluemix account (or sign up for Bluemix).
  3. Open the CATALOG link.
  4. Browse for and select the Streaming Analytics service icon.
    Streaming Analytics icon
  5. The Streaming Analytics catalog page will be displayed.
    Streaming Analytics service in Bluemix
  6. Enter a Service name, or use the default name provided.
  7. Click CREATE to create an instance of the service for you. This provides you with your own Streams instance, started and ready to run Streams applications.
  8. The Streaming Analytics service dashboard will be displayed.
    Streaming Analytics service dashboard

You can use the START and STOP buttons on the service dashboard to start and stop the service. While started, you can submit jobs to the service using the Streams Console.

Submitting Your Jobs to Streaming Analytics on Bluemix

On the Bluemix service dashboard, use the LAUNCH button to start the Streams Console, which will open in a new window or tab. The console opens displaying an Application Dashboard, which allows you to submit, monitor, and cancel your jobs in the Streaming Analytics service.

Application dashboard in Streams Console

Recall that the compiled application bundles were at these paths in the Quick Start Edition VM:

/home/streamsadmin/workspace/Smackdown/output/sample.Smackdown/Distributed/sample.Smackdown.sab
/home/streamsadmin/workspace/Smackdown/output/sample.TwitterStream/Distributed/sample.TwitterStream.sab

You will be submitting jobs using these bundles. The instructions here assume you are using Firefox within the Quick Start Edition VM to submit the jobs. If you are using a browser on your host computer (e.g. Windows), you will need to transfer these bundle files from the VM to your host computer before you continue.

  1. Click the Submit Job button in the console tool bar.
    In the Submit Job dialog the instance name is pre-selected for you.
  2. For the Application bundle file, use the Browse button to navigate to and select the sample.TwitterStream.sab file.
  3. Click the Next button.
    Submit job dialog
  4. The bundle file will be uploaded and you will be prompted for additional options.
  5. Accept the defaults for the top options.
  6. For the Submission-time values, enter the Twitter application credentials that you created earlier. You’ll need to enter the first two values and then scroll the list to reveal the next two. An asterisk preceding the name of a submission-time value indicates there is no default value, so you are required to enter a value to continue.
  7. Click the Submit button.

Submission time parameters in Submit Job dialog

The job is submitted to the Streaming Analytics instance. You should see a couple of pop-up messages appear briefly showing the job submission status. The Streams Console refreshes automatically in the background. In a few moments the job will appear in the Summary, Streams Tree, and Streams Graph views along the top of the Application Dashboard. These views stay updated to show the current status of the jobs running in your streams instance.

Submitted job summary in Streams Console

At this point there is just one job running, TwitterStream, as shown in this Streams Graph view. The graph shows the operators and the connections between them. When a new job is starting up, some operators might be decorated with yellow triangles and the connections might be dashed lines, indicating that the job is not yet completely started and healthy. When the job is fully up and running, the operators are decorated with green circles and the connection lines are solid, indicating that the job is fully healthy.

Tip: If you want to focus on a particular view, hover over the card title and a tool bar will appear. You can click the Max icon to maximize that card within the dashboard. When maximized, click the icon again to restore the tiled layout.

Maximize card icon in Streams Console

Monitoring Your Job

Monitoring Tuple Flows

The number appearing on a connection line is the most recent rate of tuples per second flowing between the two operators. If there are no numbers on the connections, no tuples have been flowing recently between the operators. If you hover the mouse over an object in the Streams Graph view, a pop-up will show details for that object and, in some cases, a menu of actions for that object.

Monitor tuples flowing between operators

In this case there were 54 tuples per second the last time metrics were refreshed. Twitter’s sample stream is a small subset of all Twitter statuses. The flow rate can vary but usually seems to be several dozen per second. The metrics shown at the bottom of the pop-up are the cumulative amounts.

Submitting the Smackdown Job

At this point you just have the one job producing a stream of tuples. When the tuples get to the TweetsExport operator they are discarded because there is no job connected to that stream. (If for some reason your job does not have any data flowing, the most likely cause is incorrectly specified Twitter authorization credentials in the submission-time parameters. The section on viewing trace messages below explains how to investigate.)

It’s time to submit the second job, Smackdown, to consume the stream of tuples produced by the first job. Follow the job submission steps above again to submit the sample.Smackdown.sab file. When prompted for the opponents submission-time value, enter red,green,blue for three words competing in the smackdown. You can instead use three celebrities, band names, team names, etc.

Submission time parameters in Submit Job dialog
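
In the SPL source, a comma-separated submission-time list like this is typically received with the standard getSubmissionTimeListValue() function. A sketch of the corresponding declaration in the main composite’s param clause (the parameter name is assumed):

param
    // Sketch (name assumed): "opponents" arrives as a comma-separated
    // submission-time list, e.g. red,green,blue
    expression<list<rstring>> $opponents : getSubmissionTimeListValue("opponents") ;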

In a few moments the second job will be started and the Streams Graph view will refresh to show the TweetsExport operator in the TwitterStream job connected to the TweetsImport operator in the Smackdown application.

Streams graph showing two jobs

Viewing Sample Data in a Dynamic View

You can create a view on the tuples flowing across any of the connections in the streams graph to monitor a sample of the data that is flowing between two operators.

  1. Hover over the connection between the MatchAggregate and AggConsole operators.
  2. Click Create View.
    Create view in Streams Console
  3. Switch to the Buffer tab.
  4. Change Tuples/sec Throttle to 3. The MatchAggregate operator produces one tuple for each smackdown opponent every sixty seconds. Changing this throttle from 1 to 3 will include the most recent score for each entry in our word list.
  5. Click OK to create the new view.
    Buffer settings when creating view

A new data visualization view will be added to the Application Dashboard. This view definition will remain after you log out and log back in to the console.

Data visualization view

At this point, “blue” is winning the smackdown. The matches column shows the number of tweets in the last five minutes that contain each of the smackdown opponent words.

Displaying a Line Chart of the Data

You can also create a line chart from a data visualization view.

  1. In the data visualization view, click the Create Time Series Chart icon.
    Create time series chart in Streams Console
  2. Switch to the Categories tab.
  3. For Choose line categories from, select Multiple attributes values.
  4. For Lines measured against this attribute, select the matches attribute.
  5. For Plot lines for each unique value of, select the smackdownWords attribute.
  6. Click OK to create the chart.
    Category settings when creating time series chart

A new line chart view will be added to the Application Dashboard. This view definition will remain after you log out and log back in to the console.

If you hover over one of the lines in the chart, a status bar displays the maximum, minimum and median values for that series.

Time series chart displayed as bar graph

Canceling Your Job

In the next section you will enhance the Smackdown application and resubmit the job. But first cancel the existing Smackdown job.

Cancel Job icon

  1. Click the Cancel Jobs button in the console tool bar.
  2. In the Cancel Jobs dialog select the Smackdown application.
    Cancel Job dialog
  3. Click the Cancel Jobs button and then click Yes when prompted to confirm the job cancellation.
Tip: If you have any open data visualization or chart views for a job, those views remain when the job is canceled. You probably want to close them because no new data will be produced after the job is canceled.

Enhancing the Sample Application

Adding Additional Function

So far the sample application produces a simple aggregate of the number of tweets that contain our search words. Let’s modify the Smackdown application to calculate the percentage of the tweets containing the search words. To do this you need to:

  • Add a new field to the Aggregate operator output schema.
  • Set the new field in the operator’s output assignments.
  • Add a downstream operator to calculate the percentage of tuples that match the search word.

You will use the SPL Editor, a text-based editor that provides language syntax highlighting and context-sensitive assistance.

Returning to Studio in the Quick Start Edition VM, in the Project Explorer view:

  1. Drill down to the sample::Smackdown main composite operator.
  2. Right click on the operator and select Open With, and then SPL Editor.
    Open with SPL Editor in Streams Studio
  3. Scroll down to the Aggregate operator.
  4. Add the new field to the output stream schema: int32 tuples
  5. Add the new output assignment: tuples = Count()
    Count() is an output assignment function provided by the Aggregate operator which returns the number of tuples currently in the window.
    Modify Aggregate operator invocation in SPL source
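
If the screenshot is hard to read, the modified Aggregate invocation ends up looking roughly like the following sketch. The input stream name, attribute names, and window parameters are assumptions; the two additions are the int32 tuples attribute and the tuples = Count() assignment:

/** Sketch (input stream and window assumed): the Aggregate invocation
    after adding the tuples attribute and its Count() assignment */
stream<rstring smackdownWords, int32 matches, int32 tuples> MatchAggregate = Aggregate(Matches)
{
    window
        Matches : sliding, time(300), time(60), partitioned ;
    param
        partitionBy : smackdownWords ;
    output
        MatchAggregate : matches = Sum(match), tuples = Count() ;
}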

Next, add the following two operator invocations at the end of the composite operator, right before the config clause.

        /** Calculate percent matched */
        stream<rstring smackdownWords, int32 matches, int32 tuples, float64 percent> Results = Functor(MatchAggregate)
        {
            output
                Results : percent = roundedPercent(matches, tuples);
        }

        /** Print results for viewing in console log */
        () as ResultsConsole = Custom(Results)
        {
            logic
                onTuple Results :
                {
                    printStringLn((rstring) Results) ;
                }
        }

The output clause of the above Functor invocation calls the function roundedPercent. You need to define that function to calculate the percentage; it will round the result to four decimal places. You can define reusable functions in SPL very simply. Add the following function definition to the bottom of the Smackdown.spl source file, after the closing brace, so it is outside the scope of the main composite operator.

/** Calculate percentage and round to four decimal places */
float64 roundedPercent(int32 x, int32 y) {
    return y > 0 ? round(((float64)x * 100.0 / (float64)y ) * 10000.0) / 10000.0 : 0.0;
}

Save the changes using File -> Save or Ctrl+S. Streams Studio will automatically rebuild the application based on the changes to the source file.

With these changes made and compiled, submit the Smackdown application again following the same steps used to submit the job before.

Viewing the New Results

With the new Smackdown job running you can see the two new operators, Results and ResultsConsole, in the Streams Graph view. Now open a data visualization view on the results with the calculated percentage.

Combined streams graph with new operators

  1. Hover over the connection between the Results and ResultsConsole operators.
  2. Click Create View.
  3. Switch to the Attributes tab. This tab allows you to select which tuple attributes are included in the view.
  4. Deselect the matches and tuples attributes so that only smackdownWords and percent are selected.
    Modifying attributes when creating data visualization view
  5. Switch to the Buffer tab.
  6. Change Tuples/sec Throttle to 3.
  7. Click OK to create the new view.

The new data visualization view will show you the percentage of tweets containing each of the smackdown words.

Data visualization view of new results in Streams Console

If you want, you can now create a line chart with these results by following the previous steps for creating a chart.

Combining Multiple Operators into a Single Process

By default the streams compiler places each operator into its own processing element (PE). Each PE becomes a single process on one of the application nodes (servers) in a running streams instance.

Sometimes it is beneficial to combine multiple operators into a single PE process. This is known as “fusing” operators. Fusing operators can improve both latency and throughput because tuples flow between fused operators using memory references instead of being transported between processes and servers. The downside of fusing is that it reduces flexibility in distributing work across servers and can concentrate too much processing in a single process.

Exploring the Streams Tree view in the console shows that our two jobs contain three PEs each. In both cases it probably makes sense to fuse each job into a single PE. The SPL language has a config placement clause to do this.

Exploring the Streams Tree view in Streams Console

You might have noticed that although each job has three PEs, each job has four operators. This is because the Export and Import operators are automatically fused with the operator whose stream they are exporting or importing.

To fuse multiple operators together, add the following clause at the end of each operator invocation you want fused:

config
     placement : partitionColocation("partition1") ;

In the TwitterStream.spl source in Studio, this change for the HTTPGetStream operator would be:

Adding placement configuration to operator in SPL source
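
In source form, the change amounts to appending the config clause to each operator invocation, as in this sketch (the stream names and filter expression are illustrative, not the sample’s exact source):

/** Sketch (names illustrative): a Filter invocation with the fusing
    config clause appended */
stream<rstring tweet> Statuses = Filter(RawTweets)
{
    param
        filter : length(tweet) > 0 ;
    config
        placement : partitionColocation("partition1") ;
}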

Add the same clause to the Filter and Functor operator invocations as well and all operators in the job will be fused into one PE.

Similarly, modify the Smackdown.spl source to add this clause to the two Custom operator invocations and the Aggregate operator invocation to produce a single PE.

When these applications are run, the Streams Graph will look the same but there will be only two PEs, one for each job.

For additional details about configuring operator placement see Operator placement config clause.

Controlling Placement of PEs on Hosts

In addition to combining multiple operators into a single process, SPL provides configuration support for controlling which hosts your processes (PEs) run on. This can be helpful, for example, if you know multiple operators in your application require significant CPU or memory. In this case you might want to make sure they run on separate hosts, spreading the workload across the application nodes in your service instance. Or, you might want two operators to be in separate processes (PEs) but have them run on the same application node so the connections between them are local.

In the Streaming Analytics service, each of the application nodes assigned to your started Streams instance is assigned a “host<n>” tag. For example, if your service has two application nodes (the default) they will be “host1” and “host2”. The SPL language has config hostPool and placement : host clauses to control placement of your operators on specific hosts.

For example, the following configuration defines a host pool containing the application node tagged as “host1” and then specifies that pool to be used for placement of the operator.

config
    hostPool : Host1Pool=createPool({tags=["host1"]}, Sys.Shared);
    placement : host(Host1Pool);
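
In context, these clauses go at the end of an operator invocation, just like the fusing example. A sketch with illustrative names:

/** Sketch (names illustrative): pin this operator to the application
    node tagged "host1" */
() as PinnedConsole = Custom(Results)
{
    logic
        onTuple Results :
        {
            printStringLn((rstring) Results) ;
        }
    config
        hostPool : Host1Pool = createPool({tags=["host1"]}, Sys.Shared) ;
        placement : host(Host1Pool) ;
}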

For additional details about configuring operator placement on hosts see hostPool and placement : host config clauses.

Troubleshooting

Viewing Trace Messages

For troubleshooting, you can check whether the operators in your application are logging any messages that indicate failures. For example, I submitted the TwitterStream job and can see that no tuples are flowing through the graph. The most likely explanation is that a source adapter operator is failing to connect to its source.

  1. Navigate to the Log Viewer in the console.
  2. Expand the tree view for the TwitterStream job to find the PE that contains the TwitterSource operator.
  3. Switch to the Application Trace tab in the log viewer.
  4. Click Load trace messages.
    Load application trace messages in Streams Console

A current snapshot of the trace messages for the operator is loaded with the newest messages shown first. You can see that the operator is receiving an HTTP status code of 401, which means “Unauthorized”.

Application trace messages displayed in Streams Console

By default, only error messages are included in the trace logs. If the error messages don’t seem to have enough detail, you can investigate further by adjusting the level of detail being logged. There is a trace level setting for every PE in the job, so you can change the trace level for individual PEs as needed.

  1. Return to the tree view in the Log Viewer.
  2. Hover over the “i” information icon for the operator (or PE).
  3. Click the Set Application Trace action.
    Set Application Trace action in Streams Console
  4. In the Set Application Trace Level dialog change the Trace Output Level to Information.
    Set Application Trace Level dialog in Streams Console

The log view does not automatically refresh its contents like the Application Dashboard does. Wait a couple of minutes for the operator to retry connecting to the server, then click the Reload link above the log messages.

Detailed application trace messages displayed in Streams Console

In addition to the HTTP 401 status code, the operator logged a warning message indicating the error was due to an authentication failure. In this case, canceling the job and resubmitting it, making sure to correctly copy and paste the Twitter credentials, fixed the problem.

Tip: If you want to change the trace level for all PEs in the job you can use the Set Application Trace action on the menu for the job.
Tip: If you need to see detailed trace messages when the PEs are initially starting, you can set the trace level on the job submission dialog.

Specify trace level in Submit Job dialog

Viewing Console Log Messages

In addition to trace messages, operators can also write messages to the console log, which is the standard output for the process. Unlike tracing, console logging has no levels of messages and the output is completely free-form. The Smackdown application uses a Custom operator to write result messages to the console.
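
The distinction can be seen in SPL itself. The following sketch (names illustrative) writes to both destinations: appTrc() messages are filtered by the PE’s application trace level, while printStringLn() output always goes to the PE’s console log:

/** Sketch (names illustrative): tracing versus console logging */
() as LoggingConsole = Custom(Results)
{
    logic
        onTuple Results :
        {
            // Filtered by the PE's application trace level
            appTrc(Trace.info, "received: " + (rstring) Results) ;
            // Always written to the process's standard output (the console log)
            printStringLn((rstring) Results) ;
        }
}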

  1. In the Streams Console navigate to the Log Viewer.
  2. Expand the tree view for the Smackdown job to find the PE that contains the AggConsole operator.
  3. Switch to the Console tab in the log viewer.
  4. Click Load console messages.

A current snapshot of the console messages for the operator is loaded with the newest messages shown first. The console will contain the result messages that are printed every minute, one for each search term.

Console messages displayed in Streams Console


Downloading All Logs For A Job

You can capture a snapshot of all logs for a job to download to your computer. This can be handy for searching across the logs for a large number of operators in a job.

  1. Navigate to the Streams Tree view in the Application Dashboard.
  2. Expand the tree view to the job for which you want to download logs.
  3. Hover over the “i” information icon for the job.
  4. Click the Download Job Logs action.
    Download job logs action in Streams Console

A pop-up window should appear showing that a request was made to collect the logs for the job. When the request is complete, a gzipped tar file will be downloaded through the browser.

Additional Resources

Read more about some of the special considerations when developing your Streams application for the cloud in Getting your SPL application ready for the cloud.

Comments

  1. Joel Rongiard September 19, 2016

    Hi,
    I am not able to run this Twitter sample streams application on Bluemix…
    It works fine in my Streams development environment (4.1.1.0); I can see the tuples coming from Twitter.
    But when I submit the job on my Bluemix Streams instance, it starts but fails immediately. I can see up to 10 launch counts, and then the status becomes unhealthy…
    There are no logs/traces at all.
    Then I tried a very simple Streams app with only one SPL operator… and got exactly the same failure running in Bluemix, while it works perfectly in my development environment.
    So it seems that something is wrong in my development setup or in how my .sab is generated.
    Can you help me?
    Thanks.

    • Hi Joel, First thing, do you have Streams 4.1 installed on RHEL 6.5 or equivalent CentOS version? Streams apps compiled on RHEL 7 will not work.

      If that’s not the problem, I can try to look into it further. With the job submitted and not working, let me know the Bluemix region you are using (e.g. us-south, eu-gb) and your Streaming Analytics instance ID (GUID), or just the first several characters of it.

      If you wouldn’t mind taking the discussion to the Streamsdev questions forum, that would allow others to see the problem and solution.

      DW Answers for Streamsdev: https://developer.ibm.com/answers/smartspace/streamsdev/

      • Joel Rongiard September 21, 2016

        Thanks Paul, you are right, I have been running on RHEL 7 for many months… so I missed the RHEL 6.5 requirement!
        So this morning I installed a new dev environment on RHEL 6.5 and it works perfectly…
        Thanks again

  2. Can you please help me convert the downloaded job logs to JSON format, and then make the JSON data visible in Hadoop HDFS from the streamsadmin terminal?

    • Hi, I don’t understand what you are asking. Can you be more specific about what you are trying to accomplish? Are you using the sample application from this article with the Streaming Analytics service in Bluemix?

  3. Hello pvallen,
    Yes, I am using this sample application. I want a detailed explanation of how to make further use of it: specifically, how to load this sample Twitter data into Hadoop HDFS by using the HDFS toolkit. And can this data be shown in the terminal itself? Is it possible? A detailed explanation is needed.
    Thank You

  4. Is there any possibility of using the same procedure with a Facebook data source? Please explain in detail.

    • I’m not familiar with Facebook’s public feed access. A quick glance shows that rather than providing an HTTP API endpoint, they will post public status updates to an HTTPS endpoint on your server. From there you could feed the data into a streams application.

      From the FB graph API page [https://developers.facebook.com/docs/public_feed/]:
      “In order to start receiving updates from the public feed API, you will need to setup a HTTPS endpoint, where the API will POST all updates. Please note that this endpoint must have a valid SSL certificate that is NOT self-signed in order to receive updates. Once your HTTPS endpoint has been configured, provide this URL to your Facebook representative and confirm that you’re ready to start receiving the large volume of data that will be sent through the public Feed API.”
