Attention: This article is obsolete. The general concepts still apply but the specific sample and code are no longer being maintained.
IBM Streams (“Streams”) enables continuous and fast analysis of massive volumes of moving data to help improve the speed of business insight and decision-making. Streams provides an execution platform and services for user-developed applications that ingest, filter, analyze, and correlate the information in data streams.
With IBM® Streaming Analytics for Bluemix™, you can perform real-time analysis on data in motion as part of your Bluemix application. The Streaming Analytics service is powered by IBM Streams.
This article describes a demo application consisting of two parts: a Streams job running in the Streaming Analytics service that reads RSS feeds containing lists of articles from Computerworld and analyzes the article content using text analytics; and a Bluemix Liberty for Java application used to view the data produced from the Streams job.
What the Demo Shows
The demo can be seen running at the Streams RSS Demo View Application.
The demo reads from a set of Computerworld RSS feeds, each of which contains a list of articles. Each article is processed using text analytics operators, and the results are displayed in a browser. The user interface consists of four windows:
- Upper left.
This is a wordcloud of the corporation names found in the latest article processed. The extraction is done using a pre-built AQL extractor. The larger the name, the more often it is found in the article.
- Upper right.
This window shows a wordcloud of people names found in the article, as defined by the pre-built AQL extractor that comes with the text analytics tooling. The larger the name, the more often it is found.
- Lower left.
This window displays the names of up to 10 articles that have been processed. As more articles are processed, older article titles fall off the list.
- Lower right.
This window shows sentences that appear in the articles and include specific words. These words are part of a dictionary in an extractor.
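The behavior of the two wordclouds and the article list can be sketched in plain Java. This is only an illustration of the idea, not the demo's actual code: the class name `DemoViewModel` and its methods are hypothetical. Name occurrences are counted to give each name a weight, and the title list is capped at 10 entries so that the oldest title falls off first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical view model; the real demo's classes are not shown in this article.
final class DemoViewModel {
    static final int MAX_TITLES = 10;

    private final Deque<String> recentTitles = new ArrayDeque<>();

    /** Counts how often each extracted name occurs; the count drives its size in a wordcloud. */
    static Map<String, Integer> nameWeights(List<String> extractedNames) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        for (String name : extractedNames) {
            weights.merge(name, 1, Integer::sum);
        }
        return weights;
    }

    /** Adds the newest title and drops the oldest once more than MAX_TITLES are held. */
    void addTitle(String title) {
        recentTitles.addFirst(title);
        while (recentTitles.size() > MAX_TITLES) {
            recentTitles.removeLast();
        }
    }

    /** Returns the titles, newest first. */
    List<String> titles() {
        return new ArrayList<>(recentTitles);
    }

    public static void main(String[] args) {
        System.out.println(nameWeights(List.of("IBM", "Intel", "IBM")));
        DemoViewModel model = new DemoViewModel();
        for (int i = 1; i <= 12; i++) {
            model.addTitle("Article " + i);
        }
        System.out.println(model.titles());
    }
}
```

A deque keeps both ends cheap to update, which suits a list where new titles arrive at one end and old ones leave at the other.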
Details of the components
The Streams Application
The Streams SPL application uses an operator to continuously loop through a list of 10 Computerworld RSS feeds:
"http://www.computerworld.com/category/big-data/index.rss"
"http://www.computerworld.com/category/cloud-computing/index.rss"
"http://www.computerworld.com/category/data-center/index.rss"
"http://www.computerworld.com/category/emerging-technology/index.rss"
"http://www.computerworld.com/category/enterprise-applications/index.rss"
"http://www.computerworld.com/category/it-management/index.rss"
"http://www.computerworld.com/category/mobile-wireless/index.rss"
"http://www.computerworld.com/category/networking/index.rss"
"http://www.computerworld.com/category/operating-systems/index.rss"
"http://www.computerworld.com/category/vertical-it/index.rss"
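Looping continuously through a fixed list of feeds amounts to a round-robin over the URLs. The following Java sketch shows that idea only; the class `FeedCycle` is hypothetical and is not the SPL operator the application actually uses.

```java
import java.util.List;

// Hypothetical helper illustrating a continuous loop over a feed list.
final class FeedCycle {
    private final List<String> feeds;
    private int next = 0;

    FeedCycle(List<String> feeds) {
        if (feeds.isEmpty()) {
            throw new IllegalArgumentException("at least one feed URL is required");
        }
        this.feeds = List.copyOf(feeds);
    }

    /** Returns the next feed URL, wrapping back to the first after the last. */
    String nextFeed() {
        String url = feeds.get(next);
        next = (next + 1) % feeds.size();
        return url;
    }

    public static void main(String[] args) {
        FeedCycle cycle = new FeedCycle(List.of(
                "http://www.computerworld.com/category/big-data/index.rss",
                "http://www.computerworld.com/category/cloud-computing/index.rss"));
        // Five requests against two feeds show the wrap-around.
        for (int i = 0; i < 5; i++) {
            System.out.println(cycle.nextFeed());
        }
    }
}
```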
The application also demonstrates how easy it is to create a custom Java operator. In this case, the application includes two custom operators: one that reads an RSS feed, and another that extracts the article content from an HTML file. The latter also uses an outside jar file, JSoup-1.8.3.jar, to make it easier to manipulate HTML.
Each feed returns roughly the last 20 assets (articles, videos, etc.) produced by Computerworld in that category. For each item returned from a given feed, the application reads the HTML file and extracts the article content before passing it on to the text analytics operators. These operators extract the following:
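To give a feel for what the RSS-reading step involves, here is a minimal sketch that pulls the title and link of each item out of an RSS 2.0 document using only the XML parser that ships with the JDK. It is not the demo's custom operator: the class `RssItems` is hypothetical, the Streams Java operator API is deliberately left out, and the sample feed in `main` is made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical stand-in for the parsing logic inside an RSS-reading operator.
final class RssItems {
    /** Returns {title, link} pairs for every <item> element in an RSS 2.0 document. */
    static List<String[]> parse(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so an untrusted feed cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagName("item");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                result.add(new String[] { childText(item, "title"), childText(item, "link") });
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable RSS document", e);
        }
    }

    /** Text of the first descendant with the given tag, or an empty string if absent. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Made-up sample feed; a real feed would be fetched over HTTP first.
        String xml = "<rss version=\"2.0\"><channel><title>Sample feed</title>"
                + "<item><title>First article</title><link>http://example.com/1</link></item>"
                + "<item><title>Second article</title><link>http://example.com/2</link></item>"
                + "</channel></rss>";
        for (String[] item : parse(xml)) {
            System.out.println(item[0] + " -> " + item[1]);
        }
    }
}
```

Each link returned this way would then be fetched and handed to the HTML content extractor.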
- Corporation and people names
These are used in the two wordclouds mentioned earlier
- Sentences containing key terms
The key terms are: “ibm”, “streams”, “biginsights”, “watson”, “spark”, “cognos”, and “spss”.
Since these key terms are kept in an AQL external dictionary, they are easy to change to tailor the demo to your needs.
The combined results are sent to the Bluemix Liberty for Java application using an HTTPPost operator.
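The key-term extraction can be approximated in plain Java to show what the operator produces. This sketch is a simplified stand-in, not AQL and not the demo's extractor: the class `KeyTermSentences` is hypothetical, sentence boundaries come from the JDK's `BreakIterator`, and a sentence matches when any of its words, compared case-insensitively, is in the dictionary.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical plain-Java approximation of dictionary-based sentence extraction.
final class KeyTermSentences {
    // The key terms listed in the article.
    static final Set<String> KEY_TERMS =
            Set.of("ibm", "streams", "biginsights", "watson", "spark", "cognos", "spss");

    /** Returns the sentences of the text that contain at least one dictionary term as a whole word. */
    static List<String> matching(String text, Set<String> terms) {
        List<String> hits = new ArrayList<>();
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        boundaries.setText(text);
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE; start = end, end = boundaries.next()) {
            String sentence = text.substring(start, end).trim();
            if (containsTerm(sentence, terms)) {
                hits.add(sentence);
            }
        }
        return hits;
    }

    /** Lowercases the sentence, splits it into words, and checks each word against the dictionary. */
    private static boolean containsTerm(String sentence, Set<String> terms) {
        for (String word : sentence.toLowerCase(Locale.ENGLISH).split("[^a-z0-9]+")) {
            if (terms.contains(word)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Watson won the quiz show. The weather was mild. "
                + "Analysts expect Spark adoption to grow.";
        for (String sentence : matching(text, KEY_TERMS)) {
            System.out.println(sentence);
        }
    }
}
```

Passing the dictionary as a parameter mirrors the point made above: changing the terms does not require changing the matching logic.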
The Bluemix Liberty for Java Application
The application implements two REST APIs:
- The first receives the HTTP POST data from the Streams application in JSON format and caches it.
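The receive-and-cache behavior of the first API can be sketched with the HTTP server built into the JDK. This is a stand-in for illustration and not the Liberty application's code: the class `ResultCache`, the `/results` path, and the sample JSON body are all assumptions. The `main` method starts the server on a free port, posts one document to it, prints what was cached, and shuts down.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical receiver; the real application runs on Liberty for Java.
final class ResultCache {
    private final AtomicReference<String> latest = new AtomicReference<>("{}");

    void store(String json) {
        latest.set(json);
    }

    String latest() {
        return latest.get();
    }

    /** A POST replaces the cached body; any other method is answered with 405. */
    void handle(HttpExchange exchange) throws IOException {
        try {
            if (!"POST".equals(exchange.getRequestMethod())) {
                exchange.sendResponseHeaders(405, -1);
                return;
            }
            byte[] body = exchange.getRequestBody().readAllBytes();
            store(new String(body, StandardCharsets.UTF_8));
            exchange.sendResponseHeaders(204, -1); // success, no response body
        } finally {
            exchange.close();
        }
    }

    HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/results", this::handle); // assumed path
        server.start();
        return server;
    }

    public static void main(String[] args) throws Exception {
        ResultCache cache = new ResultCache();
        HttpServer server = cache.start(0); // port 0 asks the OS for any free port
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://localhost:" + port + "/results").toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = conn.getOutputStream()) {
                // Made-up payload; the demo's actual JSON layout is not described here.
                out.write("{\"title\":\"Sample article\"}".getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode() + ", cached: " + cache.latest());
        } finally {
            server.stop(0);
        }
    }
}
```

Keeping only the most recent body is the simplest possible cache; the real application may well keep more than one result.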
The application source is provided in a project at StreamsRSSDemo
Instructions for deploying and running the Bluemix application, along with a pre-built Streams application bundle file, can be found at: Streams RSS Demo README
Instructions for installing and running the Streams application in an on-premises environment (Streams Quick Start Edition VM) can be found at: Streams Application in the RssDemoDocumentation pdf file.
Instructions for modifying the on-premises application for use in the cloud can be found at: Streams Application in the Streams RSS Demo in the Cloud pdf file.
This demo application shows some of the power of IBM Streams applications along with the ease of deploying those applications to the cloud in the Bluemix Streaming Analytics service and leveraging other Bluemix services for a complete solution.