Streams Lab Introduction

InfoSphere Streams

SPL Hands-On Lab, Lab Version 1.0

For Use with Streams QuickStart Version 3.2.1.0

Robert Uleman

Introduction – SPL Programming

The Streams Processing Language (SPL) is the language used to compose Streams applications and describe their data flow. Key features of the language include:

   Stream-centric, operator-based language – A declarative stream processing language that supports incremental development and composition, dynamic connectivity, and a rich expression language.

   Growing base of toolkits and operators – Extensible through the addition of toolkits and operators, and able to support arbitrary data streams at high volume. Toolkits include Internet, Database connectivity, Text Analytics, HDFS Adapters, OpenCV graphics, and many more.

   User-defined operators – An API is included for extending the language and toolkits. It supports reusing existing analytic code written in C++ or Java. You can write new operators and toolkits to encapsulate domain-specific logic, data types, and adapters.

Goal and Objectives

The goal of this SPL Programming lab is to provide a solid foundation in SPL, including:

   the expression language

   standard operators

   data types

   stream processing concepts such as windows and punctuation

   toolkit development
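To make the windowing idea concrete before the labs introduce it in SPL, here is a minimal conceptual sketch. It is written in Python, not SPL, and is not taken from the lab material. It assumes a count-based tumbling window: values are collected until the window is full, the whole batch is summarized and emitted, and the window is emptied. The `WINDOW_MARKER` sentinel stands in for a punctuation, that is, a marker in the stream that separates groups of tuples. All names here are illustrative and not part of any Streams API.

```python
from typing import Iterable, Iterator, List, Union

# Hypothetical sentinel standing in for a window punctuation in the output stream.
WINDOW_MARKER = object()


def tumbling_average(values: Iterable[float], size: int) -> Iterator[Union[float, object]]:
    """Yield the average of each full window of `size` values, followed by a marker."""
    if size <= 0:
        raise ValueError("size must be positive")
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield sum(window) / size  # one result per full window
            yield WINDOW_MARKER       # signals the end of this window's output
            window.clear()            # tumbling: no value is shared between windows
    # Values left in a partial window are never emitted in this sketch.


speeds = [10.0, 20.0, 30.0, 40.0, 50.0]
results = [r for r in tumbling_average(speeds, 2) if r is not WINDOW_MARKER]
print(results)  # two full windows: [15.0, 35.0]
```

The last value (50.0) stays in an incomplete window and produces no output here; how a real operator treats such leftovers is a design decision that this sketch does not try to reproduce.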

The objectives of this lab (along with the accompanying presentations) are to:

   Use a pseudo-real-world problem space to explore SPL and stream computing

   Apply best practices and an iterative approach to developing SPL applications

   Demonstrate the power of the SPL expression language, which can reduce the need to drop into C/C++ or Java to develop native functions

Prerequisites

This lab was designed to follow the Streams 3 Introductory Hands-On Lab, which demonstrates many of the features of the Streams Studio IDE. This lab assumes that students have a basic understanding of the IDE and are able to navigate it.

Overview

The purpose of this lab is to provide a practical approach to learning SPL through a series of hands-on exercises that build on one another.


Figure 1 – SPL Programming Lab Overview

The lab is based on a fairly simple example of processing GPS location data.

Figure 1, above, provides a graphical overview of the SPL artifacts that you will develop through this lab. They include:

   An SPL Toolkit: streamstk.gps

   A GPS device-like data generator that can run as a standalone Linux executable

   A location summary SPL Application, reading data from a TCP socket

The lab is divided into four parts:

Lab 1 – Hello World: a review of SPL program creation, compilation and execution in the Streams Studio Integrated Development Environment. A command-line version of the same exercise is in the Appendix.

Lab 2 – SPL Expression Language, Functions, Types and Toolkits: Build an SPL toolkit of data types and SPL functions to facilitate development of more complex applications and promote reusability.

Lab 3 – SPL Adapters: TCPSource and TCPSink. Use the SPL language to develop a standalone data generator and a Streams application to receive streaming data over a socket connection.

Lab 4 – Aggregation and Sort: Use two standard-toolkit operators to develop a location summarization analytic and see how fast it can run on the lab hardware.
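The data flow that Lab 3 builds, a generator writing records to a socket and an application reading them, can be pictured with the following rough sketch. It is written in Python rather than SPL and is only an illustration of the pattern, not the lab's implementation. The record layout (device id, latitude, longitude as one comma-separated line) and the function names `generate` and `receive` are assumptions made up for this example; the actual record format is defined in the lab itself.

```python
import socket
import threading


def generate(port: int, count: int) -> None:
    """Hypothetical generator: send `count` comma-separated records, one per line."""
    with socket.create_connection(("127.0.0.1", port)) as conn:
        for i in range(count):
            # Made-up device id and coordinates, for illustration only.
            line = f"dev1,{37.0 + i * 0.01:.2f},{-122.0 - i * 0.01:.2f}\n"
            conn.sendall(line.encode("ascii"))


def receive(server: socket.socket) -> list:
    """Accept one connection and parse each line into an (id, lat, lon) tuple."""
    conn, _ = server.accept()
    with conn, conn.makefile("r") as lines:
        return [
            (device, float(lat), float(lon))
            for device, lat, lon in (line.strip().split(",") for line in lines)
        ]


server = socket.create_server(("127.0.0.1", 0))  # port 0: the OS picks a free port
port = server.getsockname()[1]
sender = threading.Thread(target=generate, args=(port, 3))
sender.start()
records = receive(server)  # returns once the generator closes its connection
sender.join()
server.close()
print(records)
```

Keeping the generator and the receiver as separate pieces that share only a socket mirrors the split described above: either side can be replaced or restarted without changing the other.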

 

Lab Environment

This lab does not require a specific VMware image. The labs in this guide can be accomplished on any Streams environment meeting the following minimum requirements:

  • IBM InfoSphere Streams 3.x
    • The instructions in this lab are based on version 3.2.1.0. There may be minor changes to the Streams Studio steps in other versions.
  • Red Hat Enterprise Linux (RHEL) 5.5 or later, CentOS Linux 6.1 or later, SuSE Linux Enterprise Server (SLES) 11.2 or later
  • Streams Studio 3.x or Eclipse (3.6 or 4.2, depending on the Streams version) with InfoSphere Streams Studio plug-ins installed
Note: There are a few steps in the lab that refer to specific usernames and the host name of the lab image. Please modify these as appropriate for your environment.

If this lab is run on the same VMware image that was used for the InfoSphere Streams 3 Introductory Hands-On Lab, the following host name, user IDs, and passwords apply:

Table 1. Streams 3.x lab – virtual machine information

Parameter                   Value
Host name                   streamslab, streamsqse, or bigdata (it does not matter which)
User and administrator ID   streamsadmin
User password               passw0rd (password with a zero for the O)
root password               passw0rd

Start Lab 1
