In Streams V4, some of the major enhancements to the product affect SPL application migration. This document discusses what has changed and how the changes affect your SPL application. It also covers the steps required to migrate your SPL application from a previous release to Streams V4.

Refer to the following migration document for details:  Migrating streams processing applications to InfoSphere Streams Version 4.0

Application Bundle

Prior to Streams V4, SPL applications were not easily relocatable: an application compiled on one system was difficult to run on another system.  The Application Bundle feature in Streams V4 is intended to make Streams applications more relocatable, so that you can move an application from development to production more easily.

To achieve relocatability, the SPL compiler collects all the files required to execute the application and builds an application bundle (*.sab) file that is submitted to a Streams instance.  The application bundle also includes the toolkits that the application depends on.  For information about application bundles, see this documentation:  Application bundle files

How to Migrate:

To ensure that your applications continue to work, review all file resources in your application and make sure that they are available at runtime.  For each file, determine whether it is a data file or a configuration / metadata file.  Data files should be placed in the data directory.  Configuration / metadata files should be placed under the etc directory.  Alternatively, you may place them in any of the default directories listed in the documentation.  If one of your application directories is not bundled by default, you can add it to the bundle by modifying the info.xml of your application.  For more information on how to add artifacts to the application bundle, refer to this documentation:  Application bundle files

Removal of Default Data Directory

Prior to Streams V4, a data directory was mandatory for a Streams application. The directory was created by default if no data directory was specified and a “data” directory did not exist under the application root directory. In Streams V4, there is no default data directory, and Streams no longer creates a data directory automatically. For more information about the data directory, see this documentation:  Removal of Data Directory

How to Migrate:

To migrate your application, determine whether it requires a data directory.  If a data directory is needed, create it on all hosts where the application can run.  If you have been using the <application>/data directory to store your data files, you will need to move those files to the data directory that you have created.  You will also need to specify the location of your data directory at compile time or at job submission time.

Changes in Working Directory at Runtime

Prior to Streams V4, when an SPL application was executed, its working directory was set to the data directory.  In Streams V4, the working directory is set to a special location under <home-directory>/.streams/var.  The working directory is writable, but you cannot retrieve files from this location.  If your application assumes that the data directory is the current working directory and resolves relative paths on that basis, it will no longer work correctly in Streams V4.  For more information about this change, see this documentation:  InfoSphere Streams default directory for application bundle files

How to Migrate:

Your application is affected by this change if it relies on the data directory being the current working directory and uses relative paths to refer to files.  To ensure that your application continues to work, review your code and find all instances where a relative path is specified.  For each one, determine whether the referenced file is a data file or a configuration file.

  • If it is a data file, move the file to the data directory. Instead of using the relative path, you can construct the path using the following API:  dataDirectory()/[filename]
  • If it is a configuration/metadata file, make sure the file is placed in a directory that is included in the application bundle. Instead of using the relative path, you can construct the path using the following API: getThisToolkitDir()/etc/[configuration file]
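
The two cases above can be sketched in SPL as follows. This is only an illustration, not a definitive pattern: the file names input.txt and settings.cfg are hypothetical, and the sketch assumes that a data directory was specified for the application and that the etc directory is included in the bundle.

```spl
// Minimal sketch; "input.txt" and "settings.cfg" are hypothetical file names.
composite Main {
    graph
        // Data file: build the path from the data directory instead of
        // relying on the current working directory.
        stream<rstring line> Lines = FileSource() {
            param
                file   : dataDirectory() + "/input.txt";
                format : line;
        }

        () as Printer = Custom(Lines) {
            logic
                onTuple Lines : {
                    // Configuration file: assumed to be bundled under etc/,
                    // so build the path from the toolkit directory.
                    rstring configPath = getThisToolkitDir() + "/etc/settings.cfg";
                    printStringLn(configPath + ": " + Lines.line);
                }
        }
}
```

Building both paths explicitly means the code no longer depends on where Streams sets the working directory at runtime.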

Specialized Toolkit Changes to Support Application Bundle

All of the specialized toolkits in Streams V4 have been updated to support the application bundle feature.  As a result of this support, any parameter that is related to specifying a file or directory path has changed.

In general, the specialized toolkits handle file-related parameters in the following manner:

  • For parameters that look for data files, the parameters support both absolute and relative paths. If a relative path is specified, the path is relative to the root of the data directory.
  • For parameters that look for non-data files, such as connection documents or configuration files, the parameters support both absolute and relative paths. If a relative path is specified, it is relative to the root of the application directory.

How to Migrate:

To migrate, review your application and determine whether it uses one of the affected operators.  Follow the instructions in the previous sections to update any file-related parameters.  For more information on how the Specialized Toolkit parameters have changed as a result of the application bundle, refer to this documentation:  Migrating streams processing applications that use relative file paths

Changes to the Specialized Toolkits

Some of the operators and functions from the specialized toolkits have been deprecated, moved or discontinued.

How to Migrate:

Refer to the following document to see whether your application uses any of the affected operators or functions.  Refer to the toolkit documentation to find replacement functions or migration steps:  Changes to the InfoSphere Streams specialized toolkits

Operators are now Restartable by Default

To improve application resiliency, operators are now configured as restartable by default.  If this is not suitable for some of your application operators, you will need to manually configure those operators to be non-restartable.
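
As a rough sketch, an operator can be marked non-restartable with a config clause on its invocation. The operator and file names here are hypothetical. Setting relocatable to false alongside restartable is a cautious assumption, made because a relocatable operator is expected to be restartable as well; check the SPL config documentation for your release.

```spl
// Minimal sketch; "input.txt" and "output.txt" are hypothetical file names.
composite Main {
    graph
        stream<rstring line> Lines = FileSource() {
            param
                file   : "input.txt";
                format : line;
            config
                // Opt this operator out of the new restartable default.
                restartable : false;
                // Assumption: also disable relocation so that the two
                // settings do not conflict.
                relocatable : false;
        }

        () as Writer = FileSink(Lines) {
            param
                file   : "output.txt";
                format : line;
        }
}
```

Operators without such a config clause, like the FileSink here, keep the restartable default.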