Introduction of the concept of a domain

A domain is a logical grouping of resources in a network for common management and administration.  A domain can contain one or more instances that share a security model, such as PAM or LDAP.  You must create at least one domain in InfoSphere Streams Version 4.0.  For more information about domains, see: http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.welcome.doc/doc/domains.html

To move your existing applications to a Version 4.0 installation, you must first install InfoSphere Streams v4.0.  An instance upgrade tool is provided to assist in migrating v3.2.1 instances to a v4.0 domain, with its instances, hosts, tags, and properties.  Before you migrate, consider whether to carry a v3.2.1 configuration forward to v4.0 or to start from scratch: the new architecture of Streams v4.0, together with the installation and configuration changes it requires, can make a fresh setup the easier path.  When starting from scratch, you can use the Streams v4.0 tooling to create a new domain and one or more instances, which helps you take advantage of the new v4.0 features.  A direct migration from a v3.2.1 configuration will likely not take full advantage of those features.

To move to InfoSphere Streams Version 4.0, complete the following steps:

1. If you choose to use the migration tool, make sure that you are running Streams Version 3.2.1 instances before you begin.  The tool supports migration from Streams Version 3.2.1 to Version 4.0 only.  The migration tool download package includes documentation that walks you through the steps to use the tool.  The migration tool is most helpful for customers who want to stay with the pre-v4.0 requirements: using SSH and shared file system installs, setting up all Streams users on all hosts of the domain, and accepting limited support for Streams high availability.  If you choose not to use the migration tool, you can build a new Version 4.0 setup and configure it using tools such as the Domain Manager, Streams Console, and the streamtool command-line interface.  The steps below guide you through that process.

2. Read and understand the new concepts and guidelines for Version 4.0.  This will help you plan the number of hosts you will use for your new InfoSphere Streams domain and how you want to place your system services and applications on those hosts to take advantage of the new high availability features and security features of v4.0.

Version 4.0 is designed to provide high availability for production environments.  You will need to consider how many hosts you want to use for your InfoSphere Streams applications and how many hosts you want to use for Streams management services.  This planning will help ensure the ability to recover system services and applications, should any failures occur.  Planning in advance will also help improve the performance of your applications and Streams management services.

In addition to deciding which hosts you will use for your InfoSphere Streams domain and instances, the following tasks must be completed before you install InfoSphere Streams:

  • Set up an Apache ZooKeeper installation that InfoSphere Streams can use.  This is a new requirement for Streams v4.0.  More information about ZooKeeper is provided in the next section of this document.
  • Choose and set up your preferred authentication mechanism.  InfoSphere Streams supports authentication via LDAP or PAM.

For detailed InfoSphere Streams planning information, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-plan.html.

For details on performance best practices for Streams configurations, see the IBM InfoSphere Streams v4.0 Performance Best Practices document.

3. For an InfoSphere Streams production system, you must have an Apache ZooKeeper installation available before you install Streams v4.0.

Choose an existing ZooKeeper installation or download and install ZooKeeper.  Apache ZooKeeper is a software project of the Apache Software Foundation, providing high performance coordination service for distributed applications.  For details about ZooKeeper, go to the ZooKeeper website at http://zookeeper.apache.org/.  Follow the guidance below for configuration recommendations for ZooKeeper usage by Streams.

Prior versions of InfoSphere Streams used IBM DB2 to store information required for recovery.  Streams v4.0 uses ZooKeeper instead of DB2 to store recovery information.  ZooKeeper is also used to store configuration and state information needed for service and application high availability.  Because critical data is stored in ZooKeeper, it's a good idea to have redundant ZooKeeper services.

Streams Version 4.0 also works with an embedded version of ZooKeeper that is included with the InfoSphere Streams installation.  Whether you use an external ZooKeeper for production environments or the embedded ZooKeeper for development and test environments, ZooKeeper uses additional memory and processor time on the hosts where it is deployed.  If you were using IBM DB2 as a recovery database and now switch to ZooKeeper, you should not experience differences in load on your systems.

For performance measurements, performance recommendations, and performance reference architecture, see the IBM InfoSphere Streams v4.0 Performance Best Practices document.

Embedded ZooKeeper

For an environment where high availability of Streams services and your Streams applications is not a concern, InfoSphere Streams supports an embedded ZooKeeper option.  Embedded ZooKeeper simplifies the prerequisite for InfoSphere Streams.  The user does not have to set up a ZooKeeper instance, and does not need extensive knowledge about managing a ZooKeeper instance.  InfoSphere Streams will configure a private ZooKeeper instance, start it, and use it when InfoSphere Streams is started.  The primary use for embedded ZooKeeper is a single node developer environment, and not a production environment.

Running a single-node ZooKeeper (embedded or external) is not the preferred option for a production environment, because it provides no ZooKeeper service failover if that node fails.

External ZooKeeper

InfoSphere Streams allows users to configure an externally managed ZooKeeper.  In an environment where the user wishes to have a highly available InfoSphere Streams setup, the preferred option is to configure an external ZooKeeper ensemble.  Usage of external ZooKeeper support allows the user to configure a ZooKeeper cluster to provide a scalable, highly reliable ZooKeeper installation for InfoSphere Streams to access. InfoSphere Streams relies on the user to manage the ZooKeeper ensemble, and does not provide any additional tooling to help with managing ZooKeeper.
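As a rough illustration of what a small external ensemble involves, the sketch below writes a zoo.cfg for three servers.  The host names, the data directory, and the helper name write_zoo_cfg are placeholders chosen for this example.  The keys (tickTime, initLimit, syncLimit, dataDir, clientPort, server.N) are standard ZooKeeper settings, but the values shown are illustrative only and are not Streams recommendations.

```shell
#!/bin/sh
# Sketch: generate a zoo.cfg for a three-server ZooKeeper ensemble.
# Host names and paths are placeholders, not values required by Streams.

write_zoo_cfg() {
    # $1 = output file, $2 = ZooKeeper data directory, remaining args = ensemble hosts
    out=$1
    data_dir=$2
    shift 2
    {
        echo "tickTime=2000"
        echo "initLimit=10"
        echo "syncLimit=5"
        echo "dataDir=$data_dir"
        echo "clientPort=2181"
        id=1
        for host in "$@"; do
            # 2888 = quorum port, 3888 = leader election port
            echo "server.$id=$host:2888:3888"
            id=$((id + 1))
        done
    } > "$out"
}

# Write the file into a scratch directory and show it.
cfg_dir=$(mktemp -d)
write_zoo_cfg "$cfg_dir/zoo.cfg" /var/lib/zookeeper \
    zk1.example.com zk2.example.com zk3.example.com
cat "$cfg_dir/zoo.cfg"
```

Each server in the ensemble also needs a myid file in its dataDir that contains that server's number from the server.N lines.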

For external ZooKeeper configuration guidelines for InfoSphere Streams, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/configuring-external-zookeeper.html.

For a Streams production environment, the preferred option is an external ZooKeeper server.  See the IBM InfoSphere Streams v4.0 Performance Best Practices document for recommended configurations.  If you have an existing ZooKeeper v3.4.6 or above installation, you can use it for InfoSphere Streams v4.0.

The ZooKeeper ensemble connection string will be required when you configure your InfoSphere Streams domain.
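A ZooKeeper connection string is a comma-separated list of host:port pairs.  The sketch below builds one from a list of hosts and stores it in the STREAMS_ZKCONNECT environment variable that streamtool can read.  The host names and the helper name zk_connect_string are placeholders, and 2181 is simply the customary ZooKeeper client port.

```shell
#!/bin/sh
# Sketch: build a ZooKeeper connection string of the form host:port,host:port,...
# Host names are placeholders.

zk_connect_string() {
    # $1 = client port, remaining args = ensemble hosts
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Prefix a comma only when result already holds an entry.
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}

STREAMS_ZKCONNECT=$(zk_connect_string 2181 zk1.example.com zk2.example.com zk3.example.com)
export STREAMS_ZKCONNECT
echo "$STREAMS_ZKCONNECT"
```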

Troubleshooting

To troubleshoot issues with embedded or external ZooKeeper, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.pd.doc/doc/containerstreamszookeeper.html.

4. Decide whether you want InfoSphere Streams to continue to use a shared file system with SSH enabled between your hosts, or whether to move to the preferred setup where shared file systems and SSH are no longer needed.

To take advantage of new functionality, a more reliable and highly available environment, and increased security as new product releases become available, the preferred option is to move away from using SSH with a shared file system to the new solution of running the domain controller as a system service.

Running the Streams domain controller as a system service:

  • The installation and setup must be done by user root.
  • This provides automatic recovery from host failures.  When the host is restarted, the Streams domain controller service (controller) will be started and immediately connect to the domain.  At this point, the host will be available to run InfoSphere Streams services, and Streams service recovery will begin.  If the domain is not created or started when the controller is registered as a system service, the controller service will be dormant, waiting for the domain to be created and started.
  • The “streamtool registerdomainhost” command is used to register and create the system service.  There is also a streamsdomainhostsetup.sh convenience script that will install the product and perform the streamtool registerdomainhost.
  • If you are installing to each host to set up the controller as a system service, there are no installation restrictions in regards to installation path or installation owner.
  • For more information, see the "Production environment" section of the topic on considerations for setting up a multi-host environment:

    http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-options-configuration.html.

Running with SSH support between hosts and a shared file system:

For both of the above environment setups, the controller service will be auto-restarted if it fails unexpectedly. The Streams code will detect when controllers are no longer running and restart the controller without manual intervention.

For information on how to set up the domain controller to run as a Linux system service see:

 http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/setting-up-enterprise-domain.html.

5. Install the InfoSphere Streams software on your main host using the main installation package.

There are two installation packages for InfoSphere Streams v4.0.  The installation package shipped to customers is for a full product installation.  This package needs to be installed on at least one host of the domain, and should also be installed on every host that you want to use for developing Streams applications or where you need to use the embedded ZooKeeper.  The full product installation package is also used to generate the domain host installation package.  The domain host installation package is a subset of the product and contains the Streams software necessary to register and start the controller system service.  This package reduces the size of the product on each additional host, and is used only when you are not using a shared file system to share Streams code between hosts.  The Streams controller provisioning code automatically installs additional Streams code on hosts when the code is needed by Streams services, and the provisioning service also updates the host as fixes are installed on the main installation host.  When you do not install the full product to a shared file system, the domain host installation package needs to be installed on each additional host that you want in the domain.  For more details on the packages, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-programs-packages.html.

Running with SSH setup and shared file system

If you are running with SSH enabled between hosts, and with a shared file system, you can either install the main installation package to a shared file system directory or to each host.  If you choose to install to each host, you need to follow the restrictions in the “Development or test environment” section here:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-options-configuration.html.

For any Streams installation option, run the dependency checker script in the main installation package to verify that the system configuration and software dependencies are met on each system that will be part of the domain.  For information about system and software dependencies, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-prerequisites.html.

To perform the installation of the main installation package see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install.html.

Running domain controller as a system service

If you run the Streams controller as a system service, you will install the main installation package on at least one host. Before doing the installation, see considerations for setting up a multi-host environment:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-options-configuration.html. 

Run the dependency checker script in the main installation package to verify the system configuration and the software dependencies are met. For information about system and software dependencies see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-prerequisites.html

To perform the installation using the main installation package, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install.html.

Setup

After you install the main installation package on the main host, you will need to set up and configure a domain.

If you chose to run the domain controller as a system service, and you did not install to a shared file system, you will create the domain host installation package and install it on each additional host in the domain.  You do not need to run this installation and setup on the host where you installed the main installation package.  The external ZooKeeper must be set up before you create the domain host installation package.  The domain does not need to exist yet.  You will be required to specify the ZooKeeper connection string and the domain name on the command that creates the domain host installation package.  See Step 7 for more details on creating the domain host installation package and running the setup on additional hosts.

If you chose to run the domain controller as a system service, and you are installing to a shared file system directory, you can just run the streamtool registerdomainhost command from each additional host to register the host with the domain, and create the Linux system service configuration.

You should also run “streamtool registerdomainhost” on the hosts where you installed the main installation package.

After the installation is complete, some post-installation steps are necessary.  The post-installation steps are outlined in the post-installation roadmap at:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-postinstall-roadmap.html.

Provisioning

When running the domain controller as a system service, only a subset of the product code will be installed to the additional hosts. The rest of the code will be provisioned to each host as needed. For example if a host is tagged to run the SWS service, the controller provisioning code will install the additional code needed for this service.

The provisioning code will also manage installing new product code, fixpacks, and interim fixes to the additional hosts in the domain as needed.  You only need to install the new version, fixpack, or interim fix on each host where the main installation package was installed.  The provisioning code will detect that a new version, fixpack, or interim fix was installed and manage updating the additional hosts and Streams services.

Automated installation

We no longer ship an RPM specification file. To automate the installation you should use Puppet or Chef.  For information on how to automate the installation see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.install.doc/doc/ibminfospherestreams-install-automated-section.html.

6. Optional:  Run the instance upgrade tool.

To assist in updating Version 3.2.1 instances to Version 4.0 domains and instances, we have developed an upgrade tool.  The tool outputs commands and generates scripts that create a basic working domain and instances with the same hosts and tags, with upgraded properties and with deprecated support removed.  Use this tool only if you are staying with SSH, shared file system installs, the same users on all hosts of the domain, and somewhat limited high availability support.  The domain and instances will be created with the user running the command (the owner of the .streams directory for the instances being upgraded) as the owner.  Additional host service tagging for services such as SWS, JMX, and AAS will likely be needed after the domain is created.  The upgrade package and instructions can be found here: https://developer.ibm.com/streamsdev/docs/tool-upgrading-infosphere-streams-version-3-2-1-instances-version-4-0-0-0/.  If you choose to upgrade using the upgrade tool, skip to step 8 now.  If you do not want to use the upgrade tool, you can manually create the domain, instances, hosts, and tags.  The Domain Manager, Streams Console, and the following instructions will help with the setup.

7. Install all of the additional domain hosts using the domain host installation package. 

The domain host installation concept is new to InfoSphere Streams Version 4.0.  You only need to perform this step if you did not install the main installation package to a shared file system directory, you are setting up a multi-host environment, and you are running the domain controller as a system service.  This installation package only needs to be installed on the hosts where you did not install the main installation package.

Using the main management host (where you installed the main installation package), you are going to create the domain host installation package using the streamtool mkhostpkg command. You will be required to specify the external ZooKeeper connection string and a domain name. The domain does not need to exist.  For additional information on the parameters see the streamtool man pages for this command.

The mkhostpkg command will create the domain host installation package. The domain host installation package contains the following key files:

  • dependency checker script: Run this to verify that the host meets the product’s system and software dependencies.  If software dependencies are missing, you should install them prior to running the streamsdomainhostsetup.sh script.  If the RPM is shipped with the product, you will need to get the RPM from the main installation package.
  • domain host installation binary: The binary that performs the domain host installation.
  • streamsdomainhostsetup.sh: This script performs the domain host installation, registers the host with the domain, and sets up the domain controller as a system service.  It uses the responses in the response file that was generated when the domain host installation package was created.
  • response file: The responses for performing the installation, registering the host with the domain, and setting up the system service.

For each additional host that you want in the domain do the following:

  • Copy the domain host package to the host
  • Untar the domain host package
  • Run the dependency checker script
  • Run the streamsdomainhostsetup.sh script.
    • Installs the bootstrap code.  The responses for the installation come from the response file created by the streamtool mkhostpkg command.
    • Runs the streamtool registerdomainhost command to register the host with the domain, creates and registers the system service, and starts the system service.
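
The per-host steps above can be sketched as a small script that is run on each additional host after the package has been copied there.  This is an illustration only: the package file name, the work directory, the dependency checker file name, and the assumption that both scripts sit at the top level of the unpacked package are placeholders, so substitute the names that streamtool mkhostpkg actually produces.  Only streamsdomainhostsetup.sh is named in the text.  The script defaults to a dry run that prints the commands instead of running them, and a real run would be done as root.

```shell
#!/bin/sh
# Sketch: unpack the domain host package, check dependencies, then run the setup script.
# CHECKER and the package layout are placeholders, not confirmed product names.

CHECKER=dependency_checker.sh
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only; set DRY_RUN=0 to execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_domain_host() {
    # $1 = path to the copied package, $2 = directory to unpack into
    pkg=$1
    workdir=$2
    # Stop at the first failing step so setup never runs on a host that fails the checks.
    run mkdir -p "$workdir" || return 1
    run tar -xf "$pkg" -C "$workdir" || return 1
    run "$workdir/$CHECKER" || return 1
    run "$workdir/streamsdomainhostsetup.sh" || return 1
}

setup_domain_host /tmp/domainhost-package.tar /tmp/domainhost
```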

You can now create the domain if it does not already exist. When you list the available resources on the domain you will see the hosts that you registered with the domain. The hosts are now ready for Streams services to run on them.

If the host is used by more than one domain, you will need to register the host with each domain, which sets up a system service for each.  You can use the same domain host installation; you only need to run the streamtool registerdomainhost command, not the streamsdomainhostsetup.sh script.

You should also run the streamtool registerdomainhost command on the host where you did the main installation, to ensure that host is protected if it fails.

For more information on creating the host package see http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/creatingpkg.html.

For more information about running the streamsdomainhostsetup.sh script see

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/addingresources.html.

8. Review your installation for resiliency and performance.

9. Create your ACLs and consider security

User and group Access Control Lists (ACLs) are not moved over by the upgrade tool.  In Streams v4.0 you can control access at a domain level, which allows you to set the permissions for multiple instances.  For more information, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/ibminfospherestreams-user-security-authorization.html.

The Authentication and Authorization Service (AAS) supports authentication of users (which is managed at the domain) and authorization of users to security objects which, in turn, controls permission to run a command or function.  Authorization is managed at domains and instances, and using roles makes it easier to control user access to commands.

A role is a collection of users and groups that can be assigned permissions.  When a role has a permission, every user who is a member of the role, either directly or through a group that is a member of the role, has that permission.  For more information, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/roles.html.
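
The membership rule for roles can be illustrated with a small sketch.  The role members, the group memberships, and the helper name role_includes_user are all hypothetical data and names invented for this example; the sketch does not call or model any AAS interface.

```shell
#!/bin/sh
# Sketch: a user holds a role's permissions if the user is a direct member of the
# role or belongs to a group that is a member of the role.
# All data below is hypothetical and is not read from Streams.

# Members of the role, as "user:NAME" or "group:NAME" entries.
ROLE_MEMBERS="user:alice group:operators"
# Group memberships, as "GROUP:USER" pairs.
GROUP_USERS="operators:bob operators:carol analysts:dave"

role_includes_user() {
    # $1 = user name; returns 0 if the user is covered by the role, 1 otherwise
    user=$1
    for member in $ROLE_MEMBERS; do
        case $member in
            user:"$user")
                return 0 ;;
            group:*)
                group=${member#group:}
                for pair in $GROUP_USERS; do
                    [ "$pair" = "$group:$user" ] && return 0
                done
                ;;
        esac
    done
    return 1
}

for u in alice bob dave; do
    if role_includes_user "$u"; then
        echo "$u: has the role's permissions"
    else
        echo "$u: not covered by the role"
    fi
done
```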

You can set permissions for job groups. A job group is a group of jobs that have the same authority or permissions. You can use job groups to limit who can perform tasks such as sending or receiving data from jobs, and stopping or restarting processing elements (PEs).  For more information, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/jobgroups.html.

Compared to prior releases, which performed minimal authorization checks, command and API authorization is designed into Streams 4.0 and the authorization checks are implemented throughout the support. For streamtool the following information details the authorization required to run a command:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/ibminfospherestreams-user-security-authentication-commands.html.

To successfully use generated authentication keys when a domain is running in an environment without a shared file system, the streamtool genkey command must be run on each host where the user will run streamtool commands.  For information about how to do this, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.cfg.doc/doc/ibminfospherestreams-user-security-authentication-rsa.html.

10. Update scripts that use streamtool

Most streamtool commands now require options that specify a ZooKeeper connection string, a domain ID, and an instance ID.  So that users do not have to enter those options every time, streamtool checks the following environment variables at startup and uses them when the corresponding option is not provided:

  • For the --zkconnect command option: STREAMS_ZKCONNECT environment variable value will be used if set
  • For the -d or --domain-id option: STREAMS_DOMAIN_ID environment variable value will be used if set
  • For the -i or --instance-id option: STREAMS_INSTANCE_ID environment variable value will be used if set
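
The precedence described above (an explicit option wins, otherwise the environment variable is used) can be sketched as follows.  resolve_option is a hypothetical helper written for this illustration and is not part of streamtool; the variable values are placeholders.

```shell
#!/bin/sh
# Sketch: option-over-environment precedence.
# resolve_option is a hypothetical helper, not a streamtool function.

resolve_option() {
    # $1 = value given on the command line (may be empty), $2 = environment fallback
    if [ -n "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Placeholder values that a script might export once, near the top.
export STREAMS_ZKCONNECT="zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181"
export STREAMS_DOMAIN_ID="StreamsDomain"
export STREAMS_INSTANCE_ID="StreamsInstance"

# No -d value given: the environment value is used.
resolve_option "" "$STREAMS_DOMAIN_ID"
# A -d value of TestDomain given: the option wins over the environment.
resolve_option "TestDomain" "$STREAMS_DOMAIN_ID"
```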

The installation will append a version directory to the directory specified at install time.  When you source the streamsprofile.sh script you must specify the version directory in the path.  For example, if you installed to /home/streamsadmin/InfoSphere_Streams, when you source streamsprofile.sh you must do the following: “source /home/streamsadmin/InfoSphere_Streams/4.0.0.0/bin/streamsprofile.sh”.  The installation path version directory will change with every release, modification release, and fixpack release.  The new version-specific location of the Streams binaries and scripts may cause issues if you have scripts that call Streams executables.  If so, you should use Linux techniques such as creating a symbolic link to the Streams version directory.
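
One way to apply the symbolic link technique is sketched below.  The link name "current" and the helper name link_current_version are arbitrary choices for this example, not Streams conventions, and the demonstration uses a scratch directory in place of a real install base.

```shell
#!/bin/sh
# Sketch: keep a stable "current" link that points at the installed version directory,
# so scripts can reference <base>/current/bin instead of a version-specific path.

link_current_version() {
    # $1 = install base directory, $2 = version directory name (for example 4.0.0.0)
    base=$1
    version=$2
    if [ ! -d "$base/$version" ]; then
        echo "no such version directory: $base/$version" >&2
        return 1
    fi
    # -n replaces an existing link rather than creating a new link inside its target.
    ln -sfn "$version" "$base/current"
}

# Demonstration in a scratch directory standing in for the install base.
base=$(mktemp -d)
mkdir -p "$base/4.0.0.0/bin"
link_current_version "$base" 4.0.0.0
readlink "$base/current"
```

After a new release or fixpack is installed, rerunning the helper with the new version name repoints the link, and scripts that source the profile through the link need no changes.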

By default, sourcing streamsprofile.sh in the Streams install bin directory will set the following environment variables:

  • STREAMS_INSTALL=~user/InfoSphere_Streams/4.0.0.0
  • ** STREAMS_DOMAIN_ID=StreamsDomain
  • *** STREAMS_INSTANCE_ID=StreamsInstance
  • Adds STREAMS_INSTALL directory to the PATH

  ** set to StreamsDomain only if the environment variable is not already set

*** set to StreamsInstance only if the environment variable is not already set or, (for compatibility) if the STREAMS_INSTANCE_IID is set

Make sure your scripts either source streamsprofile.sh or set the environment variables.
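
A script preamble along these lines covers both cases.  It is a sketch under assumptions: the install path is the example path used earlier in this section, and the fallback defaults simply mirror the "only if not already set" behavior described for streamsprofile.sh.

```shell
#!/bin/sh
# Sketch of a script preamble: source streamsprofile.sh when it is readable,
# then make sure the domain and instance variables have values.
# The install path is the example path from the text; adjust it for your system.

PROFILE=/home/streamsadmin/InfoSphere_Streams/4.0.0.0/bin/streamsprofile.sh

if [ -r "$PROFILE" ]; then
    . "$PROFILE"
fi

# ":=" assigns only when the variable is unset or empty, so values already
# present in the caller's environment are preserved.
: "${STREAMS_DOMAIN_ID:=StreamsDomain}"
: "${STREAMS_INSTANCE_ID:=StreamsInstance}"
export STREAMS_DOMAIN_ID STREAMS_INSTANCE_ID

echo "domain=$STREAMS_DOMAIN_ID instance=$STREAMS_INSTANCE_ID"
```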

Run the streamtool genkey command once on each host where scripts will run streamtool commands.  Running genkey creates a private key so that you do not have to enter user and password credentials each time.

Many streamtool commands have been changed, added, deprecated, or discontinued.  For information about deprecated or discontinued functionality, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.welcome.doc/doc/ibminfospherestreams-whats-changed.html.

Streamtool also provides a new interactive mode and completion assistance.  For more information, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.ea.doc/streamtool/doc/streamtool.html.

11. Review deprecated and discontinued functionality

Review the information about deprecated and discontinued functionality at

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.welcome.doc/doc/ibminfospherestreams-whats-changed.html.

  Note that many configuration parameters have been deprecated or discontinued.

12. Explore the new Streams Console

The Streams Console has been significantly improved in Streams v4.0.  For information about how to start the Streams Console, see:

http://www.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.admin.doc/doc/ibminfospherestreams-adminconsole.html.

 
