Introduction

This blog article presents various tips and tricks to help resolve issues that you may experience when deploying IBM Integration Bus (IIB) on Amazon Web Services (AWS), using the implementation described in this initial entry. The material in this blog was collected during the development of the IBM Integration Bus Quickstart, and it therefore presents first-hand some of the issues that you may experience when working in the AWS environment.

Even if you are not experiencing problems with the deployment today, it is still worth reading the article as it will give you more insight into some of the detailed workings of a CloudFormation deployment in AWS.

A range of topics are covered, including the different types of logging available, techniques for incremental testing, changing the health monitoring, and dealing with constantly terminating instances. To start with, though, let us quickly recap the architecture and the components that are created.

The CloudFormation templates

IBM Integration Bus is deployed in the following architecture using AWS CloudFormation templates.

IBM Integration Bus deployment architecture on AWS

One of two templates is used:

  • The master template (iib-master.template): this is used to deploy IIB in a new VPC. The template thus creates the entire environment needed for this deployment: it builds a VPC stack, which contains two public and two private subnets spanning two availability zones. It also creates the Bastion Host in one of the public subnets, as well as the IIB EC2 instance in one of the private subnets.
  • The secondary template (iib.template) is used to deploy only the IIB Stack in an existing VPC. The template assumes that you already have a VPC which contains two public and two private subnets, as well as a Bastion Host. It then uses those to create the main IBM Integration Bus stack.
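
Either template can also be launched from the AWS CLI rather than the CloudFormation console. The sketch below is illustrative only: the bucket, stack name, parameter names and region are assumptions and must be replaced with the ones used by your copy of the templates:

# Launch the master template to build the VPC, Bastion and IIB stacks in a new VPC
aws cloudformation create-stack \
  --stack-name IIB-DEMO \
  --template-url https://s3.amazonaws.com/<your-bucket>/ibm/iib/latest/templates/iib-master.template \
  --parameters ParameterKey=KeyPairName,ParameterValue=<your-key-pair> \
  --capabilities CAPABILITY_IAM \
  --region us-east-1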

Using these templates to create a stack in CloudFormation results in stacks similar to those shown in the figure below:



If you use the master template, it will create three nested stacks: the VPC stack (IIB-DEMO-VPCStack-123SGNC3Q6M0E), the Bastion Stack (IIB-DEMO-BastionStack-1BLYXU8AXY6V3), used to securely connect to the IIB server instance, and finally the IIB stack (IIB-DEMO-IIBStack-1NXPEYLXGHVMJ), which is the stack running IBM Integration Bus and IBM MQ.

If you use the secondary template, only the IIB stack will be created; the VPC and Bastion Host must already exist.

More information about how to deploy IBM Integration Bus on AWS using these templates can be found in this article.

Sources of information

A range of logs exist which can be used to investigate your instances and to troubleshoot them if necessary. To access these logs, you can either log in to the machine using SSH and find the relevant logs on the instance, or use the logs provided in CloudWatch, an AWS service which offers monitoring and centralised logging for your deployed resources.

The CloudFormation templates used in this implementation are configured to provide the following logs:

The Bastion Stack logs

These can be found on CloudWatch. To access them, choose CloudWatch from the AWS Services Console and you will be presented with a page that looks similar to the figure below:



Then, navigate to the Logs section, which can be found on the left-hand side panel of the page displayed above. From there, choose the log stream which contains your Bastion stack’s name, followed by BastionMainLogGroup, as displayed in the figure below:



This log is provided by AWS through the template used to create the Bastion Stack and it contains all the SSH logs from the Bastion Host.
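
The same information is available from the AWS CLI if you prefer not to use the console. A minimal sketch, using the example Bastion stack name from the previous section; the generated log group and stream names are assumptions and will differ in your deployment:

# Find the log group created by the Bastion stack
aws logs describe-log-groups --log-group-name-prefix IIB-DEMO-BastionStack

# List its log streams, most recent first
aws logs describe-log-streams --log-group-name <bastion-log-group-name> \
  --order-by LastEventTime --descending

# Fetch the latest SSH log entries from a stream
aws logs get-log-events --log-group-name <bastion-log-group-name> \
  --log-stream-name <log-stream-name> --limit 50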

The IIB instance logs

The EC2 instance running IBM Integration Bus contains multiple logs which can be used for troubleshooting. These logs are explained below:

  1. General logs: the /var/log directory contains a range of logs that capture the outputs of the commands which run during the instance initialisation and configuration. This also includes the outputs of the scripts that are used to configure IBM Integration Bus and IBM MQ on the instance. The following logs can be found in the above-mentioned directory:
    • cfn-init.log
    • cfn-init-cmd.log
    • cfn-wire.log
    • cloud-init.log
    • cloud-init-output.log
    • syslog

    The cfn-init-cmd.log and syslog files are uploaded to CloudWatch, as they can be useful in verifying the instance configuration. The cfn-init-cmd.log contains the output from configuring IBM MQ and IBM Integration Bus on the instance. In addition, syslog contains an extensive log of all events happening on the instance. At a minimum, it is recommended that you check the cfn-init-cmd.log when first logging in to an instance, to verify that the instance has been properly configured.

    These logs can be found under IIBMainLogGroup on CloudWatch, with names built as follows: <instance ID>-cfn-init-cmd and <instance ID>-syslog. For the stack deployment presented in the previous section, these logs can be found on CloudWatch.


  2. MQ logs: the queue manager records a range of logs which can be used for troubleshooting. Logs are available in two locations:
    • /var/mqm/errors/ – this directory contains AMQERR01.log, AMQERR02.log and AMQERR03.log. These are the installation-level logs of the MQ environment.
    • /HA/mqm/qmgrs/<queue-manager-name>/errors/ – this directory contains AMQERR01.log, AMQERR02.log and AMQERR03.log, which hold information, warning and error messages that are specific to the queue manager.

    Those logs can also be found in the IIBMainLogGroup on CloudWatch.
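
These instance logs can also be read from the AWS CLI rather than the CloudWatch console. A minimal sketch, assuming the IIB stack's generated log group name starts with the stack name and that the stream names follow the <instance ID>-cfn-init-cmd pattern described above:

# Locate the IIB stack's log group (the generated name will differ per deployment)
aws logs describe-log-groups --log-group-name-prefix IIB-DEMO-IIBStack

# Read the MQ and IIB configuration output uploaded from cfn-init-cmd.log
aws logs get-log-events --log-group-name <iib-main-log-group-name> \
  --log-stream-name <instance-id>-cfn-init-cmd --limit 100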

Health checking logs

The /HA/log/healthchecks.log file contains a record of the health check tests that failed. These failures would have caused the termination of the IIB EC2 instance. Please note that this log is not created until the instance has been terminated at least once due to a health check failure. This log looks similar to the figure below:



This example shows that the instance was terminated at 12:24:05 because the queue manager was not running.

As with the other logs, the /HA/log/healthchecks.log can be found in the IIBMainLogGroup on CloudWatch.

You can also upload additional logs which you would like to monitor on CloudWatch. A guide on how to do that can be found here. Additionally, a model of how the current logs are uploaded to CloudWatch is provided in the template used to deploy the IIB instance. That is achieved in the LaunchConfiguration part of the template, under AWS::CloudFormation::Init. The current template can be found here and it can be used as a starting point for providing additional logs.
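
As an illustration only, the sketch below shows what adding one more log to the CloudWatch Logs (awslogs) agent on a running instance might look like. It assumes the instance runs the awslogs agent with its configuration in /etc/awslogs/awslogs.conf; the file, group and stream names are illustrative, and the template mentioned above remains the authoritative model for how this deployment wires up its logs:

# Append an extra log file to the CloudWatch Logs agent configuration (illustrative)
sudo tee -a /etc/awslogs/awslogs.conf <<'EOF'
[/var/log/my-extra.log]
file = /var/log/my-extra.log
log_group_name = <your-log-group>
log_stream_name = {instance_id}-my-extra
EOF

# Restart the agent so it picks up the new configuration
sudo service awslogs restart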

Techniques for problem resolution

Use all the various services provided on AWS

The AWS Console offers a wide range of services which can be used to extend this deployment and debug any issues you might encounter. Two of the most useful are covered here:

  1. EC2 – this service can be used to monitor most of the resources created by this deployment. Depending on the error you encounter, or on the area you would like to extend, the following resources can be found and edited through the EC2 console:
    • Instances
    • AMIs
    • Security Groups
    • Elastic IPs
    • Key Pairs
    • Launch Configuration
    • Auto Scaling Groups

    Using the console in this way can be particularly useful for performing simple changes, such as adding name tags to a running instance (see the sketch after this list), without having to change the template or code and rerun steps. The console also gives you the ability to look at the state and properties of resources.

  2. CloudWatch – this is a very useful service which can be used to collect various data and logs from your instances for central viewing through the AWS console, without the need to log in to the machine. A key benefit of this facility is that copies of the logs are retained even after an instance has been stopped or terminated, so you can view historic events, something that would otherwise only be possible if you explicitly copied the logs to a shared resource such as S3 or EFS.
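
As an example of the first point above, a Name tag can be added to a running instance from the command line as well as from the console. A minimal sketch, with an illustrative instance ID and tag value:

# Add a Name tag to a running instance (instance ID and value are illustrative)
aws ec2 create-tags \
  --resources i-0123456789abcdef0 \
  --tags Key=Name,Value=iib-server-test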

Quickly test incremental changes to the deployment

The quickest way to test changes to your Integration Bus configuration is to deploy the stack into an existing VPC. To do so, make sure you create your own S3 bucket and follow the same directory structure as the one in the iib-fast-deploy-aws sample bucket, from the ibm directory onwards. More information about how to achieve that can be found in this article, in the Deployment Steps section, Step 2. Then, you will need to upload your latest version of iib.template to the ibm/iib/latest/templates directory and use its link to deploy just the IIB Stack in an existing VPC.
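
A minimal sketch of that workflow with the AWS CLI; the bucket name and parameters are illustrative, and the parameter names must match those actually defined in iib.template:

# Upload your modified template to your own bucket, mirroring the sample structure
aws s3 cp iib.template s3://<your-bucket>/ibm/iib/latest/templates/iib.template

# Deploy only the IIB stack into an existing VPC using the uploaded template's URL
aws cloudformation create-stack \
  --stack-name IIB-TEST \
  --template-url https://s3.amazonaws.com/<your-bucket>/ibm/iib/latest/templates/iib.template \
  --parameters ParameterKey=VPCID,ParameterValue=<vpc-id> \
               ParameterKey=KeyPairName,ParameterValue=<your-key-pair> \
  --capabilities CAPABILITY_IAM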

Note: this can only be used for changes that do not apply to the VPC and Bastion nested stacks, as those are created by the master stack template.

Pinpoint the stack creation failure reason

By default, a failed stack creation rolls back all the deployed resources. However, this can make it hard to identify the reason why the stack creation failed. To overcome this, it is often useful to set “Rollback on failure” to No, so you can find which of the nested stacks fails to create and at exactly which point. This can be done when creating a new stack, on the Options page, by choosing the Advanced section and setting “Rollback on failure” to No, as shown in the picture below:
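
If you create the stack from the AWS CLI instead, the equivalent option is the --disable-rollback flag; the stack name and template URL below are illustrative:

# Keep partially created resources for inspection instead of rolling them back on failure
aws cloudformation create-stack --stack-name IIB-DEBUG \
  --template-url <your-template-url> --disable-rollback --capabilities CAPABILITY_IAM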

Check that the instance was configured properly

Every time an IIB instance is created, it is configured on two separate levels: the configuration required for the IBM MQ queue manager and the configuration required for the IBM Integration Bus integration node. To quickly verify that both products have been configured and are running, you can check the contents of the cfn-init-cmd.log. This can be found in the /var/log directory on the iib_server instance and contains the output of these two configurations.
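
A minimal sketch of that check over SSH, assuming the AMI's default user is ec2-user and that you reach the instance through the Bastion Host; the key file, user and addresses are illustrative:

# Jump through the Bastion Host to the IIB instance (addresses and key are illustrative)
ssh -i my-key.pem -o ProxyCommand="ssh -i my-key.pem -W %h:%p ec2-user@<bastion-public-ip>" \
    ec2-user@<iib-private-ip>

# Confirm that the MQ and IIB configuration steps completed without errors
sudo tail -n 100 /var/log/cfn-init-cmd.log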

Disable health checking

This implementation contains extensive health checking to ensure the high reliability and availability of the data stored in the queue manager and the integration node. However, you might wish to make changes to the current deployment and to the configuration of the queue manager and of the integration node that include stopping and restarting components. In this case, health checking might interfere with your tests, as it can constantly mark your instance as unhealthy if your new configuration fails the existing health checks. A detailed explanation of the health checking employed by this deployment can be found in the next section of this article.

If you decide to turn off health checking, this can be done from the configure-iib-aws and configure-mq-aws scripts, by removing the following lines from the script:

Additionally, health checking is in place at the Elastic Load Balancer level, which ensures that port 1414 is listening. This is the port used by MQ clients to connect to the queue manager. This checking is enabled in the template used to create the IIB Stack and can be removed or modified from the following lines in the iib.template:

Another way of stopping your instances from being terminated by health checking is to detach your instance from the Auto Scaling group. To do this, navigate to the Auto Scaling Groups section in the left-hand side panel of the EC2 console. Then, select the auto scaling group corresponding to your IIB instance, as shown in the figure below:

In the first tab, click Edit and change the minimum number of instances to 0. Then, switch to the Instances tab, select the IIB instance, right-click on it and select Detach. This ensures that even if your instance becomes unhealthy, it will not be terminated, as it is no longer part of an auto scaling group.
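
The same change can be made with the AWS CLI; the Auto Scaling group name and instance ID below are illustrative:

# Allow the group to run with zero instances, then detach the IIB instance from it
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name <iib-auto-scaling-group-name> \
  --min-size 0

aws autoscaling detach-instances \
  --instance-ids i-0123456789abcdef0 \
  --auto-scaling-group-name <iib-auto-scaling-group-name> \
  --should-decrement-desired-capacity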

Troubleshooting

Dealing with insufficient resources

Problem description: The stack creation fails with an error which implies that you have reached the limit for a particular AWS resource. This could be the number of VPCs, the number of Elastic IPs, and so on.

Cause: Every account has a limited number of resources available in a particular region. You can find more information on account limits here.

Solution: A quick fix for that is to deploy your stack in a different region. This deployment supports five different AWS regions: us-east-1, us-east-2, us-west-2, eu-west-1 and ap-southeast-2.

Constantly terminating EC2 instances

Problem description: The EC2 instance hosting the integration node and queue manager is constantly failing the health checking tests running on the instance. This can be observed from the EC2 service web console, and looks similar to this:



Cause: This problem points to an issue with either the integration node or the queue manager. For this implementation, extensive health checking is employed in order to ensure the high reliability and availability of the data. This is split into three different types of tests:

  • Health checking on the integration node: this checks that the integration node is running and verifies that the following processes are also running:
    • bipbroker
    • bipservice
    • bipMQTT
    • DataFlowEngine
  • Health checking on the queue manager: this checks that the queue manager is running and verifies that the following processes are also running:
    • amqzxma0
    • amqzfuma
    • amqzmuc0
    • amqzmur0
    • amqzmuf0
    • amqzmgr0
    • amqfqpub
    • amqpcsea
    • amqfcxba
  • Ports health checking: this checks that the ports required to connect to MQ and IIB are listening: 1414 and 9443 for MQ and 4417 and 7800 for IIB.

Solution: The first step which should be taken to discover the point of failure is to check the contents of the healthchecks.log in the /HA/log folder. This can be done by establishing an SSH connection to the IBM Integration Bus instance that is deployed to EC2, or through CloudWatch. That log outlines the reasons why the health checking fails on the instance and can be used to solve the specific problem related to the integration node, queue manager or ports.
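
For example, recent health check failures can be pulled straight from CloudWatch. This is a minimal sketch; it assumes the healthchecks.log stream follows the same <instance ID>-<log name> naming pattern as the other instance logs, which you should verify for your deployment:

# Search the last hour of the health check log for failures (names are illustrative)
aws logs filter-log-events \
  --log-group-name <iib-main-log-group-name> \
  --log-stream-names <instance-id>-healthchecks \
  --start-time $(( $(date -d '1 hour ago' +%s) * 1000 ))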

Maximum number of VPCs has been reached

Problem description: By default, an AWS account has a maximum of 5 VPCs per region. When you try to create a new master stack (which deploys the VPC stack as a nested stack) in a region where this maximum number of VPCs has been reached, you will encounter this error.

Solution: Either delete one of the existing VPCs or increase your account’s limit. You can find more information about this here. Alternatively, you can choose another AWS region where there are sufficient resources for this deployment.

Stack is in DELETE_FAILED state

Problem description: When trying to delete a stack, the AWS console shows the stack’s state as ‘DELETE_FAILED’.

Cause: Some stacks fail to delete because one or more of their nested resources cannot be deleted.

Solution: From the bottom panel, choose ‘Events’ and find out which of the resources failed to delete. Then, find the corresponding resource (usually in EC2) and manually delete it. After that resource has been deleted, go back to CloudFormation and try to delete the stack again.
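
The same investigation can be done from the AWS CLI. A minimal sketch, using the example stack name from earlier; note that --retain-resources only accepts the logical IDs of resources that failed to delete, and only works on a stack that is already in the DELETE_FAILED state:

# List the resources that failed to delete for the stack
aws cloudformation describe-stack-events --stack-name IIB-DEMO \
  --query "StackEvents[?ResourceStatus=='DELETE_FAILED'].[LogicalResourceId,ResourceStatusReason]"

# After removing the blocking resource manually (or to skip it), retry the deletion
aws cloudformation delete-stack --stack-name IIB-DEMO \
  --retain-resources <logical-resource-id>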

Packer problems on some RedHat Linux Distributions

Problem description: Trying to run packer can produce the following error:

packer
/usr/share/cracklib/pw_dict.pwd: Permission denied
/usr/share/cracklib/pw_dict: Permission denied

Also, a Packer build (using a command similar to this: packer build -var aws_access_key=<your-access-key> -var aws_secret_key=<your-secret-key> iib-ami.template.json) might hang before printing anything.

Cause: This happens because on some RedHat-based Linux distributions there is another tool named packer installed by default. You can read more about installing Packer on different distributions and this problem here.

Solution: You can create a symlink to the Packer binary you want with a different name, such as packer.io, or invoke that binary using its absolute path, e.g. /usr/local/packer.
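
For example, assuming the HashiCorp binary is at /usr/local/packer as above:

# Give the HashiCorp binary an unambiguous name...
sudo ln -s /usr/local/packer /usr/local/bin/packer.io
packer.io build -var aws_access_key=<your-access-key> -var aws_secret_key=<your-secret-key> iib-ami.template.json

# ...or simply invoke it by its absolute path
/usr/local/packer build -var aws_access_key=<your-access-key> -var aws_secret_key=<your-secret-key> iib-ami.template.json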

Problems encountered while building an Amazon Machine Image

This article describes how to build an Amazon Machine Image that deploys IBM Integration Bus together with IBM MQ. However, you might encounter this common issue:

Bad Source Error

Problem description: The packer build uploads a series of scripts to the newly created image, but it cannot find one of the required scripts.

Cause: The script does not exist at the specified location on your local machine.

Solution: Ensure that you have copied all the files under the scripts and services directories, available here, to your local machine. You also need to follow the same directory structure as the one in the repository. Otherwise, the iib-ami.template.json file needs to be updated to reflect the distribution of the files on your computer.
