Overview

Skill Level: Any Skill Level

Working knowledge of IBM® MQ & AWS Cloud offerings

In this article, I am going to discuss architecting & automating messaging solutions using IBM MQ by making use of frequently used AWS services such as EC2, S3, NLB, EFS, Auto Scaling Groups and CloudWatch.

Ingredients

IBM® MQ: Version 8.0 & Above

Platform: Red Hat Linux 

Cloud infrastructure is, by its very nature, dynamic: Cloud Engineers have the flexibility to provision resources on demand, which is contrary to the traditional Data Centre model. Automation also plays a key role. In this article, I discuss how to design & deploy IBM MQ solutions efficiently on AWS Cloud infrastructure, not only to interconnect applications running in the Cloud but also to connect back to an on-premise Data Centre. The article also demonstrates the High Availability and Resiliency solutions provided by IBM® MQ, making use of frequently used AWS Cloud services as shown in the below diagram.

MQ

As per the above architecture diagram, I have deployed 3 MQ instances using Red Hat AMIs, one in each Availability Zone (AZ). Each MQ instance is abstracted from the applications by placing a Network Load Balancer (NLB) in front. This ensures that the application is completely unaware of the actual MQ server details in the network, and the client connections are managed by the NLB in a manner that is transparent from the application's perspective.

S3 is used to store the MQ binary along with other configuration files, which can be fetched dynamically using the AWS CLI as part of the automation process by passing scripts in the User Data section of the Launch Configuration.

The EC2 instance (created from the Launch Configuration) in each AZ is made part of an Auto Scaling Group (ASG) of size 1 (Desired Size 1, Min Size 1 & Max Size 2), which ensures resiliency in case of a failure in the respective zone. The ASGs are pointed to the Target Group attached to the Network Load Balancer (NLB).

From the storage perspective, I have allocated 3 Elastic File Systems (EFS), each mounted to the EC2 instance in its respective Availability Zone (AZ). These EFS volumes are used to store IBM MQ's logs & data, which gives the flexibility of a quick failover in case the instance goes down due to planned or unplanned outages. The other important benefit of storing the MQ logs & data on an EFS volume is retaining the persistent messages and cluster state information. The logic behind this is discussed in detail in section 7.

Another dedicated EFS volume (/MQBackup) has been created & mounted to all 3 MQ instances as shown in the diagram. Its objective is to back up the Queue Managers' data & log files periodically. In a nutshell, while the MQ installation directory i.e. /opt/mqm is placed on the EBS volume of the local EC2 instance, the Queue Manager's data & logs (/MQHA) are stored on EFS.

CloudWatch monitoring has been configured to monitor the EC2 instances' CPU performance, memory utilization etc. It also fetches the MQ error logs i.e. AMQERR0x.LOG from the MQ server to the AWS Console. Hence, MQ Administrators have the option to review the logs for troubleshooting without actually logging in to the server. The detailed setup is discussed in section 11.

The article also demonstrates the extent to which IBM MQ installation & configuration automation can be achieved in AWS Cloud, & how much manual effort is still required during BAU support. I have also discussed MQ patching / upgrade strategies that require minimal downtime for the MQ instance in each AZ while ensuring 100% application availability.

 

Step-by-step

  1. AWS Services used in the Solution Design

    The following AWS services have been used extensively to achieve the solution architecture below using IBM MQ. These services are briefly discussed here.

    MQ

    1. VPC: Amazon Virtual Private Cloud (Amazon VPC) lets you provision a logically isolated section of the AWS Cloud where you can launch AWS resources in a virtual network that you define. You have complete control over your virtual networking environment, including selection of your own IP address range, creation of subnets, and configuration of route tables and network gateways. You can use both IPv4 and IPv6 in your VPC for secure and easy access to resources and applications.

    2. Availability Zone (AZ): Amazon EC2 is hosted in multiple locations world-wide. These locations are composed of Regions and Availability Zones. Each Region is a separate geographic area with multiple, isolated locations known as Availability Zones. In this design, each Availability Zone corresponds to a subnet. Think of each AZ as a physically isolated Data Centre hosted by AWS.

    3. EC2: Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.

    4. Auto Scaling Groups (ASG): An Auto Scaling group contains a collection of Amazon EC2 instances that share similar characteristics and are treated as a logical grouping for the purposes of instance scaling and management. For example, if a single application operates across multiple instances, you might want to increase the number of instances in that group to improve the performance of the application. Or, you can decrease the number of instances to reduce costs when demand is low. Use the Auto Scaling group to scale the number of instances automatically based on criteria that you specify. You could also maintain a fixed number of instances even if an instance becomes unhealthy. 

    5. Elastic File System (EFS): Amazon Elastic File System (Amazon EFS) is automatically mounted on the IBM MQ server instance for distributed storage, to ensure high availability of the queue manager service and the message data. If the IBM MQ server fails in one Availability Zone, a new server is created in another Availability Zone and connected to the existing data, so no persistent messages are lost. Failover typically takes 3-5 minutes, but can be longer if there are outstanding transactions.

    6. Simple Storage Service (S3): Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world. In this illustration, I am going to use S3 to store the MQ binary along with a few other configuration files, which will be fetched at runtime while installing MQ.

    7. Network Load Balancer (NLB): Elastic Load Balancing supports the following types of load balancers: Application Load Balancers, Network Load Balancers, and Classic Load Balancers. A load balancer serves as the single point of contact for clients. The load balancer distributes incoming traffic across multiple targets, such as Amazon EC2 instances. This increases the availability of your application. You add one or more listeners to your load balancer. Applications are designed to connect to Queue Manager using NLB as per our design requirement.

    8. CloudWatch: Amazon CloudWatch is a monitoring and management service built for developers, system operators, site reliability engineers (SRE), and IT managers. CloudWatch provides you with data and actionable insights to monitor your applications, understand and respond to system-wide performance changes, optimize resource utilization, and get a unified view of operational health. In this solution, we are going to use CloudWatch to fetch the MQ error logs & display them in the AWS Console so that we can investigate issues without logging in to the server.

    9. IAM Roles: AWS Identity and Access Management (IAM) enables you to manage access to AWS services and resources securely. Using IAM, you can create and manage AWS users and groups, and use permissions to allow and deny their access to AWS resources. 

    10. CloudFormation: AWS CloudFormation provides a common language for you to describe and provision all the infrastructure resources in your cloud environment. CloudFormation allows you to use a simple text file to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts. This file serves as the single source of truth for your cloud environment. AWS CloudFormation is available at no additional charge, and you pay only for the AWS resources needed to run your applications.

  2. Auto-Scaling Groups & EC2 Instances

    An Auto Scaling Group (ASG) contains a collection of Amazon EC2 instances that share similar characteristics and are treated as a logical grouping for the purposes of instance scaling and management. The two figures below represent two different use cases of an ASG. Figure 1 represents an architecture where 3 different Queue Managers across 3 different AZs are connected to the NLB, with each MQ instance part of its own ASG. Figure 2 represents 3 Queue Managers with the same configuration connected to a single ASG, which then points to the NLB. Basically, we can use any combination to design a solution based on requirements.

                          ASG                   1-ASG

                                                   Figure (1)                                                                     Figure (2)

    In this technical recipe, I have used as many diagrams & screenshots as possible to illustrate & simplify the design concepts. Our first step in the overall messaging solution is to provision the EC2 instances in each Availability Zone. This can be done either by using a Launch Configuration to create the template for the EC2 instances, or by creating the EC2 instance first & subsequently making it part of an Auto Scaling Group, which in turn creates the Launch Configuration automatically.

    While I am breaking down each step, this entire solution (AWS resource allocation) can be automated using a CloudFormation template. As shown below, I have created 3 EC2 instances (MQ1, MQ2 & MQ3), one in each of the 3 Availability Zones, with the instance type “t2.medium“. However, in a real Production environment, the recommended instance type is “t2.large” or higher depending on the workload & TPS required by the application.

    3-EC2-instance-1
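    While I continue with the console steps in the rest of this section, the same resource allocation could be captured in a CloudFormation template and deployed in one command. The sketch below is purely illustrative; the template file name, stack name and region are assumptions rather than part of the original setup.

    /* Hypothetical one-shot deployment of the MQ infrastructure stack */
    # mq-infra.yaml would declare the VPC references, EFS volumes, ASGs, NLB, IAM role, etc.
    aws cloudformation deploy \
        --template-file mq-infra.yaml \
        --stack-name mq-infra \
        --region ap-southeast-2 \
        --capabilities CAPABILITY_NAMED_IAM

    # Review the resources created by the stack
    aws cloudformation describe-stack-resources --stack-name mq-infra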

    Click on the Actions button & select Instance Settings -> Attach to Auto Scaling Group as shown below. 

    ASG-1

    Select ‘a new Auto Scaling group’ and give it a name based on the naming convention used in your environment. For this illustration, I have used MQ1 & clicked on the Attach button. This action will create the ASG & a corresponding Launch Configuration with the same name i.e. MQ1. Similarly, repeat the same actions for the remaining 2 MQ instances (MQ2 & MQ3).

    ASG-2

    The ASG named MQ1 has been created as shown below. I have set the Desired Capacity to 1, the Minimum Capacity to 1 & the Maximum to 2. I will discuss these settings in detail in the Patching Strategy section. Setting the Desired Capacity to 1 ensures that at any given point, I will have one MQ instance running in each Availability Zone.

    ASG-4

    The objective of having the MQ instance running as part of an ASG is to ensure that even if my MQ host goes down for some reason, the ASG will spin up another MQ instance in the same Availability Zone using the Launch Configuration without any manual intervention, thus providing automation & resiliency in the overall design from an MQ perspective.

    ASG-5
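    For reference, the console actions above can also be scripted with the AWS CLI. The sketch below mirrors the same settings (Desired 1, Min 1, Max 2) for the MQ1 instance; the instance ID is a placeholder for your environment.

    /* Hypothetical CLI equivalent of attaching MQ1 to an Auto Scaling Group */
    # Creates the ASG (and a derived Launch Configuration) from the running MQ1 instance
    aws autoscaling create-auto-scaling-group \
        --auto-scaling-group-name MQ1 \
        --instance-id i-0123456789abcdef0 \
        --min-size 1 --max-size 2 --desired-capacity 1

    # Confirm the group settings
    aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names MQ1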

  3. Allocate Elastic File Systems (EFS)

    Elastic File System (EFS) is the key to the overall solution architecture of running IBM® MQ in AWS Cloud. While we frequently come across SAN/NAS storage in a traditional Data Centre environment, EFS is Amazon's offering that provides similar functionality. These EFS volumes are used to store MQ's logs & data, which gives the flexibility of a quick failover in case the EC2 instance in a given Availability Zone (AZ) goes down due to planned or unplanned outages. Apart from this, the other important benefit of keeping the MQ logs & data on EFS is retaining the persistent messages and cluster state information.

    Essentially being network storage, EFS can be mounted to an EC2 instance running in any Availability Zone, and a previously configured Queue Manager can be started with the MQ logs & data from the earlier setup. In this way, we have the option to reuse an existing MQ setup and rejoin an existing MQ Cluster without any manual intervention. This gives tremendous flexibility for quick failover & fall-back of MQ instances spanning multiple Availability Zones.

    The below diagram depicts our use case scenario: 3 EFS volumes used by 3 MQ instances (MQ1, MQ2, MQ3) spanning the AZs. There is another EFS volume (/MQBackup) provisioned, which is used as a dedicated backup drive for all 3 Queue Managers' data & log files.

    EFS

    Note: For a standalone MQ setup where a given application does not interconnect with other applications or Queue Managers in a network, EFS does not provide any additional value. For this kind of standalone architecture, the EBS volume which comes as part of the EC2 instance should be sufficient to store MQ data & logs.

    The below screenshot shows that 3 EFS volumes for the 3 Queue Managers (MQ1, MQ2, MQ3) have been provisioned along with the /MQBackup EFS.

    EFS-1A

    Each EFS volume has its own DNS name and has been configured to be mountable from all 3 Availability Zones, as highlighted below. This feature gives us the ability to mount & un-mount these file systems containing MQ data & logs from any EC2 instance running in any AZ for a quick failover & fall-back scenario. We are going to illustrate the same in the coming sections.

    EFS-2
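    The same information can be checked from the AWS CLI if you prefer to verify the mount targets from a script; the file-system ID below is the /MQHA volume referenced in the note in the next section, and the region is an assumption.

    /* Hypothetical check: list the EFS volumes and the per-AZ mount targets for /MQHA */
    aws efs describe-file-systems --region ap-southeast-2
    aws efs describe-mount-targets --file-system-id fs-4605847f --region ap-southeast-2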

  4. Mount EFS to EC2 Instances

    There are a few prerequisite steps required before we can mount an EFS volume to a running EC2 instance. I have included a small snippet of the original automation code relevant only to EFS mounting. The entire code, along with its other configuration functionality, will be passed in the User Data section of the Launch Configuration, which automates most of the MQ installation & configuration tasks.

    The first task here is to install the NFS client on the EC2 instance. The next step involves creating mount point directories for MQ data & logs under /MQHA. A backup directory (/MQBackup) is created to take periodic backups of the /MQHA file system. Change the ownership of the mount point directories to mqm as shown below. We also need to update the entries in the /etc/hosts & /etc/fstab files to ensure that the EFS file systems get automatically mounted after a system reboot or during provisioning of a new EC2 instance.

    /* Installation of NFS Client */
    sudo yum install -y nfs-utils

    /* Create the mount point directories */
    sudo mkdir /MQHA
    sudo chown -R mqm:mqm /MQHA
    sudo mkdir /MQBackup
    sudo chown -R mqm:mqm /MQBackup

    /* Making entries in the hosts & fstab files to auto-mount EFS Filesystems during reboot */
    sudo su - root -c 'echo "172.31.8.164 fs-4605847f.efs.ap-southeast-2.amazonaws.com fs-4605847f" >> /etc/hosts'
    sudo su - root -c 'echo "172.31.8.10 fs-aa109193.efs.ap-southeast-2.amazonaws.com fs-aa109193" >> /etc/hosts'
    sudo su - root -c 'echo "fs-4605847f:/ /MQHA nfs4 defaults,_netdev 0 0" >> /etc/fstab'
    sudo su - root -c 'echo "fs-aa109193:/ /MQBackup nfs4 defaults,_netdev 0 0" >> /etc/fstab'

    /* Mount the EFS filesystem to EC2 instance */
    sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-4605847f.efs.ap-southeast-2.amazonaws.com:/ /MQHA
    sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-aa109193.efs.ap-southeast-2.amazonaws.com:/ /MQBackup

    /* Change the ownership of the File system to mqm user */
    sudo chown -R mqm:mqm /MQHA
    sudo chown -R mqm:mqm /MQBackup

    Note: 1. DNS of the /MQHA filesystem dedicated to MQ1 - fs-4605847f.efs.ap-southeast-2.amazonaws.com
    2. DNS of the /MQBackup filesystem shared by MQ1, MQ2 & MQ3 - fs-aa109193.efs.ap-southeast-2.amazonaws.com

    Mount Point IP Address of MQ1 in AZ-A – 172.31.8.164

    MQ1

    Mount Point IP Address of Shared Backup EFS in AZ-A – 172.31.8.10

    MQBackup

    Similarly, repeat the above steps to mount the respective EFS volumes to the MQ2 & MQ3 instances in Availability Zones B & C respectively.

  5. Configure the pre-requisite OS Settings for IBM® MQ Installation

    In a traditional Data Centre environment, the following steps are generally performed by System Administrators, or sometimes by MQ Admins depending on the level of access granted. They relate to fine-tuning the Linux kernel settings recommended by IBM as a prerequisite for MQ installation and optimized performance. The technical details are mentioned in the link -> Click here!

    However, in AWS Cloud, to automate this configuration process we will pass the following lines of code in the User Data section of an EC2 instance or Launch Configuration, so that when the system is provisioned, the Red Hat image comes up with pre-configured settings optimized for the MQ runtime.

    /* Take backup of limits.conf file & update the nofile & nproc parameter with recommended values */
    sudo cp -p /etc/security/limits.conf /etc/security/limits.conf.bkp
    sudo su - root -c 'echo "mqm hard nofile 10240" >> /etc/security/limits.conf'
    sudo su - root -c 'echo "mqm soft nofile 10240" >> /etc/security/limits.conf'
    sudo su - root -c 'echo "mqm hard nproc 20480" >> /etc/security/limits.conf'
    sudo su - root -c 'echo "mqm soft nproc 20480" >> /etc/security/limits.conf'

    /* Take backup of sysctl.conf file & update the kernel parameters with recommended values */
    sudo cp -p /etc/sysctl.conf /etc/sysctl.conf.bkp
    sudo su - root -c 'echo "kernel.msgmni = 1024" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "kernel.shmmni = 4096" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "kernel.shmall = 2097152" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "kernel.sem = 500 256000 250 1024" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "kernel.pid_max = 120000" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "kernel.threads-max = 48000" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "fs.file-max = 524288" >> /etc/sysctl.conf'
    sudo su - root -c 'echo "net.ipv4.tcp_keepalive_time = 300" >> /etc/sysctl.conf'

    /* Refresh the service to reflect updated values */
    sudo sysctl -p
  6. Install MQ, Create & Start Queue Manager

    In the previous section, I prepared the EC2 instance for MQ installation. Now it's time to install IBM® MQ during system start-up. The following lines of code will be passed in the User Data section of a Launch Configuration or of an EC2 instance to automate the installation process. The prerequisite step here is to copy the MQ binary file from the S3 bucket to the local EC2 instance as shown in the architecture diagram.

    /* Copy the MQ Binary & misc.zip from S3 bucket to local EC2 instance */
    sudo su - root -c '/usr/bin/aws s3 cp s3://mq9.1/IBM_MQ_9.1_LINUX_X86-64.tar /home/ec2-user'
    sudo su - root -c '/usr/bin/aws s3 cp s3://mq9.1/misc.zip /MQHA'

    /* Extract, Accept License, perform the installation */
    tar -xvf /home/ec2-user/IBM_MQ_9.1_LINUX_X86-64.tar -C /home/ec2-user
    sudo su - root -c "/home/ec2-user/MQServer/mqlicense.sh -accept"
    sudo rpm --prefix /opt/mqm/inst1 -ivh /home/ec2-user/MQServer/MQSeriesRuntime-9.1.0-0.x86_64.rpm /home/ec2-user/MQServer/MQSeriesServer-9.1.0-0.x86_64.rpm
    sudo su - root -c "/opt/mqm/inst1/bin/setmqinst -i -p /opt/mqm/inst1"

    Once the installation is complete, I am going to create & start the Queue Manager MQ1 using the script createqm1.sh, which is fetched from the S3 bucket as part of the automation process. After this step, you can have your MQSC scripts ready to build the Queue Manager objects & set the necessary OAM permissions (a sample script is sketched at the end of this section).

    /* Create the log & qmgrs directory */
    sudo mkdir /MQHA/log
    sudo mkdir /MQHA/qmgrs
    sudo su - mqm -c '/MQHA/misc/createqm1.sh'

    /* createqm1.sh */
    crtmqm -ld /MQHA/log -md /MQHA/qmgrs -lp 12 -ls 4 -lf 4096 -u SYSTEM.DEAD.LETTER.QUEUE MQ1
    strmqm MQ1

    /* Add mqseries script to stop/start MQ upon system reboot */
    sudo cp /MQHA/misc/mqseries /etc/init.d/
    sudo chkconfig --add mqseries
    sudo chkconfig mqseries on

    /* Add Cron jobs to backup /MQHA Filesystem to /MQBackup */
    sudo su - mqm -c '(crontab -l ; echo "0 2 * * 1 /MQHA/misc/backup > /dev/null 2>&1") | sort - | uniq - | crontab -'

    /* backup script */
    #!/bin/bash
    cp -Ru /MQHA /MQBackup/MQ1/"MQ1.$(date +"%Y-%m-%d-%H-%M-%S")"

    The following section of code is required once you have already configured a Queue Manager (MQ1) and stored its logs & data directories on the /MQHA EFS volume. As part of the automation process, when you destroy an EC2 instance & spin up a new one with MQ installed on it, the following lines of code are required to associate the new installation of IBM MQ with the data & logs of the pre-configured Queue Manager (MQ1) mounted from the EFS volume. This technique is frequently used during patching and failover scenarios.

    /* This snippet is passed in the User Data Section of the Launch Configuration */
    sudo su - mqm -c 'echo "DefaultQueueManager:" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " Name=MQ1" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo "LogDefaults:" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " LogDefaultPath=/MQHA/log" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo "QueueManager:" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " Name=MQ1" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " Prefix=/var/mqm" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " Directory=MQ1" >> /var/mqm/mqs.ini'
    sudo su - mqm -c 'echo " DataPath=/MQHA/qmgrs/MQ1" >> /var/mqm/mqs.ini'
    sudo su - mqm -c "strmqm MQ1"
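    As mentioned earlier, the queue manager objects & OAM permissions are built with your own MQSC scripts once MQ1 is running. The sketch below is purely illustrative: the queue, channel and group names are hypothetical placeholders and would follow your application's naming standards.

    /* Hypothetical buildqm1.sh - sample objects & OAM permissions (names are examples only) */
    #!/bin/bash
    # Define a couple of example application queues and a SVRCONN channel
    echo "DEFINE QLOCAL('APP1.REQUEST') MAXDEPTH(50000) REPLACE" | runmqsc MQ1
    echo "DEFINE QLOCAL('APP1.REPLY') MAXDEPTH(50000) REPLACE" | runmqsc MQ1
    echo "DEFINE CHANNEL('APP1.SVRCONN') CHLTYPE(SVRCONN) REPLACE" | runmqsc MQ1

    # Grant the hypothetical application group connect/inquire authority on the queue manager
    setmqaut -m MQ1 -t qmgr -g app1grp +connect +inq
    # Grant put/get/browse/inquire on the application queues
    setmqaut -m MQ1 -t queue -n APP1.REQUEST -g app1grp +put +get +browse +inq
    setmqaut -m MQ1 -t queue -n APP1.REPLY -g app1grp +put +get +browse +inq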
  7. Joining an existing MQ Cluster (On-Premise or Otherwise)

    Until now, everything discussed has been automated and can be achieved through the combination of the Launch Configuration & copying pre-configured files from the S3 bucket. But here comes the tricky part, where you may need some manual effort depending on the type of environment your Queue Manager will operate in and your organization's security standards. In most cases, all Production Queue Managers use SSL/TLS certificates on their MQ channels, signed either by the organization's internal CA or by a third party. Getting these SSL/TLS certificates signed/renewed is something which can't be automated in most companies. While it's not technically impossible, most environments simply do not have the required infrastructure to support this automation. Once this manual task of certificate renewal is completed, the next action is to make the newly created Queue Manager (MQ1) running in AWS Cloud join an existing MQ Cluster whose Full Repository Queue Managers are in a different network (let's say an on-premise Data Centre).

    As long as firewall connectivity is open between your AWS network & the on-premise Data Centre, defining a Cluster Sender channel to any one of the Full Repository Queue Managers and its corresponding Cluster Receiver channel should enable MQ1 to join the cluster (there may be some channel auth configuration required at the Full Repository Queue Managers depending on the security settings of your environment). You may also need some manual actions if there are issues while joining the cluster.

    How to bridge the gap between the dynamic nature of a Cloud MQ instance & static on-premise MQ?

    Applications running in the Cloud are expected to be dynamic in nature, which means they should be able to scale up or down based on demand. IBM® MQ, on the other hand, requires static configuration, especially when it is operating in an MQ Cluster environment. Hence, one of the key challenges here is to make MQ flexible enough to run on Cloud infrastructure while maintaining its state information about the MQ Cluster. This can be achieved by creating the MQ logs & data directories on a separate EFS which can be mounted to any EC2 instance dynamically at runtime. What this essentially means is that after the initial configuration of a given Queue Manager (i.e. SSL certificate setup, joining the MQ Cluster, etc.) is done, I can destroy & recreate the MQ instance any number of times and still recover it with complete automation, thus adopting the very nature of Cloud.

    The below diagram represents the overall picture of IBM MQ solutions in the Cloud & on-premise, connected by a Virtual Private Gateway (VPG). One of the key factors here is the network link speed between AWS & the on-premise applications. Hence, extensive testing is recommended to validate the application throughput (TPS) requirement in a hybrid cloud model.

    MQ-Design1

    How will the Full Repos Queue Manager (On-Premise) connect back to a newly created EC2 instance once the old image is destroyed & recreated with a new IP & hostname?

    The other design challenge for MQ solutions in the Cloud is how to handle the ever-changing hostname & IP address of the EC2 instances. There are a few techniques available to abstract the actual hostname of an EC2 instance, notably placing an NLB in front of the EC2 instance or creating an alias on top of the actual server. In a point-to-point messaging paradigm, these two are the most viable solutions.

    However, in the case of an MQ Cluster implementation, we can exploit one of the inherent characteristics of an MQ Cluster to adopt the dynamic nature of Cloud. This is done by keeping the Cluster Receiver channel's connection name (CONNAME) blank. IBM MQ, by design, copies the connectivity information of a remote Queue Manager from its Cluster Receiver channel. By keeping the Cluster Receiver CONNAME blank, we essentially create a placeholder which is dynamically replaced by the actual IP address of the EC2 instance every time we destroy & create a new one. This solves our fixed-naming constraint in the MQ channel connection parameter.

    /* Sample Cluster Receiver Definition of AWS MQ */
    DEFINE CHANNEL('CHANNEL-NAME') CHLTYPE(CLUSRCVR) CLUSTER('CLUSTER-NAME') CONNAME(' ')

    /* Sample Cluster Sender Definition to On-Prem FULL REPOS QM */
    DEFINE CHANNEL('CHANNEL-NAME') CHLTYPE(CLUSSDR) CLUSTER('CLUSTER-NAME') CONNAME('hostnameFullReposQM(1414)')
  8. Configure Security Groups (SG) & Network Access Control Lists (NACL)

    Security Groups (SG) & Network ACLs (NACL) act as firewalls at the instance & subnet level respectively for all computing resources running in AWS Cloud. The below image from Amazon shows the role of each in a VPC.

    Sg-NACL

    An SG controls both inbound and outbound traffic at the instance level & is stateful in nature. When you launch an instance in a VPC, you can associate one or more security groups that you've created. Each instance in your VPC could belong to a different set of security groups. If you don't specify a security group when you launch an instance, the instance automatically belongs to the default security group for the VPC. In our case, I have created a dedicated SG for MQ & named it 'IBM MQ Security Group'. The inbound traffic rules are shown in the below screenshot. Since SGs are stateful in nature, return traffic for the inbound rules is automatically allowed, which is not the case for Network ACLs.

    SG-1
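    If you prefer to script the SG rules instead of adding them in the console, the sketch below adds the typical inbound rules used in this design (MQ listener 1414, NFS 2049 for the EFS mounts, and SSH 22). The security group ID and CIDR range are placeholders for your environment.

    /* Hypothetical inbound rules for the 'IBM MQ Security Group' (ID & CIDR are placeholders) */
    aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 1414 --cidr 172.31.0.0/16
    aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 2049 --cidr 172.31.0.0/16
    aws ec2 authorize-security-group-ingress --group-id sg-0abc1234 --protocol tcp --port 22 --cidr 172.31.0.0/16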

    A network access control list (ACL) is an optional layer of security for your VPC that acts as a firewall for controlling traffic in and out of one or more subnets. For our illustration, I have created a Network ACL (IBM-NACL) as shown in the below screenshot.

    NACL-1

    Since NACLs are stateless in nature, we have to explicitly create the inbound & corresponding outbound rules if we need bidirectional connectivity between our subnets and the outside world, or for connectivity to the on-premise Data Centre.

    NACL-2

    The following screenshot shows the 3 subnets associated with the NACL. Since it operates at the subnet level, as an MQ Administrator you may or may not have access to configure the NACLs, which are generally controlled by the AWS Operations team in your organization.

    NACL-3

  9. Provision a NAT Gateway

    From a solution design perspective, all IBM MQ instances should be placed in the Tier 2 layer (private subnets) of your VPC with no direct access to the Internet. However, to allow system/software updates from the Internet, we need a route from the Tier 2 layer servers to the outside world. A NAT Gateway supports this design. The below diagram illustrates the architecture of a VPC with a NAT Gateway in the public subnet & IBM MQ instances in the private subnets.

    NAT

    The main route table sends internet traffic from the MQ instances in the private subnet to the NAT gateway. The NAT gateway sends the traffic to the internet gateway using the NAT gateway’s Elastic IP address as the source IP address. If you have resources in multiple Availability Zones and they share one NAT gateway, in the event that the NAT gateway’s Availability Zone is down, resources in the other Availability Zones lose internet access. 

    Hence, to create an Availability Zone-independent architecture, create a NAT gateway in each Availability Zone and configure your routing to ensure that resources use the NAT gateway in the same Availability Zone.
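    A per-AZ NAT Gateway can be provisioned with a few CLI calls along the lines of the sketch below; the subnet, Elastic IP allocation and route table IDs are placeholders for your environment.

    /* Hypothetical provisioning of a NAT Gateway for AZ-A (IDs are placeholders) */
    # Allocate an Elastic IP for the NAT Gateway
    aws ec2 allocate-address --domain vpc
    # Create the NAT Gateway in the public subnet of AZ-A
    aws ec2 create-nat-gateway --subnet-id subnet-0publicA --allocation-id eipalloc-0abc1234
    # Route internet-bound traffic from the private (MQ) subnet through the NAT Gateway
    aws ec2 create-route --route-table-id rtb-0privateA --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-0abc1234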

  10. Provision the Network Load Balancer (NLB)

    IBM MQ works over TCP, a layer 4 protocol; hence the default choice of Elastic Load Balancer is the Network Load Balancer (NLB). Apart from load balancing the incoming connection requests from applications to Queue Managers, the NLB provides a level of abstraction which hides the actual hostnames of the Queue Managers running behind the load balancer. With this design, the application is completely unaware of the number of IBM MQ instances it connects to and their geographical locations within a given region. This configuration also gives MQ Administrators flexibility while patching the operating system or the IBM MQ software, details of which are mentioned in the OS / MQ patching strategies section.

    NLB-DD

    As discussed in section 2 (Auto-Scaling Groups & EC2 Instances) of this article, we can either create the EC2 instances in each zone using a Launch Configuration, or provision the EC2 instances first & add them to the Auto Scaling Groups. The next step is to create the Target Group, which is a logical component of the NLB as shown in the above diagram. The Auto Scaling Groups, i.e. the EC2 instances which are part of the ASGs, are in turn registered with the Target Group. This setup ensures that even if we destroy an MQ instance in a given Availability Zone, the ASG will trigger another EC2 instance to spin up & register itself with the Target Group upon start-up automatically.
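    For completeness, the Target Group, NLB, listener and ASG registration just described could be scripted roughly as follows; the VPC and subnet IDs are placeholders, and the ARNs are captured in shell variables purely for illustration.

    /* Hypothetical provisioning of MQ-TG and MQ-NLB (VPC/subnet IDs are placeholders) */
    TG_ARN=$(aws elbv2 create-target-group --name MQ-TG --protocol TCP --port 1414 \
        --vpc-id vpc-0abc1234 --target-type instance \
        --query 'TargetGroups[0].TargetGroupArn' --output text)
    NLB_ARN=$(aws elbv2 create-load-balancer --name MQ-NLB --type network --scheme internal \
        --subnets subnet-0a subnet-0b subnet-0c \
        --query 'LoadBalancers[0].LoadBalancerArn' --output text)

    # Forward TCP 1414 traffic from the NLB listener to the target group
    aws elbv2 create-listener --load-balancer-arn "$NLB_ARN" --protocol TCP --port 1414 \
        --default-actions Type=forward,TargetGroupArn="$TG_ARN"
    # Attach each ASG so replacement instances register themselves automatically
    aws autoscaling attach-load-balancer-target-groups --auto-scaling-group-name MQ1 --target-group-arns "$TG_ARN"
    # Enable cross-zone load balancing as per the design
    aws elbv2 modify-load-balancer-attributes --load-balancer-arn "$NLB_ARN" \
        --attributes Key=load_balancing.cross_zone.enabled,Value=true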

    The below diagram shows that a Target Group (MQ-TG) has been created as part of the Load Balancer MQ-NLB.

    MQ-TG

    The 3 MQ instances (MQ1, MQ2 & MQ3) running on TCP port 1414 across the 3 AZs are shown in healthy status in the below screenshot.

    MQ-TG-1

    Next, we have created the Network Load Balancer (MQ-NLB), whose DNS host name is highlighted in yellow. AWS also provides the option to customize the DNS name of the load balancer using Route 53. The other important parameter to note here is that Cross-Zone Load Balancing is enabled, which means traffic is distributed across all 3 MQ instances running in the 3 Availability Zones.

    NLB
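    If a friendlier connection name than the generated NLB DNS name is required, a Route 53 record can point at it. The hosted zone ID, record name and NLB DNS name below are placeholders used only for illustration.

    /* Hypothetical Route 53 record pointing a friendly name at the NLB DNS name */
    aws route53 change-resource-record-sets --hosted-zone-id Z0123456789ABC --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "mq.example.internal",
          "Type": "CNAME",
          "TTL": 300,
          "ResourceRecords": [{"Value": "MQ-NLB-0123456789.elb.ap-southeast-2.amazonaws.com"}]
        }
      }]
    }'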

    Finally, the Listeners tab of the NLB shows the MQ-TG target group.

    MQ-TG-2

  11. Configure CloudWatch

    CloudWatch collects monitoring and operational data in the form of logs, metrics, and events, providing you with a unified view of AWS resources, applications and services that run on AWS and on-premises servers. In this illustration, I am going to use CloudWatch to monitor the health of the EC2 instances. From an MQ perspective, we can configure CloudWatch to fetch the MQ error logs as a log stream from the EC2 instances to the AWS Console for analysis. The benefit of doing this is that MQ Support Engineers have the ability to perform MQ troubleshooting without logging in to the actual server.

    Having said that, you still need other specialized monitoring tools like Tivoli or AppDynamics, or some customized programs, to monitor the status of IBM MQ at a granular level. The following steps show the configuration of CloudWatch to view the MQ error logs from the AWS Console.

    First Step: Authorize your EC2 instance for CloudWatch.

    Configure your EC2 instances & authorize them to talk to the CloudWatch service by creating a policy in the Identity and Access Management (IAM) service, and then assigning that policy to a role. Attach that role to the EC2 instance.

    /* Sample IAM Policy */
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents",
            "logs:DescribeLogStreams"
          ],
          "Resource": [
            "arn:aws:logs:*:*:*"
          ]
        }
      ]
    }

    Second Step: Download & Install the CloudWatch agent file on the EC2 instance.

    /* Download & Install the CloudWatch Agent file */
    wget https://s3.amazonaws.com/amazoncloudwatch-agent/redhat/amd64/latest/amazon-cloudwatch-agent.rpm
    sudo rpm -ivh amazon-cloudwatch-agent.rpm

    Third Step: Sending error logs to CloudWatch.

    Copy the pre-configured configuration file, i.e. mq.conf, into the respective CloudWatch directory (/var/awslogs/etc/config) to fetch the MQ error logs as a log stream in the AWS Console, and restart the log agent.

    /* Copy the preconfigured mq.conf file & restart the agent */
    sudo cp /MQHA/misc/mq.conf /var/awslogs/etc/config
    sudo service awslogs restart

    /* Snippet of the mq.conf file */
    [/MQ1/var/mqm/errors/AMQERR01.LOG]
    datetime_format = %m/%d/%Y %I:%M:%S %p
    file = /var/mqm/errors/AMQERR01.LOG
    buffer_duration = 5000
    log_stream_name = {instance_id}
    initial_position = start_of_file
    log_group_name = /MQ1/var/mqm/errors/AMQERR01.LOG
    multi_line_start_pattern = {datetime_format}

    [/MQHA/qmgrs/MQ1/errors/AMQERR01.LOG]
    datetime_format = %m/%d/%Y %I:%M:%S %p
    file = /MQHA/qmgrs/MQ1/errors/AMQERR01.LOG
    buffer_duration = 5000
    log_stream_name = {instance_id}
    initial_position = start_of_file
    log_group_name = /MQHA/qmgrs/MQ1/errors/AMQERR01.LOG
    multi_line_start_pattern = {datetime_format}

    /* Execute the cloudwatch.sh script from User Data Section of an EC2 instance */
    sudo su - root -c '/MQHA/misc/cloudwatch.sh'

    /* cloudwatch.sh */
    #!/bin/bash
    wget https://s3.amazonaws.com/amazoncloudwatch-agent/redhat/amd64/latest/amazon-cloudwatch-agent.rpm
    sudo rpm -ivh amazon-cloudwatch-agent.rpm
    sudo cp /MQHA/misc/mq.conf /var/awslogs/etc/config
    sudo service awslogs restart

    The below screenshot shows the MQ error logs published in the AWS CloudWatch console, fetched as a log stream. The format is very much the same as what we see on the actual servers while troubleshooting issues. With this CloudWatch feature enabled, MQ Administrators now have the option to troubleshoot MQ issues without actually logging in to the server. This also helps with security & regulatory compliance guidelines, as access to the MQ instances can be given to support personnel on a need basis.

    Cloudwatch

  12. Multi-Instance Queue Manager (MIQM) vs dynamic Cloud Infrastructure

    Multi-Instance Queue Manager (MIQM) configuration has been IBM® MQ's answer to High Availability & Resiliency requirements at a software level. This feature (introduced in MQ v7.0.1) is still used by many organizations with HA requirements for their applications. An NFS mount is shared by 2 or more machines holding the MQ data & log file systems, with only one holding a write lock at any given point. The other server(s) continue to run in standby mode. In most cases, the actual failover time between the active and standby server has proven to be less than a minute. This works very well in a traditional Data Centre model where all the infrastructure assets are fixed & cannot be dynamically allocated based on demand.
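    For reference, a minimal sketch of the commands behind such an MIQM configuration is shown below, assuming the shared /MQHA EFS is already mounted on both instances and MQ1 was created with its data & logs under /MQHA as in section 6; run them as the mqm user.

    /* Hypothetical MIQM start-up sequence */
    # Instance in AZ-A: start MQ1 as the active instance, permitting a standby
    strmqm -x MQ1

    # Instance in AZ-B: make the queue manager known locally, then start it as the standby
    addmqinf -s QueueManager -v Name=MQ1 -v Directory=MQ1 -v Prefix=/var/mqm -v DataPath=/MQHA/qmgrs/MQ1
    strmqm -x MQ1

    # Either instance: confirm the active/standby status
    dspmq -x -m MQ1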

    However, running IBM MQ in the Cloud is a different ball game altogether, where you have the flexibility to spin up a new MQ instance on demand & still make it part of your existing MQ network in a matter of hours with 90% automation in place, or to recover an existing Queue Manager with 100% automation in less than 5 minutes! In this scenario, business applications have to decide how much downtime (scheduled or otherwise) they can tolerate to get IBM MQ back online.

    The below diagrams show the MIQM configuration of the MQ1 Queue Manager, whose logs & data are stored on a shared EFS mounted in the 2 Availability Zones A & B.

    MIQM

    MIQM-1

    Now, let's briefly touch on the failure / crash scenarios & the subsequent recovery of IBM MQ, which could be either a server-down situation or MQ itself crashing for some reason.

    In case of an outage (server or IBM MQ crash), the failover to the secondary node is automated & takes less than a minute using IBM MQ's MIQM configuration, in both the traditional Data Centre and the Cloud model. While server crash recovery in a Cloud environment can be automated by spinning up a new instance using the Auto Scaling Group, which takes somewhere between 4 to 6 minutes depending on the type of Queue Manager configuration, the same recovery procedure in a traditional Data Centre takes time and is not possible without manual intervention.

    Clearly, the MIQM configuration has recovery time & automation advantages on its side; however, that comes with the additional licensing cost of running another instance of MQ in standby mode. Hence, the architecture for a given use case is totally dependent on the application's downtime acceptance criteria & the budget for the solution.

    Lastly, in the case of an IBM MQ-only crash, while an MIQM configuration will do an automatic failover to the standby Queue Manager, which typically takes less than a minute, in a standard configuration i.e. without MIQM, manual intervention is required to recover the queue manager. In short, while you can easily recover a crashed EC2 instance with IBM MQ running on it in AWS Cloud, we simply can't recover a crashed Queue Manager (with the server up) automatically without the MIQM configuration, in AWS or any other Cloud platform for that matter.

  13. OS / MQ Patching Strategies

    In a traditional Data Centre environment, the typical patching (OS or MQ) / application downtime ranges from 30 to 45 minutes depending on the environment. However, in a Cloud model we can reduce this downtime to 1 to 3 minutes depending on the type of MQ setup you have, i.e. Multi-Instance Queue Manager (MIQM) with EFS storage or a standard setup with EFS. In this section, I am going to discuss how we can achieve this minimal downtime by following the steps in the sequence mentioned below.

    One of the key aspects of this patching strategy is to spin up a new EC2 instance with the latest OS release or IBM MQ Fix Pack level and destroy the old image. This ensures minimal application downtime, and the estimated 1-3 minutes is actually taken during the switch between the old & new patched EC2 instances. Let's assume we are going to patch the MQ1 instance running in AZ-A.

    The sequence of patching steps goes like this (a command-level sketch of the key steps follows the list):

    1. Create an AMI with the base version of IBM MQ installed on top of a Red Hat image.
    2. Copy the latest MQ Fix pack to the S3 bucket.
    3. Create a New Launch Configuration by copying the existing one & add the extra lines of code to copy & install the latest IBM MQ Fix pack (from S3) on top of the base IBM MQ AMI which was defined in your initial Launch Configuration.
    4. Update the existing Auto-Scaling Group (ASG) to point the latest version of the Launch Configuration created in the previous step. The ASG’s default termination policy should be changed to OldestInstance if not configured previously.
    5. Delete the Old Launch Configuration (optional). It’s important to maintain the Launch Configuration versioning if you plan to retain the old Launch Configurations.
    6. Change the Desired Size of the Auto Scaling Group to 2, which will trigger creation of another EC2 instance in the same Availability Zone i.e. A. However, when the new instance comes up with base IBM MQ and the latest Fix Pack installed, it won't be able to start the Queue Manager MQ1, since the old EC2 instance running MQ1 is holding a lock on the /MQHA EFS, which is expected by design.
    7. At this stage, we have 2 EC2 instances: one with the older version of MQ and another with the latest MQ Fix Pack.
    8. Suspend the existing Queue Manager MQ1 (running in old setup) from the Cluster(s).
    9. Stop the Queue Manager MQ1 which will release the lock on the EFS (/MQHA filesystem for MQ1) & unmount the EFS.
    10. Start MQ1 on the newly created EC2 instance with latest fix level.
    11. Resume MQ1 into the cluster(s).
    12. Change the Desired Size of the ASG to 1 (revert back the step 6) which will automatically destroy the oldest EC2 instance as per the termination policy. 
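    As noted above, a command-level sketch of the key steps (6 and 8 to 12) is shown below. The cluster name is a placeholder and the commands are indicative rather than a complete runbook.

    /* Indicative commands for patching steps 6 and 8-12 (cluster name is a placeholder) */
    # Step 6: raise the Desired Capacity so a patched instance is created in AZ-A
    aws autoscaling set-desired-capacity --auto-scaling-group-name MQ1 --desired-capacity 2

    # Steps 8 & 9: on the OLD instance - suspend MQ1 from the cluster, stop it, release the EFS
    echo "SUSPEND QMGR CLUSTER('CLUSTER-NAME')" | runmqsc MQ1
    endmqm -w MQ1
    sudo umount /MQHA

    # Steps 10 & 11: on the NEW instance (EFS re-mounted via User Data) - start and resume MQ1
    strmqm MQ1
    echo "RESUME QMGR CLUSTER('CLUSTER-NAME')" | runmqsc MQ1

    # Step 12: revert the Desired Capacity; the OldestInstance policy terminates the old node
    aws autoscaling set-desired-capacity --auto-scaling-group-name MQ1 --desired-capacity 1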

    Patching

    The above figure shows the switch between the old & new Launch Configurations, viz. v1.0 & v1.1, representing different Fix Pack levels of IBM MQ.

    So, the actual downtime for the application spans steps 8 to 11, which typically takes between 1-3 minutes. However, as per the original design, there won't be any application downtime in reality, since business traffic is handled by the 2 other Queue Managers, i.e. MQ2 & MQ3 in the other AZs, while MQ1 is being patched. This is possible by placing the NLB between the application & the MQ instances, which abstracts the real MQ servers from the applications.

    Note: With this patching strategy, the downtime of a given application has been reduced to less than 3 minutes; however, I can think of a scenario wherein there could be a slim chance of losing in-flight messages while the application connections are made through the NLB in a request/reply messaging paradigm. Hence, this design has to be tested thoroughly before implementing it in a Production environment!

    The other way of mitigating this issue with certainty is to remove the EC2 (MQ) instance from the Target Group (MQ-TG) after step 8, which drains the existing connections to the MQ instance gracefully and thus guarantees no message loss. However, this step (if necessary) will add another 2-3 minutes to the overall MQ downtime, depending on the number of connections it has to drain.
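    A hedged sketch of that extra draining step with the AWS CLI is shown below; the target group ARN (held in a shell variable as in section 10) and the instance ID are placeholders.

    /* Hypothetical draining of the old MQ1 instance from MQ-TG before it is stopped */
    aws elbv2 deregister-targets --target-group-arn "$TG_ARN" --targets Id=i-0123456789abcdef0
    # Wait for the target to leave the 'draining' state before stopping the queue manager
    aws elbv2 describe-target-health --target-group-arn "$TG_ARN"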

  14. Overall Automation Achieved

    Automation is a buzzword in today's IT world. With the surging popularity of Cloud services, automation will play an even bigger role for all technologies running on cloud platforms, and IBM MQ cannot afford to be an exception to this trend. In this article, I have discussed what IBM MQ installation & configuration tasks we can automate, and how. Snippets of code in various sections of this article demonstrate the automation tasks listed below.

    1. Provisioning of an EC2 instance.
    2. Create Application Users & Groups.
    3. Update & fine-tune the OS / Kernel parameters required for IBM MQ v9.1.
    4. Mount the pre-allocated EFS File Systems (/MQHA & /MQBackup) by making entries in fstab & hosts.
    5. Copy the MQ Binary from S3 bucket (pre-configured) using AWS CLI.
    6. Install IBM MQ v9.1.
    7. Copy scripts & config files (if any) from S3 to local machine using AWS CLI.
    8. Create a Queue Manager & its objects.
    9. Set the OAM & Channel Auths using mqsc scripts.
    10. Start & Resume the Queue Manager in the Cluster(s). ** (Resume can only happen after Initial Configuration)
    11. Install & configure the CloudWatch agent and restart the process.
    12. Create the Crontab entries to take backup of /MQHA (EFS) to shared /MQBackup (EFS) periodically.
    13. Setup /etc/init.d/ for automatic stop/start of IBM MQ with system reboot.

     

    These 13 steps basically cover 100% automation when recovering an existing Queue Manager (from a server crash scenario) and making it run in the cluster again. However, we cannot recover a crashed Queue Manager (with the server still up) without manual intervention unless we are using an MIQM setup (which does it automatically).

    From an automation perspective, we can automate up to 90% of the tasks while provisioning a Queue Manager for the first time & making it part of an existing MQ network (either clustering or P2P). The remaining 10% of manual tasks involves setting up the SSL certificates & the channel auth configuration steps (if any) at the Full Repository Queue Manager end (which could be on-premise).

    However, from a BAU support perspective, MQ Administrators still need access to the MQ instances to manually suspend/resume Queue Managers from clusters, reset/resolve MQ channels, create/delete queues, change queue depths, alter some of the Queue Manager configuration parameters, etc. These tasks are mostly application- and situation-driven, requiring human decision making, & hence can't be automated no matter how much we want them to be!

  15. Conclusion

    With the ever-increasing adoption of Cloud platforms by enterprises, along with the growing popularity of Open Source products, traditional enterprise software has undoubtedly taken a hit, and IBM® MQ is not an exception. However, from a messaging perspective, no product has yet evolved to challenge the dominance of MQ as a fast, reliable & flexible messaging product with unmatched built-in security features.

    With the general acceptance of Cloud infrastructure by enterprises, IBM MQ also must evolve & adopt the dynamic nature of Cloud, not only to connect the next generation of applications running in the Cloud but also to bridge the connectivity between applications running in traditional Data Centres (including Mainframe) and Cloud apps.

    Through this article, I have highlighted some of the key features available in AWS Cloud which make IBM® MQ an even more dynamic product in the messaging & integration space in the Cloud. As demonstrated in the article, around 85% to 90% of the actual installation & configuration tasks have been automated, & the remaining 10 to 15% of manual work is required for BAU support. The patching downtime for IBM MQ has been cut from 30-45 minutes in the traditional Data Centre model to less than 5 minutes on Cloud platforms. This gives us flexibility & reduces the time to deliver MQ solutions to hours, which otherwise would have taken weeks. Using the concepts discussed in this article, the majority of MQ messaging solutions can be achieved, or they can be used as a baseline to design multiple solutions based on specific application requirements.

    In a nutshell, if Cloud is the future for Enterprise Applications, rest assured IBM® MQ is ready to go!

  16. References & Related Topics

    In this article, I have focussed more on the AWS services & how they enable implementation of IBM® MQ on the AWS Cloud platform. For the technical capabilities of IBM® MQ, please refer to my other technical articles listed below.

    1. IBM® MQ, an Enterprise Messaging Backbone in a True Sense!

    2. Working with IBM® MQ Managed File Transfer (MQMFT)

    3. Configuration of Multiple Certificates per Qmgr using IBM® MQ v8.0

    4. End to End Message Security using IBM® MQ

    5. Advanced Clustering Techniques using IBM® MQ
