Using managed service architecture to reduce time when moving to other cloud platforms

Our engineering team is happy to announce the launch of the IBM Db2 Warehouse on Cloud Flex offerings for Amazon Web Services (AWS). The new offering includes two Flex models: Flex and Flex Performance. At the core of IBM Db2 Warehouse on Cloud is BLU Acceleration, IBM’s in-memory columnar processing technology, along with actionable compression and data skipping techniques. The Flex and Flex Performance offerings use a Massively Parallel Processing (MPP) architecture, in which data is spread across multiple partitions and acted upon by multiple resources in parallel. In addition, the feature set available on IBM Cloud, including independent compute and storage scaling, daily backups, and high availability, is also available in the AWS offering.

This technical blog post explains how our managed service architecture lends itself to multiple cloud platforms, which enables us to move to other cloud platforms in a short span of time. The IBM Cloud Flex offerings use similar foundation blocks.

The Flex offerings are built on Kubernetes as the foundation. The base is an Amazon Elastic Container Service for Kubernetes (EKS) cluster on the AWS platform. In this Kubernetes cluster, we provision customer Flex clusters. A single Kubernetes cluster hosts multiple Flex clusters. The compute worker nodes that we use are Broadwell-based: r4.16xlarge (64 vCPUs, 488 GB RAM, and 20 Gbps of network bandwidth) for Flex Performance clusters and r4.8xlarge (32 vCPUs, 244 GB RAM, and 10 Gbps of network bandwidth) for Flex clusters.

When a provisioning request comes in, a cluster configuration is determined based on the customer compute level. Worker nodes are picked from a preexisting pool or provisioned dynamically, depending on the request. Storage volumes are likewise dynamically provisioned or picked from a pool, as applicable. The cluster deployment process is completely automated, from the point the customer clicks “Create” on the instance in the IBM Cloud Marketplace to the point the customer gets the welcome email. What you effectively get is a Flex cluster (see diagram below) with a set of worker nodes constituting your cluster, dedicated storage volumes, and a hostname to connect to. The only ports open in a cluster are the ones necessary to run the different applications. Each cluster gets its own set of worker nodes and dedicated storage volumes.
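The pool-first node selection described above can be sketched in a few lines. This is only an illustration, not the actual provisioning code: `NodePool`, `allocate_nodes`, and the node names are hypothetical, and `provision` stands in for whatever cloud API call launches a new instance.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodePool:
    """Hypothetical pool of pre-provisioned worker nodes."""
    idle: List[str] = field(default_factory=list)
    created: int = 0

    def provision(self) -> str:
        # Stand-in for a real cloud API call that launches a new instance.
        self.created += 1
        return f"dynamic-node-{self.created}"


def allocate_nodes(pool: NodePool, needed: int) -> List[str]:
    """Take nodes from the pool first, then provision the remainder."""
    taken = pool.idle[:needed]
    pool.idle = pool.idle[needed:]
    while len(taken) < needed:
        taken.append(pool.provision())
    return taken


pool = NodePool(idle=["pool-node-a", "pool-node-b"])
nodes = allocate_nodes(pool, 3)
print(nodes)
```

Here a request for three nodes drains the two pooled nodes and provisions one more. A request smaller than the pool would leave the remaining idle nodes in place for the next cluster.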

For warehouse storage, we use Provisioned IOPS SSD (io1) Elastic Block Store (EBS) volumes for user data, temporary space, and archive space, and Elastic File System (EFS) for file systems that need to be shared. Different IOPS tiers are used depending on the offering and performance needs. User data and temporary data use higher IOPS settings per volume. The EBS and EFS volumes are encrypted for all data at rest, and data is also encrypted in transit. In addition to storage-level encryption, the Db2 Warehouse engine provides database-level encryption.

The high availability model of the system is multi-pronged. First, any pod or node failure is immediately handled by Kubernetes. The EC2 instance spare pool that we maintain allows Kubernetes to move to a new node and get the Flex cluster up and running at the same capacity as before. Node downtime recoveries are on the order of minutes. Second, we continuously monitor the system at the process level to ensure that all microservice components are running without issues. Any issue with a microservice component is identified and rectified by our internal high availability layer. Finally, storage resiliency is provided by the EBS volumes themselves. They’re replicated by default across multiple servers in an availability zone to prevent the loss of data from the failure of any single component.

AWS image

For compute scaling on AWS, we create a new deployment template based on the scale the customer requests (up or down), shut the old configuration down, and restart with the new deployment template. There’s no data redistribution on compute scaling. Instead, data volumes are moved from one pod/container to another. The new compute nodes get their own volumes for temporary space and archive space, and the engine is restarted with the new configuration. Scaling completes within minutes.
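To illustrate why moving volumes avoids data redistribution, here is a minimal sketch that reassigns a fixed set of data volumes to a different number of pods. It rests on assumptions not stated above: that the number of data volumes stays constant across a scale operation and that volumes are spread across pods in near-equal contiguous groups. The function `assign_volumes` and the volume and pod names are hypothetical.

```python
from typing import Dict, List


def assign_volumes(volumes: List[str], pod_count: int) -> Dict[str, List[str]]:
    """Spread existing data volumes across pods in contiguous, near-equal groups."""
    if pod_count < 1 or pod_count > len(volumes):
        raise ValueError("pod_count must be between 1 and the number of volumes")
    base, extra = divmod(len(volumes), pod_count)
    layout: Dict[str, List[str]] = {}
    start = 0
    for i in range(pod_count):
        # The first `extra` pods take one additional volume.
        size = base + (1 if i < extra else 0)
        layout[f"pod-{i}"] = volumes[start:start + size]
        start += size
    return layout


volumes = [f"data-vol-{i}" for i in range(6)]  # hypothetical volume names
before = assign_volumes(volumes, 2)  # layout before scaling up
after = assign_volumes(volumes, 3)   # layout after scaling up
print(before)
print(after)
```

In both layouts the same six volumes appear exactly once. Only the mapping from volume to pod changes, so the data on each volume never has to be rewritten.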

Storage scaling is completely online. A customer can pick a new storage size, and the expansion process kicks in immediately. How soon the full storage becomes available depends on the platform: AWS EBS volume scaling goes through state transitions before the storage is entirely usable. Also, AWS storage expansions cannot be applied twice within a 6-hour period (an EBS limitation). Scaling of compute and storage can be done using REST APIs or through the IBM Cloud Console.
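A client that automates storage scaling may want to check the 6-hour window before submitting a second expansion. The following is a minimal sketch of such a guard; `can_expand` and `next_allowed_expansion` are hypothetical helper names, and the timestamp of the last expansion is assumed to be tracked by the caller.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

COOLDOWN = timedelta(hours=6)  # the 6-hour window mentioned above


def can_expand(last_expansion: Optional[datetime], now: datetime) -> bool:
    """True if no expansion has happened yet or the cooldown has elapsed."""
    return last_expansion is None or now - last_expansion >= COOLDOWN


def next_allowed_expansion(last_expansion: Optional[datetime]) -> Optional[datetime]:
    """Earliest time a new expansion may start, or None if there is no restriction."""
    return None if last_expansion is None else last_expansion + COOLDOWN


# Example timestamps chosen for illustration only.
last = datetime(2019, 1, 1, 8, 0, tzinfo=timezone.utc)
print(can_expand(last, datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)))
print(can_expand(last, datetime(2019, 1, 1, 14, 0, tzinfo=timezone.utc)))
print(next_allowed_expansion(last))
```

The first check falls four hours after the previous expansion and is rejected. The second falls exactly six hours after it and is allowed.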

Backups occur daily by default. Each backup is a storage-level snapshot that gets copied out to Amazon Simple Storage Service (S3). The whole backup process takes only a few minutes. Customers can use the IBM Db2 Warehouse on Cloud Console to modify the backup times and to restore to a backup that was taken earlier. The restore process restores the selected snapshot and starts Db2 Warehouse on Cloud at that snapshot level. Restore times are again on the order of minutes. The offering comes with a default 7-day backup limit, with the ability to tailor this to any number of backups.
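The retention limit can be pictured as a simple pruning rule over the list of daily snapshots. The sketch below assumes the limit means "keep the N most recent backups", with N defaulting to 7. The function `prune_backups` is hypothetical, and the dates are arbitrary sample data.

```python
from datetime import date
from typing import List, Tuple


def prune_backups(backups: List[date], keep: int = 7) -> Tuple[List[date], List[date]]:
    """Return (kept, expired), keeping the `keep` most recent backups."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(backups, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


# Ten daily backups; the dates are sample data for illustration.
backups = [date(2019, 1, d) for d in range(1, 11)]
kept, expired = prune_backups(backups)
print(len(kept), len(expired))
```

With ten daily backups and the default limit, the seven newest are retained and the three oldest are marked as expired. Passing a different `keep` value models a tailored retention setting.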

Multiple tiers of security are enforced for the offering. In AWS, we’ve used all the integral components of the platform to secure customer deployments: the VPC (everything is protected by the VPC boundary), security groups at multiple levels (EC2 instances, VPC), and Kubernetes-level Network Policies and RBAC rules.

To learn more, visit our website.