PowerVC Dynamic Resource Optimizer (DRO)

Joe Cropper (jwcroppe@us.ibm.com)

Senior Software Engineer, PowerVC Software Development

Stephen W.

Software Developer, PowerVC Software Development

November 2015

Dynamic Resource Optimizer (DRO)
Today, Joe Cropper and I are going to explain the Dynamic Resource Optimizer and how it can help make your life easier. The Dynamic Resource Optimizer (DRO) is a cutting-edge feature of PowerVC that brings an unprecedented level of automation to your Power cloud for PowerVM and PowerKVM hypervisors. When enabled, DRO monitors your compute nodes and automatically detects resource imbalances. Depending on its mode of operation, DRO either recommends actions or automatically performs them to restore balance to your cloud. This lets cloud administrators spend less time on labor-intensive infrastructure monitoring and more time on other critical business initiatives. Enterprises can also get a higher return on their hardware investment, because the hardware can run at higher workload densities. When workload spikes occur, DRO can quickly recognize the imbalance and rebalance the cloud before chaos unfolds.

Advise Only Mode vs. Active Mode
DRO supports two modes of execution:

  1. Advise Only: DRO monitors your cloud and, when it determines that an action is needed, recommends the action but does not perform it.
  2. Active: DRO monitors your cloud and, when it determines that an action should be performed, performs the appropriate action.

The execution mode is specified at the host group level, so you can enjoy the flexibility of having some host groups running in advise mode (e.g., production environments), while others run in active mode (e.g., test environments). You can also exclude select compute nodes and virtual machines (VMs) from being optimized by DRO. For example, if you have a “mission critical” compute node (or VM) that you never want DRO to migrate, you can exclude it from optimization operations as shown in Figure 1 below: 

Figure 1: Exclude Host from DRO Dialog

These flexible capabilities provide cloud administrators a convenient way to ease into a fully automated data center.
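
To make the two modes and the exclusion lists concrete, here is a minimal Python sketch. It is not PowerVC code: the `Mode`, `HostGroup`, and `handle_action` names are hypothetical and only illustrate the behavior described above, assuming the mode is a per-host-group setting and exclusions are simple sets of names.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ADVISE_ONLY = "advise_only"
    ACTIVE = "active"


@dataclass
class HostGroup:
    # Hypothetical model: the execution mode is set per host group.
    name: str
    mode: Mode
    excluded_hosts: set = field(default_factory=set)
    excluded_vms: set = field(default_factory=set)


def handle_action(group, host, vm, action):
    """Return what happens to a proposed action in this host group."""
    if host in group.excluded_hosts or vm in group.excluded_vms:
        return "skipped"                 # excluded resources are never optimized
    if group.mode is Mode.ADVISE_ONLY:
        return f"recommend: {action}"    # report only, nothing is changed
    return f"execute: {action}"          # active mode performs the action


prod = HostGroup("production", Mode.ADVISE_ONLY, excluded_vms={"db01"})
test = HostGroup("test", Mode.ACTIVE)

print(handle_action(prod, "hostA", "web01", "migrate web01"))
print(handle_action(prod, "hostA", "db01", "migrate db01"))
print(handle_action(test, "hostB", "web02", "migrate web02"))
```

In this sketch the production group only ever reports recommendations, the excluded VM is left alone, and the test group acts on its own.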

DRO Actions
DRO can take two types of actions: virtual machine live migration, and mobile core activations via Power Enterprise Pools. The actions that DRO takes depend on the options you select. Figure 2 is a screenshot of a host group being created, where you can see options such as CPU utilization, stabilization, run interval, and maximum concurrent migrations. You can choose to migrate virtual machines, activate mobile cores, or both. If you choose both and the host in need of attention is a member of an Enterprise Pool, DRO first tries to activate one or more mobile cores. When it performs a live migration, DRO moves a VM from a busy host to a less busy host. Read on for details on how live migration works.
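
The ordering of the two action types can be sketched in a few lines of Python. The function name `plan_actions` and its parameters are hypothetical, chosen only to mirror the options described above; this is an illustration of the ordering, not how PowerVC implements it.

```python
def plan_actions(migrate_vms, activate_cores, host_in_enterprise_pool):
    """Return the action types to try for an overloaded host, in order."""
    actions = []
    # Mobile core activation only applies to hosts in an Enterprise Pool,
    # and it is tried before live migration when both are selected.
    if activate_cores and host_in_enterprise_pool:
        actions.append("activate_mobile_cores")
    if migrate_vms:
        actions.append("live_migrate_vm")
    return actions


print(plan_actions(True, True, True))
print(plan_actions(True, True, False))
```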

Virtual Machine (VM) Live Migration
When a compute node exceeds its CPU utilization threshold, DRO can invoke live migration operations on resident VMs to restore balance to the cloud. For DRO to start a live migration, the same prerequisites apply as for user-initiated live migrations (for example, shared storage is required and collocation rules must be honored). If VMs are not configured for live migration, DRO ignores them.

Under the Hood
While the exact mechanics of how DRO works are rather involved, the following items provide notable details on DRO live migrations:

  1. DRO only initiates one live migration per host per optimization cycle (and it waits for the migration to complete before initiating another). DRO purposely waits for the cloud to “stabilize” a bit so that it does not invoke live migrations too aggressively.
  2. When identifying migration targets, DRO employs some heuristics to predict the CPU utilization of target hosts post-migration, to avoid the “ping pong” effect of a migration pushing the target host over its own threshold.
  3. When determining which VM to migrate, DRO attempts to select the VM that makes the largest reduction in the source host’s CPU utilization. If there are multiple VMs that can predictably bring the CPU utilization below the threshold, DRO selects the VM with the smallest memory footprint. This minimizes the amount of data that needs to be transferred across the network.
  4. If the live migration of a VM fails, DRO will not attempt to migrate that VM for at least another hour (so that it does not repeatedly select a VM when the situation requires manual intervention).
  5. On the virtual machine level, PowerVC samples the CPU utilization every 5 minutes. Since the DRO algorithms require CPU utilization on the VM level, it is strongly recommended that you do not set your run_interval × stabilization duration to anything less than 10 minutes. Otherwise, DRO might make decisions based on stale data.
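
The selection rules in items 2 through 4 can be sketched as follows. This is a simplified illustration, not DRO's actual algorithm: the `VM` class, `pick_vm`, and `target_ok` are hypothetical names, and the sketch assumes a naive additive model in which each VM contributes a fixed number of percentage points (`cpu_share`) to whichever host it runs on.

```python
from dataclasses import dataclass
from typing import Optional

COOLDOWN_SECONDS = 3600  # item 4: leave a VM alone for an hour after a failed migration


@dataclass
class VM:
    name: str
    cpu_share: float                       # assumed: points of host CPU utilization
    memory_gb: float
    last_failed_at: Optional[float] = None  # timestamp of the last failed migration


def pick_vm(vms, host_util, threshold, now):
    """Choose a VM to migrate off an overloaded host, or None."""
    candidates = [
        vm for vm in vms
        if vm.last_failed_at is None or now - vm.last_failed_at >= COOLDOWN_SECONDS
    ]
    if not candidates:
        return None
    # VMs whose removal is predicted to bring the host below its threshold.
    sufficient = [vm for vm in candidates if host_util - vm.cpu_share < threshold]
    if sufficient:
        # Item 3 tie-break: smallest memory footprint means less data to move.
        return min(sufficient, key=lambda vm: vm.memory_gb)
    # Otherwise take the VM that gives the largest reduction.
    return max(candidates, key=lambda vm: vm.cpu_share)


def target_ok(target_util, vm, threshold):
    """Item 2: reject targets predicted to exceed their own threshold."""
    return target_util + vm.cpu_share < threshold


vms = [VM("a", 30, 64), VM("b", 20, 8), VM("c", 5, 2)]
choice = pick_vm(vms, host_util=85, threshold=70, now=10_000)
print(choice.name, target_ok(40, choice, 70))
```

Here both `a` and `b` would bring the host under 70%, so the sketch picks `b` because it has the smaller memory footprint.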

Mobile core activations via Power Enterprise Pools
Power Enterprise Pools (PEPs) provide additional flexibility and value for PowerVM-based enterprise servers (Power 770, Power 780, Power 795, Power E870 and Power E880) by allowing you to purchase a pool of mobile cores or memory that can be dynamically (re)assigned to compute nodes as needed. DRO brings an entirely new level of automation to PEPs: it can automatically assign cores to compute nodes to resolve resource imbalances. Where it is available, this approach has clear advantages over live migration: cores can be activated almost instantly, there is no tax on your network infrastructure, and it is sometimes the only way to rebalance the environment (for example, if VMs cannot be migrated due to collocation rules or a lack of shared storage). DRO coupled with PEPs allows for nearly instantaneous resolution of resource imbalances, an industry-leading solution that only the Power platform can provide. Today’s cloud, analytics, mobile and social workloads are highly variable and unpredictable, and DRO can give you a competitive advantage in meeting their demands.

Pro Tip: If you only selected “mobile core” activations for a rebalancing operation, then you can lower the duration to less than 10 minutes if you want DRO to quickly detect and react to resource imbalances. Mobile core activations do not have nearly the amount of overhead that live migrations do.
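
As a quick way to sanity-check your settings, the sketch below computes the detection window and flags values under the 10-minute recommendation when live migration is enabled. It assumes the duration is the product of the run interval (in minutes) and the stabilization value (a count of consecutive runs); the function names are hypothetical and not part of PowerVC.

```python
MIN_MIGRATION_WINDOW = 10  # minutes, the recommended minimum when live migration is enabled


def detection_window(run_interval_minutes, stabilization):
    """Assumed: run interval in minutes times the stabilization count."""
    return run_interval_minutes * stabilization


def window_warning(run_interval_minutes, stabilization, live_migration_enabled):
    """Return a warning string if the window is too short, else None."""
    window = detection_window(run_interval_minutes, stabilization)
    if live_migration_enabled and window < MIN_MIGRATION_WINDOW:
        return (f"window of {window} min is below {MIN_MIGRATION_WINDOW} min; "
                "VM-level CPU data may be stale")
    return None  # mobile-core-only setups may use a shorter window


print(window_warning(5, 3, live_migration_enabled=True))
print(window_warning(2, 2, live_migration_enabled=True))
print(window_warning(2, 2, live_migration_enabled=False))
```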

In order for DRO to utilize PEPs, you must purchase PEP licenses separately. PowerVC and DRO automatically recognize any compute nodes that are members of a PEP. No special configuration within PowerVC is necessary.

Under the Hood
The following items provide additional details about DRO support for Power Enterprise Pools:

  1. If DRO has been configured to perform both live migration and mobile core activations, the latter will always be attempted first, as mobile core activations are always preferred over live migrations.
  2. When a compute node exceeds its threshold and mobile cores are available in the enterprise pool, DRO assigns cores from the pool before trying to reclaim cores from other hosts within the pool.
  3. DRO will only reclaim cores from another host in the enterprise pool if that host’s CPU utilization is not predicted to go above the threshold as a result (otherwise, this leads to a “ping pong” effect of mobile cores).
  4. If DRO deems that a host requires multiple cores to resolve its CPU utilization imbalance, it tries to reclaim cores from as many hosts in the enterprise pool as possible. This technique minimizes the overall impact to your cloud since it allows for individual hosts to not be taxed too much.
  5. Once cores are assigned to a host, DRO only reclaims them if another host in the cloud requires them. It won’t proactively pull them back into the enterprise pool to sit “idle”.
  6. DRO makes every effort to never create “unreturned cores.” If a core is being used by a VM, DRO will not attempt to reclaim it.
  7. DRO can only reclaim cores from other systems that are (a) in the same enterprise pool, and (b) within the same PowerVC host group. Therefore, it is not recommended to split your enterprise pool hosts between multiple PowerVC host groups, as doing so limits the number of rebalancing options available to DRO.
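
The sourcing rules in items 2 through 6 can be sketched as a small planning function. This is only an illustration under stated assumptions, not DRO's implementation: `Donor` and `plan_cores` are hypothetical names, the donor list is assumed to be already limited to hosts in the same enterprise pool and host group (item 7), and each donor's safe limit is assumed to be precomputed. The plan uses the reserved key `"pool"` for unassigned pool cores.

```python
from dataclasses import dataclass


@dataclass
class Donor:
    name: str
    idle_mobile_cores: int  # mobile cores not used by any VM (item 6)
    max_safe_to_give: int   # cores it can lose and stay under its threshold (item 3)


def plan_cores(needed, free_in_pool, donors):
    """Return (plan, shortfall): where each core comes from, and how many are still missing."""
    plan = {}
    # Item 2: use unassigned cores in the pool before touching other hosts.
    from_pool = min(needed, free_in_pool)
    if from_pool:
        plan["pool"] = from_pool
    remaining = needed - from_pool

    available = {d.name: min(d.idle_mobile_cores, d.max_safe_to_give) for d in donors}
    # Item 4: take one core per donor per pass so the load is spread widely.
    while remaining > 0:
        progressed = False
        for name in available:
            if remaining == 0:
                break
            if available[name] > 0:
                available[name] -= 1
                plan[name] = plan.get(name, 0) + 1
                remaining -= 1
                progressed = True
        if not progressed:
            break  # no donor can safely give any more cores
    return plan, remaining


donors = [Donor("h1", 3, 2), Donor("h2", 1, 4), Donor("h3", 0, 2)]
print(plan_cores(5, 2, donors))
```

Taking one core per donor per pass is one simple way to spread the reclaim across many hosts; the real heuristics may weigh hosts differently.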

How Can I Start Using DRO?
While DRO may seem like a complex technology, enabling it is very approachable. With a few mouse clicks, you can enable DRO and let it get to work! DRO continuously monitors your infrastructure’s host-level CPU utilization and takes action when it exceeds a threshold that you define. Other than these few clicks, there is nothing else you need to do: DRO simply works! Let’s take a look at the DRO configuration panel, which is available from the host group’s “details” page:

Figure 2: DRO Configuration Panel

As you can see, enabling DRO is as simple as:

  1. Checking the box to enable DRO.
  2. Selecting the mode (advise or active) in which you want DRO to run.
  3. Selecting which optimizations DRO can perform (live migration, mobile core activation, or both; mobile core activation requires Power Enterprise Pool support, which can be purchased separately).
  4. Defining the runtime parameters of DRO (e.g., CPU utilization threshold).

Making these selections takes only a minute or two and DRO is activated; it’s that simple!

Summary
Today’s clouds are extremely demanding of their infrastructures. Cloud, analytics, mobile, and social workloads are highly variable and largely unpredictable in nature. As cloud administrators build their infrastructure, it is imperative that they have the proper tools to cope with various situations. PowerVC is positioned to help administrators build out even the most complex cloud infrastructures. If you have any questions about this or anything else, please post them in the comments, on our LinkedIn group, our Facebook page, or our Twitter account. We’d love to hear from you!
