co-authored with Kurt Taylor (krtaylor) @kurtrtaylor

Introduction

The upstream OpenStack Continuous Integration (CI) system was designed to be used by the project itself and to be reusable by third-party vendors looking to test OpenStack with their own components or hardware. The upstream CI system is maintained and enhanced by the OpenStack Infrastructure team. Some of its components are shown below.

CI Diagram
Diagram showing components of CI

At a very high level, a developer contributes a patchset to Gerrit. The CI test system receives an event from Gerrit, and Zuul queues that event for Jenkins, which is responsible for running the test jobs. Nodepool keeps a pool of VMs ready for use. Tests are normally run in a VM that has been set up with OpenStack using DevStack, exercising a second-level virtual machine through the OpenStack test framework Tempest. When the tests complete, the status (pass or fail) is reported back to Gerrit, along with some information about the test system and all of the test artifacts.

Testing for IBM Power scale-out systems is done by the KVM on Power CI system. This system uses the upstream CI test infrastructure, modified slightly to work behind the IBM firewall and to be able to use the Power hardware.

Architecture

Ironic is among the projects we test. Let’s take a look at the flow of our Ironic job.

Ironic Job Flow
The flow of the Ironic job in KVM for Power OpenStack CI

The Ironic job begins to differ from our other CI jobs after we receive a VM for testing, also known as a DevStack VM (dsvm). Jenkins starts the job on the dsvm, which then configures itself to run the tests. As you may have noticed, one of the first things our dsvm does is call a service named ‘molteniron’.

Our Ironic job differs from the upstream Ironic job. For Ironic CI, the main test is to deploy to a target node. Currently this test relies heavily on VMs serving as target nodes, and for Ironic to deploy to those VMs, a different driver (the pxe_ssh driver) is used. While this is fast, it is not a true use case for Ironic, and it exercises a driver that is not intended for use in a production environment. Because of this, we decided to test Ironic using actual POWER hardware as target nodes in place of these VMs.

We quickly realized that if we wanted to test with real hardware as targets, we would need to manage that hardware across several guests. For example, suppose two guests are running the Ironic tests: if both tried to deploy to the same target node, they would interrupt each other and eventually cause each other's tests to fail. This led us to create a tool we call MoltenIron.

MoltenIron manages a pool of baremetal machines intended to serve as target nodes for Ironic testing. A dsvm can check out a node from the pool through an HTTP POST request, which it sends via a simple Python client that we include in the tree. Each node's information is stored in a relational database, and requests and responses are exchanged in JSON format. Upon checkout, MoltenIron marks the node as in use and returns the information the requester needs for testing. Once checked out, the node will not be offered to any other requester.
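
As a rough illustration, a checkout request might look something like the sketch below. The endpoint, port, and field names here are assumptions for the sake of the example; the actual request format is defined by the MoltenIron client and service in the tree.

```python
# Minimal sketch of a MoltenIron node checkout over JSON/HTTP.
# The URL, port, and field names are illustrative assumptions, not the
# service's actual API; the client we ship in the tree defines that.
import requests

MOLTENIRON_URL = "http://molteniron.example.com:5656"  # hypothetical endpoint


def checkout_node(owner):
    """Request an available baremetal node and mark it as in use."""
    resp = requests.post(
        MOLTENIRON_URL,
        json={"method": "allocate", "owner_name": owner},  # hypothetical fields
        timeout=30,
    )
    resp.raise_for_status()
    # The response is expected to carry everything DevStack needs to register
    # the node with Ironic (e.g. IPMI address and credentials, MAC addresses).
    return resp.json()
```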

After the dsvm receives the target node information, it amends the DevStack configuration file with that information, allowing DevStack to properly register the target node with Ironic during its setup. Next we run stack.sh, the DevStack setup script. Upon completion, stack.sh leaves us with an OpenStack deployment that is, in this case, configured for Ironic testing.
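
Conceptually, this step just translates the MoltenIron response into DevStack settings, along the lines of the sketch below. The option names and response fields are placeholders; the real job writes DevStack's own Ironic configuration options.

```python
# Sketch of turning a MoltenIron response into DevStack local.conf settings.
# Both the response fields and the option names are placeholders; the real
# job writes DevStack's own Ironic options for the checked-out node.
def amend_local_conf(node, local_conf_path="/opt/stack/devstack/local.conf"):
    settings = {
        "TARGET_NODE_IPMI_ADDRESS": node["ipmi_ip"],
        "TARGET_NODE_IPMI_USERNAME": node["ipmi_user"],
        "TARGET_NODE_IPMI_PASSWORD": node["ipmi_password"],
        "TARGET_NODE_MAC_ADDRESS": node["port_hwaddr"],
        "TARGET_NODE_CPUS": node["cpus"],
        "TARGET_NODE_RAM_MB": node["ram_mb"],
        "TARGET_NODE_DISK_GB": node["disk_gb"],
    }
    # Append the node details so stack.sh can register the node with Ironic.
    with open(local_conf_path, "a") as conf:
        for key, value in settings.items():
            conf.write("{}={}\n".format(key, value))
```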

Tempest, the OpenStack testing suite, then runs its set of Ironic tests. Currently, this includes a set of API tests and the deployment test mentioned earlier. The deployment test runs as follows.

Ironic Deploy Flow
The flow of an Ironic deployment during testing

First, Ironic powers on the node. Images to drive the deployment, which we call deploy images, are then discovered and requested via PXE. The images, a deploy ramdisk and a deploy kernel, are sent via TFTP. The target node boots into the deploy ramdisk, which exposes a disk on the target node via iSCSI. Next, the deploy ramdisk asks Ironic to send it an operating system image, which we refer to as the target image, over this iSCSI connection. Once the image is on the disk, Ironic reboots the system. Once again images are discovered and requested, only this time they are the kernel and ramdisk used to boot the target image. The target node then boots into its shiny new operating system, and Ironic verifies via SSH that the OS is up and running. Finally, Ironic cleans the node. The cleaning process is identical to booting into the deploy ramdisk, except that this time the deploy ramdisk wipes the disk instead of writing an OS to it. Once the node is cleaned, Ironic powers it off and the test is complete. Please understand that, for the sake of brevity, this is a very simplified explanation of an Ironic deployment.
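
At the API level, the whole sequence boils down to a pair of provision-state transitions. The sketch below uses python-ironicclient for illustration only; the authentication details and node UUID are placeholders, and the actual test exercises this flow through Tempest rather than by calling the client directly.

```python
# Illustrative sketch of the provision-state transitions behind the deploy
# test, using python-ironicclient. Auth details and the node UUID are
# placeholders; the real test drives this flow through Tempest.
import time

from ironicclient import client

ironic = client.get_client(
    1,  # Ironic API version
    os_auth_url="http://127.0.0.1:5000/v2.0",  # placeholder credentials
    os_username="admin",
    os_password="secret",
    os_tenant_name="admin",
)

node_uuid = "11111111-2222-3333-4444-555555555555"  # placeholder

# Deploy: power on, PXE boot the deploy ramdisk, write the target image over
# iSCSI, then reboot into the freshly installed operating system.
ironic.node.set_provision_state(node_uuid, "active")
while ironic.node.get(node_uuid).provision_state != "active":
    time.sleep(10)

# Tear down: boot the deploy ramdisk again to wipe the disk (cleaning), then
# power the node off.
ironic.node.set_provision_state(node_uuid, "deleted")
```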

Once testing is finished, the current owner of the node releases the node back to MoltenIron's pool, marking it as available for other testing instances to use.
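
The release is the mirror image of the checkout sketched earlier; again, the endpoint and field names are assumptions for illustration.

```python
# Companion sketch to the checkout example: return a node to the pool.
# The endpoint and field names remain illustrative assumptions.
import requests

MOLTENIRON_URL = "http://molteniron.example.com:5656"  # hypothetical endpoint


def release_node(owner):
    """Mark every node held by this owner as available again."""
    resp = requests.post(
        MOLTENIRON_URL,
        json={"method": "release", "owner_name": owner},  # hypothetical fields
        timeout=30,
    )
    resp.raise_for_status()
```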

Improving MoltenIron

The MoltenIron service is key to our Ironic CI job, so we are committed to improving it. For example, we want node allocation from MoltenIron to happen as late as possible. Currently, we allocate a target node from MoltenIron in our pre-test hook, a script we run soon after getting a VM from Nodepool but before running DevStack. This is an issue because it keeps the node unavailable for longer than necessary. We plan to improve this by creating a DevStack plugin to handle node allocation from MoltenIron. That would let us check out the node right before the Ironic DevStack plugin runs, making the allocation window much shorter.

Additionally we’ve reached out to the community for feedback on the MoltenIron tool. One suggestion we plan to address is making MoltenIron compatible with other drivers. We built MoltenIron around our needs when using the pxe_ipmitool driver, which requires different node properties than say the iscsi_ilo driver. This could be improved by expanding our relational database to have a different table for each driver.

The code will be made available in the third-party-ci-tool repo. Please join us in making MoltenIron a reusable service available to all Ironic CI test teams.
