Orchestrate multi-tier application deployments on Kubernetes using Ansible, Part 2: Finding the right answers

In Part 1 of this article series, I discussed deployment automation in the world of Kubernetes, breaking down the strengths, weaknesses, and opportunities of current approaches to on-premise delivery of containerized software.

Now I’d like to discuss the criteria for fully featured, Kubernetes-centric deployment automation and present a conceptual view of an open source-based solution that attempts to meet those standards.

Dealing with application deployment on Kubernetes seems straightforward. We’ve all completed beginner tutorials where issuing one or two commands creates a pod running your application. Need parameters? No problem. Helm can “do everything for you.” Unfortunately, orchestration projects have the same physics as any other IT project. Their growth and maturation tend to quickly verify initial assumptions, especially those built on top of basic use cases. (I say that as someone who has witnessed several enterprise projects built from the ground up during my professional career.)

Will the same utility used for single applications scale to multiple ones? Will it support the burden of maintenance scenarios as well as it does the polished initial deployment flow? What are the chances you’ll be forced to hack your code in the future because of its limitations? Or to bypass the foundations entirely? Like most things, finding the right answers involves an in-depth understanding and thorough strategic planning.

In this article, I dig deeper into the above questions and describe a solution that leverages top-grade open source technologies to find the right answers.

The requirements

Our goal is to orchestrate the deployment and lifecycle management of complex containerized application systems, usually abstracted as higher-order entities (controllers), in a highly configurable, coordinated, and ecosystem-aware manner. Let’s try to decompose this into individual requirements.

Orchestration is one of many examples of meta-code: in this case, code that describes how to make other code run. As with any code, you can break your list of requirements into functional and non-functional. Important requirements of a fully functional deployment orchestration include the following:

  • The orchestration code must support a wide variety of on-premise environments, public clouds, and operating systems.
  • It should be possible to provide input parameters describing the ecosystem context and restrictions. For example, environment hosts, network domain details, initial resource allocations, and more.
  • When executed, the orchestration code should run unattended. At the same time, it should allow for selective execution for advanced use cases.
  • After the orchestration workload finishes, the whole system of interconnected applications is expected to be running out-of-the-box.

Non-functional requirements center around the same basic topics as traditional software, including usability, maintainability, reliability, interoperability, and security. To fulfill those generic requirements, the orchestration software can (and should!) build on top of well-established modern DevOps techniques. Orchestration software should:

  • Be extremely simple to set up. It should not require complex prerequisites to be set up in the ecosystem. Ease of use is one of the critical usability criteria.
  • Be idempotent, which helps satisfy the reliability aspect. It should be able to recover easily from error conditions, whether transient events or manual interventions in the system, simply by re-running the orchestration workload.
  • Not rely on heavy server/daemon processes, which introduce a maintainability penalty.
  • Make use of well-established, community-driven interfaces only (no vendor lock-in), which fosters interoperability and transparency, the latter being an important part of security.

As discussed in Part 1, existing market leaders usually fail at one or more of the above aspects. Yet the advancement of modern technologies lets us build on top of a set of great existing tools to devise a unique, domain-oriented solution that attempts to meet all of the stated requirements.

Ansible to the rescue

Ansible is a flexible, open source automation tool that comes with a complete set of capabilities to perform a variety of configuration management tasks over multiple systems at the same time. The Ansible model defines several generic concepts that are described in the Ansible User Guide.

The most fundamental Ansible features are represented by modules. Modules are specialized plugins that do one thing right, such as executing a system command or adding one or more lines to a file. They also hide the complexity of advanced tasks by managing the state of operating system services, installing and uninstalling software repositories, and more, in a distribution-agnostic way.

Ansible tasks are specific module invocations, often enhanced with variables, loops, and conditionals. For each task, Ansible records whether or not it resulted in a change, and lets you react to changes using handlers. Tasks and handlers that share common goals, for example, managing a certain application, can be organized into roles. Roles also allow you to define variables with default values and declare dependencies on other roles.
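To make this concrete, here is a minimal sketch of a task/handler pair (the file path, service name, and handler name are all hypothetical): the task adds a line to a configuration file and, only when that actually changed something, triggers a handler that restarts the service.

```yaml
# tasks/main.yml: a task that reports a change and notifies a handler
- name: Enable the application feature flag
  ansible.builtin.lineinfile:
    path: /etc/myapp/app.conf          # hypothetical configuration file
    line: "feature_x=enabled"
  notify: Restart myapp

# handlers/main.yml: runs only when a notifying task reported a change
- name: Restart myapp
  ansible.builtin.service:
    name: myapp                        # hypothetical system service
    state: restarted
```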

Finally, both tasks and roles can become parts of larger logic units called playbooks, in which you bind the “what” (actions to be taken) with the “where” (which hosts). The set of hosts on which the automation should be executed is called the inventory. The inventory also allows you to assemble hosts into groups and organize groups into hierarchies.
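A minimal playbook sketch binding such actions to an inventory group (the group and role names are hypothetical) could look like this:

```yaml
# site.yml: a play binds the "what" (roles and tasks) to the "where" (hosts)
- name: Configure Kubernetes masters
  hosts: masters                       # hypothetical inventory group
  roles:
    - common                           # hypothetical reusable role
  tasks:
    - name: Show which host is being configured
      ansible.builtin.debug:
        msg: "Configuring {{ inventory_hostname }}"
```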

As you can see, Ansible is well-evolved and provides a great deal of flexibility. At the same time, it allows us to abstract out reusable components and then compose them into larger systems. Add to that the fact that it’s totally serverless, and you end up with a great tool for handling any automation, even the most complex. Now let’s focus on narrowing down generic Ansible concepts to Kubernetes domain-specific automation routines.

The inventory

The first, and most critical, item for achieving our goal is the concept of inventory. The inventory represents the action subjects: the hosts on which actions will be performed. In the world of Kubernetes, you can identify at least two host types:

  • Kubernetes cluster hosts, which can be broken down further into groups that play certain roles, including masters, compute, storage, infrastructure, and more.
  • Hosts that provide cluster management tools configured for cluster connectivity. For instance, the workstation of a developer (cluster tenant) who plans to deploy their application onto the cluster, or the workstation of the cluster administrator. Let’s refer to such hosts as the coordinator hosts.

Finally, Ansible always implicitly includes the notion of a controller host, or control node, which is the system Ansible is run from.

The following diagram depicts the dependencies between these groups, assuming the most complicated case:

[Diagram: Kubernetes cluster hosts, coordinator hosts, and the Ansible control node as separate groups]

However, in most circumstances, these groups will intersect in ways that will vary based on cluster kind, architecture, and vendor. For instance, deploying against managed clusters hosted in the public cloud will often mean limited admin rights and remote coordination.

[Diagram: a managed public cloud cluster coordinated remotely]

In on-premise deployments, coordination is often performed directly from the master hosts, without the need to configure and maintain a separate coordinator.

[Diagram: an on-premise cluster coordinated from the master hosts]

Or all groups can even coincide on the same host, as in all-in-one setups like development workstations.

[Diagram: an all-in-one setup with every group on a single host]

Any of the above setups can be expressed as an Ansible inventory. An important advantage of using the inventory is that it allows you to decouple the actual automation code from the information about specific environments by defining an abstract, flexible, and hierarchical Kubernetes domain-oriented infrastructure model.
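As a sketch, the on-premise variant above, where the masters double as coordinators, could be expressed as a YAML inventory (all host names are hypothetical):

```yaml
# inventory.yml: cluster hosts grouped by role; masters act as coordinators
all:
  children:
    cluster:
      children:
        masters:
          hosts:
            master1.example.com:
        compute:
          hosts:
            node1.example.com:
            node2.example.com:
    coordinator:
      children:
        masters:                       # reference the existing masters group
```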

Actions: Playbooks, roles, and tasks

Ansible playbooks are YAML documents that declare a list of plays, which allow running a set of actions (tasks) against a particular host or group of hosts from the inventory. Quoting Ansible documentation, playbooks tend to be “a model of a configuration or a process” rather than “a programming language or script”. A well-written playbook has one more important characteristic: it is idempotent and “running it multiple times in a sequence should have the same effect as running it just once.”

For people familiar with Kubernetes, this should immediately ring a bell. Kubernetes declarative object management enforces the desired resource state in an idempotent manner. Fortunately, you can easily merge these two worlds using basic Ansible features. You can run kubectl apply with the help of the Ansible command module, and then let Ansible know whether a change has been made to a resource or not by analyzing the command output with the changed_when expression. Being able to discover desired state changes makes it possible to react to such events with the use of Ansible handlers.
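A minimal sketch of such a task, assuming a hypothetical manifest file and handler name, could look like this:

```yaml
# Apply a manifest declaratively; report "changed" only when Kubernetes
# actually created or reconfigured the resource
- name: Apply the application deployment
  ansible.builtin.command: kubectl apply -f deployment.yml   # hypothetical file
  register: apply_result
  changed_when: "'unchanged' not in apply_result.stdout"
  notify: Wait for rollout to complete                       # hypothetical handler
```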

While playbooks provide a functionally complete set of capabilities, they can easily get out of hand for complex operations that consist of tens or hundreds of tasks. And no, that is no exaggeration: after just a month of development, you can easily see a playbook run summary reporting hundreds of executed tasks! Fortunately, Ansible provides a means of decomposing playbooks into smaller units called task lists and roles. While the former are merely reusable lists of tasks, the latter allow for greater flexibility in defining self-contained pieces of the orchestration code.

Beyond improved code organization, there is an additional gain from using roles in the context of Kubernetes application deployments. Applications deployed inside a Kubernetes cluster often abstract out communication with the notion of a service, resolved using the Kubernetes DNS-based service discovery mechanism. This dependency isn’t very strong: Kubernetes will not prevent a pod from being deployed even if its functional dependencies are not yet in place. This puts a burden on application developers, who must ensure the application properly reports its unavailability back to Kubernetes, through readiness probes, to prevent routing requests to applications that cannot reliably serve them. A burden is also placed on DevOps engineers, who must ensure service deployments are orchestrated in the correct order. With Ansible roles, the dependency between applications is modeled as a role dependency, which lets you enforce deployment order. Furthermore, this simplifies the playbook code, and the order of operations is hidden (encapsulated) within the role code.
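As a sketch, a hypothetical backend role could declare that it depends on a database role in its meta/main.yml, so deploying the backend always deploys the database first:

```yaml
# roles/backend/meta/main.yml: the database role runs before any backend task
dependencies:
  - role: database                     # hypothetical role deploying the database
    vars:
      db_replicas: 1                   # hypothetical variable for this dependency
```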

You may now be wondering, “What is the right granularity level for a role?” It’s usually whatever constitutes a meaningful business-level operation. Examples of such actions are:

  • Deploy, undeploy, or update an application
  • Scale a deployment
  • Create or delete a database instance or a messaging topic
  • Take a backup

Declaring relatively fine-grained roles also allows for partial execution, which is possible thanks to Ansible tags.
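As a sketch, tagging roles in a play lets you execute only a selected part of the orchestration (role and tag names are hypothetical):

```yaml
# site.yml: every role is tagged so it can also be run in isolation
- name: Deploy the application stack
  hosts: coordinator
  roles:
    - { role: database, tags: [database] }
    - { role: backend,  tags: [backend] }
    - { role: frontend, tags: [frontend] }
```

Running ansible-playbook site.yml deploys everything, while adding --tags backend performs only the backend-related operations.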

Customization: Variables and templates

To this point, I’ve described how to orchestrate Kubernetes resource management using Ansible playbooks. However, one more crucial aspect of any orchestration solution is missing: the ability to customize input values on a per-environment basis. Production and development environments use different values for things like compute resources available to containers, default replication factors, persistent storage kind, target namespace, and many others.

Application orchestration may require entirely different execution paths for different kinds of environments. For example, it may be worthwhile to import a provided trusted SSL certificate for production environments, while development environments may require generating a self-signed SSL certificate.

To aid in customizing your automation code, Ansible allows you to define variables with static values in many different places, discover them dynamically from managed systems (set from task output), or pass them in ad hoc on the command line. There are also “magic” variables with predefined semantics. Because of this richness, variables follow a well-defined precedence order. Variables are used and processed with extended Jinja2 expressions, a fully featured templating language. In the context of Kubernetes resource orchestration, expressions are primarily useful for introducing parameters into Kubernetes resource configuration files.
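As a sketch, a resource configuration file template could parameterize a Deployment with per-environment variables (target_namespace, replica_count, and app_version are hypothetical):

```yaml
# templates/deployment.yml.j2: a Kubernetes Deployment parameterized
# with per-environment Ansible variables
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: "{{ target_namespace }}"
spec:
  replicas: {{ replica_count | default(1) }}
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: "myapp:{{ app_version }}"
```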

Control flow: Conditional execution, loops, and list processing

The features of Ansible I presented so far are good enough for controlling any application deployment. However, developers are used to greater flexibility. Why not allow them to write automation code like they normally do in Bash or Java?

One of the core features of any programming language is the if-else statement. In Ansible, you can achieve conditional execution using a when statement. A when condition can be specified for any task, including import and include tasks, in which case the condition will propagate. when statements are Jinja2 expressions and can utilize the same powerful computation features and variable access as templates.
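As a sketch, the SSL certificate scenario mentioned earlier could branch on a hypothetical env_type variable (the included role names are hypothetical, too):

```yaml
# Pick an execution path depending on the kind of environment
- name: Import the provided trusted SSL certificate
  ansible.builtin.include_role:
    name: import_certificate            # hypothetical role
  when: env_type == "production"

- name: Generate a self-signed SSL certificate
  ansible.builtin.include_role:
    name: selfsigned_certificate        # hypothetical role
  when: env_type != "production"
```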

Loops are a great way of reducing the amount of code by eliminating repetition. The Ansible documentation covers loops extensively, so I won’t spend too much time discussing them here. The majority of generic Ansible looping concepts apply to this Kubernetes-specific resource orchestration case, allowing you to make your automation code more concise and understandable. In particular, you can loop over a set of resource configuration file templates, rendering and applying them sequentially. Since Ansible, by default, does not display the output of the commands being run, putting one resource per file and then applying the configuration files in a loop lets you visualize individual resource state changes in the Ansible console output. It also lets you write smaller configuration files, since the order in which resource changes are applied is no longer expressed through a single configuration file’s structure (it is now part of a playbook or a role). Having smaller files is also in line with another general software development rule, the single responsibility principle, part of the SOLID set of principles.
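As a sketch, rendering one template per resource and applying each file in a loop could look like this (the file names are hypothetical):

```yaml
# Render one template per resource, then apply each file; one resource per
# file keeps individual state changes visible in the Ansible output
- name: Render resource configuration files
  ansible.builtin.template:
    src: "{{ item }}.yml.j2"
    dest: "/tmp/{{ item }}.yml"
  loop: [namespace, configmap, deployment, service]

- name: Apply resource configuration files
  ansible.builtin.command: "kubectl apply -f /tmp/{{ item }}.yml"
  register: apply_result
  changed_when: "'unchanged' not in apply_result.stdout"
  loop: [namespace, configmap, deployment, service]
```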

It’s helpful to know that in Ansible, loop execution results can be registered as variables (somewhat like in functional languages) and then used to feed a loop in subsequent tasks. This pattern proves extremely useful in more complicated scenarios.
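As a sketch, the per-item results registered by one loop can feed the next task’s loop (the resource names and the debug step are hypothetical):

```yaml
# First loop: apply manifests and register a result for every item
- name: Apply resource configuration files
  ansible.builtin.command: "kubectl apply -f /tmp/{{ item }}.yml"
  register: apply_results
  changed_when: "'unchanged' not in apply_results.stdout"
  loop: [deployment, service]

# Second loop: iterate only over the items that actually changed
- name: Report the resources that changed
  ansible.builtin.debug:
    msg: "{{ item.item }}: {{ item.stdout }}"
  loop: "{{ apply_results.results | selectattr('changed') | list }}"
```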

Lastly, another common looping pattern, especially useful in handlers, is to use the until and retries task attributes to wait for a certain condition to occur. While the kubectl tool features a wait command, it has a peculiar interface and does not display any progress (not even on its own, let alone integrated with Ansible), which yields an unfriendly, blocking experience. With an until loop, you poll instead of blocking, which results in better visibility and usability.
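As a sketch, a handler could poll a deployment’s readiness with until and retries (the resource name, replica count, and timing values are hypothetical):

```yaml
# handlers/main.yml: poll rollout progress instead of blocking on `kubectl wait`
- name: Wait for rollout to complete
  ansible.builtin.command: >
    kubectl get deployment myapp
    -o jsonpath={.status.readyReplicas}
  register: ready_replicas
  until: ready_replicas.stdout | int >= 2   # hypothetical desired replica count
  retries: 30                               # poll up to 30 times...
  delay: 10                                 # ...10 seconds apart
  changed_when: false                       # a read-only check never "changes"
```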

Summary

I hope that after reading this article you have a better understanding of how Ansible, a feature-rich automation engine, can help automate Kubernetes workload management. Ansible concepts like tasks, handlers, roles, playbooks, and variables can be put to good use, fitting them into a container-native ecosystem. In Part 3, I’ll share a real example of automation code that executes a multi-tier system deployment. Stay tuned!

Marcin Lewandowski