
Enterprise chargeback with Red Hat OpenShift on IBM Z and IBM LinuxONE

Infrastructure and platform groups within organizations generally charge consumers of their services based on lines of business (LoBs) or at a more granular, application level. Existing chargeback models are built on virtual machine (VM) usage pools where VMs are allocated to LoBs on demand, and usage is based on resource consumption over a period of time. With the density and multi-tenancy offered through cloud-native technologies like Red Hat OpenShift, depending on a VM-based chargeback alone is insufficient. This tutorial covers several design points around how to deal with chargeback in an organization that adopts OpenShift as its enterprise Platform as a Service (PaaS) strategy, with a primary focus on OpenShift on the IBM Z and IBM LinuxONE platforms.

Prerequisites

  • OpenShift 4.x — While this tutorial is relevant for all architectures, an OpenShift cluster on IBM Z/IBM LinuxONE will most align with the architecture (for example, LPAR).
  • oc command-line tools

Estimated time

It should take you about an hour to create the simple “Hello World” application described here, but doing this in production involves communication with the LoBs, licensing, procurement teams, and so forth, which can take weeks or months.

Concepts and steps

How to charge

In situations where guaranteed service-level agreements are required (such as production), it makes sense to devote entire nodes to LoBs. This can result in under-utilization/less density and LoBs paying for resources they might not entirely consume. As with any decision in the IT world, this is an engineering trade-off. OpenShift provides a mechanism to limit LoBs to VMs, and the details are covered in the Limiting LoBs to VMs section below.

The second situation is where LoBs do not need SLAs but just need a location to run/test workloads at the lowest possible cost. This is where pod/namespace/project-specific chargeback can be performed. This is covered in the Entity resource consumption section below.

Ideally, an infrastructure provider group within an organization can provide both of the options above and let LoBs decide which mechanism makes sense: for example, node-based for prod and perf testing, or namespace/project/pod (granular)-based for dev/test/qa/sandbox environments. This solution is detailed in the Hybrid chargeback model section.

What to charge

Typically organizations will charge for:

  • Software licensing
  • Service and support (vendor)
  • CPU capacity consumed
  • Memory utilization
  • Storage consumption
  • Network
  • Floor space
  • Power
  • Human resources (in-house support)

This can be a combination of a constant baseline plus amortized and utilization costs. Depending on the hardware consumed, chargeback for an on-premises cloud can be significantly cheaper than that of a public cloud, and even among on-premises infrastructure choices, options like IBM LinuxONE can be much cheaper than Intel/AMD-based options.
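As a rough illustration of how a constant baseline, amortized costs, and utilization costs might combine into one monthly charge, here is a minimal sketch. The function name monthly_charge and every rate and quantity in it are hypothetical placeholders, not figures from any real price list.

```shell
#!/bin/sh
# Hypothetical monthly charge for one LoB:
#   baseline + amortized + (CPU core-hours * CPU rate) + (memory GB-hours * memory rate)
# All inputs are illustrative assumptions, not real prices.
monthly_charge() {
  awk -v base="$1" -v amort="$2" \
      -v cpu_hours="$3" -v cpu_rate="$4" \
      -v mem_hours="$5" -v mem_rate="$6" \
      'BEGIN { printf "%.2f\n", base + amort + cpu_hours * cpu_rate + mem_hours * mem_rate }'
}

# Example: 100 baseline, 50 amortized hardware, 200 core-hours at 0.02,
# and 400 GB-hours at 0.005.
monthly_charge 100 50 200 0.02 400 0.005   # prints 156.00
```

Storage, network, and the other items in the list above could be added as further terms in the same way.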

Entity resource consumption

OpenShift entities include pods, deployments, namespaces, and persistent volume claims. Chargeback on these entities is the easiest mechanism as it is built into OpenShift as part of the metering component.

As LoBs have namespaces (or clusters, depending on the multi-tenancy model) for specific use cases, let’s look at namespace-based chargeback using the built-in metering capabilities, which report on:

  • CPU (request, usage, and utilization)
  • Memory (request, usage, and utilization)
  • Storage (persistent volume claim, request, and usage)

The oc -n openshift-metering get reportqueries command returns a list of what can be queried, including cluster-wide reporting, node-wide, namespace-wide, and the more granular pod and PVC consumption. Focusing on just namespaces here:

...
namespace-cpu-request
namespace-cpu-usage
namespace-cpu-utilization
namespace-memory-request
namespace-memory-usage
namespace-memory-utilization
namespace-persistentvolumeclaim-request
namespace-persistentvolumeclaim-usage
...
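To narrow the full list down to the namespace-level queries, the output can be filtered by name. This is a minimal sketch that assumes the names arrive one per line, optionally prefixed with a resource type and a slash (as the -o name output format does). The function name namespace_queries is hypothetical, and the sample input is only there so that the filter can be tried without a cluster.

```shell
#!/bin/sh
# Keep only namespace-level report queries from a list of resource names.
# Reads names on stdin, strips everything up to the last "/",
# and keeps the names that start with "namespace-".
namespace_queries() {
  sed 's|.*/||' | grep '^namespace-'
}

# Demo on sample input, so no cluster is needed to try the filter.
printf '%s\n' \
  'reportquery.metering.openshift.io/cluster-cpu-usage' \
  'reportquery.metering.openshift.io/namespace-cpu-usage' \
  'reportquery.metering.openshift.io/pod-memory-usage' \
  'reportquery.metering.openshift.io/namespace-memory-usage' |
  namespace_queries
```

Against a live cluster, the same filter could be fed from oc -n openshift-metering get reportqueries -o name.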

You can start metering with a Report custom resource. For scheduled reports, it should look something like this:

yaml
apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: namespace-cpu-request-monthly
spec:
  query: "namespace-cpu-request"
  reportingStart: "2020-05-05T00:00:00Z"
  schedule:
    period: "monthly"
    monthly:
      dayOfMonth: 1
      hour: 0
      minute: 0
      second: 0

Red Hat documents the following valid values for the schedule fields:

Name        Data type  Range
hour        Integer    0-23
minute      Integer    0-59
second      Integer    0-59
dayOfWeek   String     Day of week spelled out (e.g. “monday”)
dayOfMonth  Integer    1-31

More information about reports can be found here.
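Because a namespace-based chargeback usually needs CPU, memory, and storage figures together, it can help to generate one Report per query instead of writing each by hand. The sketch below prints monthly Report manifests for three of the namespace queries listed earlier. The function name make_monthly_report, the "-monthly" naming pattern, and the openshift-metering namespace are assumptions for illustration.

```shell
#!/bin/sh
# Print a monthly Report manifest for one report query.
# $1 = report query name, $2 = reportingStart timestamp
# Assumption: reports live in the openshift-metering namespace.
make_monthly_report() {
  cat <<EOF
---
apiVersion: metering.openshift.io/v1
kind: Report
metadata:
  name: $1-monthly
  namespace: openshift-metering
spec:
  query: "$1"
  reportingStart: "$2"
  schedule:
    period: "monthly"
    monthly:
      dayOfMonth: 1
      hour: 0
      minute: 0
      second: 0
EOF
}

# One manifest per namespace-level CPU, memory, and storage usage query.
for q in namespace-cpu-usage namespace-memory-usage namespace-persistentvolumeclaim-usage; do
  make_monthly_report "$q" "2020-05-05T00:00:00Z"
done
```

The script only prints YAML. After reviewing the output, it could be piped into oc apply -f - to create the reports.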

Limiting LoBs to VMs

OpenShift deploys workloads into worker nodes, where each node is a VM. By default, the Kubernetes (k8s) scheduler will deploy workload pods across worker nodes based on resource utilization heuristics built into k8s. These heuristics provide optimal scheduling in general, but to align with existing organizational policies you need a way to limit pods to specific nodes.

k8s has namespaces that provide resource isolation and a mechanism for bookkeeping. It is important to note that the isolation provided is logical. A common misconception is that namespaces offer security isolation, but they do not: pods across namespaces can be deployed on the same VM and could be exposed to things like container escape vulnerabilities. Namespaces as-is don’t offer isolation between LoBs that aligns with existing VM-based chargeback metrics.

Topology

  • OpenShift 4.3
  • 3 masters named master[0-2] for simplicity
  • 5 workers named worker[0-4] for simplicity

This is our sample cluster deployed on IBM Cloud.

Note: To learn how to deploy and manage OpenShift Container Platform on IBM Cloud, please visit this page.

OpenShift Container Platform dashboard

To map LoBs to VMs, you will need to use:

Node labels

Labels allow users to map their organizational structures in a loosely coupled fashion without requiring clients to store these mappings (learn more here). For this scenario, you will apply labels to worker nodes. Naming convention and org structure decomposition are beyond the scope of this tutorial, but I plan to write another tutorial that helps you define node labels, namespace naming best practices, and topologies for multi-tenancy.

For now, you have five nodes defined: worker[0-4]. You can use the names as is, or assign more meaningful names to use later for node selection for LoBs or to help make physical locations more visible. For example, you might want to record which LPAR, z/VM instance, or CPC a VM was deployed to. You can apply node labels from the GUI:

Node labels

Here we’ve added three node labels:

  • purpose : dev
  • lpar : LPAR01, LPAR02
  • lob : lob1, lob2, lob3

Note: Labels must have singular values for each key, so pick only one of each option when applying it to a node.
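The same labels can also be applied from the command line with oc label node. The sketch below is a dry run by default: it only prints the commands unless APPLY=1 is set. The helper names run and label_node and the node-to-LoB assignment are hypothetical, so substitute your own mapping.

```shell
#!/bin/sh
# Dry-run helper: print the command unless APPLY=1 is set in the environment.
run() {
  if [ "${APPLY:-0}" = "1" ]; then "$@"; else echo "$@"; fi
}

# Label one worker node with a purpose, an LPAR, and an owning LoB.
# $1 = node name, $2 = purpose, $3 = lpar, $4 = lob
label_node() {
  run oc label node "$1" "purpose=$2" "lpar=$3" "lob=$4" --overwrite
}

# Hypothetical assignment; the real one comes from your own mapping.
label_node worker0 dev LPAR01 lob1
label_node worker1 dev LPAR02 lob2
```

Each node receives exactly one value per key, which matches the single-value rule in the note above.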

beta.kubernetes.io/arch: s390x, kubernetes.io/arch: s390x, and arch: s390x are pre-existing labels. These are very important for multi-architecture deployments, which will be covered in a future tutorial. Architecture is irrelevant as this tutorial is valid for any architecture, but it is an important label if architecture-specific chargeback is the goal.

Mapping

We will use the following mapping for LoB ⟶ Projects ⟶ Nodes

Mapping diagram

Projects

Next, let’s create the four projects in the diagram above:

  • lob1-app1
  • lob2-app1
  • lob3-app1
  • lob3-app2

Each LoB can own multiple apps and each app will typically have a project. It’s also possible to have an LoB as a project and apps as deployments within that LoB, but this offers less granular control. Having each app as a project makes it easier to map resources to individual apps, which can be consolidated for an LoB.

You can also use namespaces to do this, but projects are just k8s namespaces with additional annotations that allow for easier multi-tenancy as follows:

  • You can have stricter validation than namespaces (i.e. you cannot annotate a project with anything other than a handful of predefined keys, meaning you can assert that a privileged user or component set that data).
  • Projects are actually indirectly created by the server through a request mechanism, thus you do not need to give users the ability to create projects directly.
  • A cluster admin can inject a template for project creation, so you can have a predefined way to set up projects across your cluster.
  • The project list is a special endpoint that determines what projects you should be able to see. This is not possible to express via RBAC (i.e. list namespaces means you can see all namespaces). Note that all of this was built in the early days of Kubernetes, and thus may be less important now.

Note: Creating projects from the command-line interface allows you to pass in a node selector using the --node-selector argument of oc adm new-project. If you create the project using the GUI, you need to modify the namespace (not the project, since that’s immutable) to apply a node selector.

Command line: oc adm new-project lob1-app1 --node-selector='lpar=LPAR01,purpose=dev,lob=lob1' and repeat for all of the projects from your mapping diagram above. Ultimately, you will need four projects: lob1-app1, lob2-app1, lob3-app1, and lob3-app2.
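Repeating the command for each project can be scripted. The sketch below prints one project-creation command per project and derives the lob label from the project name prefix. It uses oc adm new-project, which to my knowledge is the subcommand that accepts --node-selector. The function name new_project_cmd and the LPAR assigned to each project are placeholders, since the real values come from your mapping diagram.

```shell
#!/bin/sh
# Print the project-creation command for one project.
# The LoB label is derived from the project name prefix (lob1-app1 -> lob1).
# $1 = project name, $2 = LPAR label value (assumed; take it from your mapping)
new_project_cmd() {
  lob=${1%%-*}
  printf "oc adm new-project %s --node-selector='lpar=%s,purpose=dev,lob=%s'\n" "$1" "$2" "$lob"
}

# The four projects from this tutorial; the LPAR assignments are placeholders.
new_project_cmd lob1-app1 LPAR01
new_project_cmd lob2-app1 LPAR01
new_project_cmd lob3-app1 LPAR02
new_project_cmd lob3-app2 LPAR02
```

Nothing is created until the printed commands are reviewed and run.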

GUI: Projects are immutable, so you need to change the namespace assigned to it (which is mutable):

  1. Look for the namespace assigned to the project (it will have the same name); for example, lob1-app1.
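  2. Add the annotation openshift.io/node-selector: 'lpar=LPAR01,purpose=dev,lob=lob1' under metadata.annotations in the YAML, using the same label values that you applied to the nodes.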
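If you prefer not to edit the YAML in the GUI, the same annotation can be set on the namespace with oc annotate. This is a minimal sketch that only prints the command. The function name annotate_cmd is hypothetical, and the selector reuses the label values applied to the nodes earlier in this tutorial.

```shell
#!/bin/sh
# Print an "oc annotate" command that sets the node selector on a namespace.
# $1 = namespace, $2 = comma-separated label selector
annotate_cmd() {
  printf "oc annotate namespace %s openshift.io/node-selector='%s' --overwrite\n" "$1" "$2"
}

annotate_cmd lob1-app1 'lpar=LPAR01,purpose=dev,lob=lob1'
```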

Note: We are not using Kubernetes affinities since they do not provide the strict placement policies needed for accurate chargeback. Neither requiredDuringSchedulingIgnoredDuringExecution nor preferredDuringSchedulingIgnoredDuringExecution meets our criterion of always running a pod on a specific node or set of nodes throughout its lifetime.

Groups

Next, create groups (or map from LDAP following these instructions):

Create groups

RBAC

You will need a Role (or ClusterRole) and a RoleBinding (or ClusterRoleBinding) to map groups to projects/namespaces.

Note: This tutorial assumes that you understand the basics of RBAC. If not, please read the official k8s documentation on RBAC.

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: developer
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["*"]

For this simple tutorial, we use the very permissive * wildcard. In production, you’ll need to follow the principle of least privilege and build up from there. If you define a namespaced Role instead of a ClusterRole, repeat it for each LoB project by changing the metadata.namespace field.

You can also create a Role vs. a ClusterRole and limit to the lob1/lob2/lob3 namespace, but using a ClusterRole allows you to define the permissions only once and then reference it from multiple RoleBindings for individual namespaces.

Next, set up a RoleBinding which maps roles to objects and users/groups:

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: lob1-app1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: developer
subjects:
  - kind: Group
    name: "lob1-devs"
    apiGroup: rbac.authorization.k8s.io

You could also bind to service accounts (here, all service accounts in a qa namespace) for things like automated DevOps pipelines:

subjects:
  - kind: Group
    name: system:serviceaccounts:qa
    apiGroup: rbac.authorization.k8s.io

Repeat the RoleBinding for the other three namespaces.
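Writing four nearly identical RoleBindings by hand is error-prone, so they can be generated instead. The sketch below prints one RoleBinding per project, each referencing the developer ClusterRole. The function name make_rolebinding is hypothetical, and the group names other than lob1-devs are assumed to follow the same "<lob>-devs" pattern.

```shell
#!/bin/sh
# Print a RoleBinding that grants the "developer" ClusterRole to one group
# in one namespace. $1 = namespace, $2 = group name
make_rolebinding() {
  cat <<EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-binding
  namespace: $1
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: developer
subjects:
  - kind: Group
    name: "$2"
    apiGroup: rbac.authorization.k8s.io
EOF
}

# Group names other than lob1-devs are assumed to follow the same pattern.
for ns in lob1-app1 lob2-app1 lob3-app1 lob3-app2; do
  make_rolebinding "$ns" "${ns%%-*}-devs"
done
```

The output is plain YAML, so it can be saved to a file for review or piped into oc apply -f -.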

Note: This tutorial focuses on user roles and not service accounts.

Users from each LoB now have the ability to deploy pods only to the projects that their LoB has access to.

Hybrid chargeback model

A hybrid model for chargeback combines node-level chargeback and pod-level chargeback. In the diagram below, consumers of worker[0-4] get node-based chargeback and consumers of worker[5-6] get a more granular, namespace/project/pod-based chargeback. LoBs can use both, depending on the use case.

Hybrid chargeback model
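To make the hybrid model concrete, an LoB's bill can be thought of as a flat charge for its dedicated nodes plus a usage-based charge for what it consumes on the shared nodes. The sketch below works through one such calculation. The function name hybrid_charge and all rates and quantities are hypothetical, and only CPU is metered on the shared side to keep the example short.

```shell
#!/bin/sh
# Hypothetical hybrid bill for one LoB:
#   (dedicated nodes * flat rate per node) + (shared CPU core-hours * rate per core-hour)
hybrid_charge() {
  awk -v nodes="$1" -v node_rate="$2" -v core_hours="$3" -v core_rate="$4" \
      'BEGIN { printf "%.2f\n", nodes * node_rate + core_hours * core_rate }'
}

# Two dedicated nodes at 500 each, plus 300 core-hours on shared nodes at 0.05.
hybrid_charge 2 500 300 0.05   # prints 1015.00
```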

Testing

RBAC has no default way to list which namespaces a user has access to, but you can write a simple script to do this:

bash
# Replace the placeholder below with the user you want to check.
USER_NAME="your-user-name"
for n in $(oc get ns -o jsonpath='{.items[*].metadata.name}'); do
  printf '%s: ' "$n"
  oc auth can-i get pods -n "$n" --as="$USER_NAME"
done

Simple testing can be done using the --as parameter to the oc CLI (see the docs here).

Admission controllers (optional)

An admission controller is a piece of code that intercepts requests to the Kubernetes API server prior to persistence of the object, but after the request is authenticated and authorized. An admission controller in this context can be used for the following:

  • Enforcing node selections for namespaces (mutating controllers)
  • Ensuring that each workload has an appropriate request and limit set for resource consumption in shared multi-tenant namespaces (validating controllers)

Admission controllers are a bit like using a sledgehammer, and the methods mentioned earlier in this tutorial are far easier to maintain and far less operationally invasive. However, I’m still mentioning them here for completeness.

Summary

This tutorial has explored various mechanisms for consuming OpenShift metering data for internal chargeback. In future tutorials, I will cover a mechanism to correlate with z/VM and KVM monitoring tools and other cloud-based tooling (such as Red Hat OpenShift Cost Management) to provide platform as well as infrastructure chargeback.

To experiment with chargeback and other concepts on Red Hat OpenShift on the LinuxONE platform, head over to our LinuxONE Community Cloud. I recommend deploying a three-tier app with Liberty, MQ, and DB2 — all available via Cloud Paks — to understand how to optimize the chargeback mechanism based on workload.