Implement and deploy a manageable application

In projects that follow DevOps practices, development and operations teams work together to ensure that an application fulfills users’ functional requirements while also running well in production (meeting non-functional requirements). Build-to-manage practices are built into an application and into how it is deployed so that the application is easier to manage — for both developers and operations teams. These practices also make it easier for the Kubernetes or Red Hat OpenShift cluster to manage your applications for you.

This document highlights 4 build-to-manage practices that you should build into your application’s deployment manifest and application code to make it easier to manage while it’s running:

  1. Container status probes: determine whether the container and its application are running
  2. Pod resource requirements: specify the capacity the application requires
  3. Health endpoint: provides the application’s status
  4. Logging: records the application’s history while it is running

For each element, this article describes:

  • Requirements — The conditions necessary to perform the task
  • Red Hat Container Certification requirement — Any requirements expressed in or related to certification
  • Solution — How to solve the problem and meet the goal within the requirements
  • Cloud-Native Toolkit — Examples of the toolkit embedding the solution

1. Implement container status probes

Each pod should configure probes on each of its containers that indicate the status of the container and its application. A cluster uses these status probes to determine whether the container and its application are running properly.

Using these probes, the cluster knows to route client requests to pod replicas that are working and to avoid ones that aren’t. It can even detect pod replicas that aren’t working, shut them down, and start replacements.

Requirements

A Kubernetes or OpenShift cluster implements probes for determining a container’s status. The two main probes are:

  • Liveness — Indicates whether the container is alive or whether it should be restarted. If the pod does not implement the probe, the cluster assumes that the container’s state is the state of the container’s main process (known as PID 1). That default is reasonable, at least for a container that runs only a single process, which is true of almost any well-designed application container.

  • Readiness — Indicates whether the application in the container is running and ready to receive requests, or whether the application is still starting (such as while the language runtime starts up or the application initializes). If the pod does not implement the probe, the cluster assumes that whenever the container is alive, its application is ready (that is, readiness equals liveness). This assumption is almost certainly not true, especially while the application is starting. During startup the container is alive, so the cluster routes requests to it anyway, and clients receive errors (such as HTTP 500 status codes) from a container that is supposedly running properly. This causes client requests to fail intermittently while a pod is restarting or autoscaling.

A simple best practice is to always configure a pod with liveness and readiness probes for its application containers. Even if the probes contain no application health monitoring logic, a liveness probe doesn’t hurt and a readiness probe almost certainly helps.

You (probably) need liveness and readiness probes explains these probes in detail and walks you through examples of using them in different application scenarios.

Red Hat Container Certification requirement

Red Hat Container Certification does not take a stance either way on including status probes.

OpenShift Container Platform-specific guidelines in the OpenShift docs specify that containers should be deployed with liveness and readiness probes.

Monitoring application health by using health checks in the OpenShift docs explains how an application has several options for helping the cluster detect and handle unhealthy containers, and elaborates on using readiness, liveness, and startup health checks.

Solution for implementing container status probes

Developing Kubernetes-native apps: Liveness and readiness probes gives guidance for configuring a very basic pair of status probes, like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wasliberty
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: wasliberty
          image: icr.io/ibm/liberty:latest
          imagePullPolicy: Always
          ports:
            - containerPort: 9080
          livenessProbe:
            httpGet:
              path: /
              port: 9080
            initialDelaySeconds: 300
            periodSeconds: 15
          readinessProbe:
            httpGet:
              path: /
              port: 9080
            initialDelaySeconds: 45
            periodSeconds: 5

Simply having probes that exist and that use the HTTP root URI is a good start. These are enough to enable the cluster to determine whether the container is alive and whether the application is ready — and they require no new application functionality. The developer and deployer can expand on these to implement application functionality and more sophisticated testing for conditions like deadlock detection, as well as active health monitoring.
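For example, once the application implements a health endpoint (described in the health endpoint practice later in this article), the probes can call it instead of the root URI. Here is a minimal sketch of what that might look like for the container above; the /health path, port, and timing values are assumptions to adjust for your application, and the startup probe is optional but gives a slow-starting application time to initialize:

          startupProbe:
            httpGet:
              path: /health
              port: 9080
            periodSeconds: 10
            failureThreshold: 30     # allow up to 5 minutes for the application to start
          livenessProbe:
            httpGet:
              path: /health
              port: 9080
            periodSeconds: 15
            timeoutSeconds: 2
          readinessProbe:
            httpGet:
              path: /health
              port: 9080
            periodSeconds: 5
            timeoutSeconds: 2
            failureThreshold: 3      # stop routing traffic after 3 consecutive failures

Because the liveness and readiness probes do not begin running until the startup probe succeeds, the long initialDelaySeconds values from the earlier example are no longer needed.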

Example of container probes in the Cloud-Native Toolkit

The Starter Kits in the Cloud-Native Toolkit configure their pods with container status probes. All of the Starter Kits use a shared starter-kit Helm chart stored in the Toolkit charts repo. The chart’s Deployment template includes these sections to specify a pair of liveness and readiness probes (for clarity, this excerpt shows the template variables filled in with the values for a sample stockbffnode microservice):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stockbffnode
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: stockbffnode
          image: stockbffnode:0.0.1
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 3000
              protocol: TCP
          livenessProbe:
            tcpSocket:
              port: http
          readinessProbe:
            tcpSocket:
              port: http

Notice that these probes test TCP sockets rather than HTTP paths. Either type works. This approach doesn’t require the app to implement an endpoint for the path (even just /), and an endpoint that does nothing is little more than an open port anyway. Also, a TCP probe works with any network protocol, not just HTTP.

The cluster runs these probes frequently, so they should be simple and quick. Simply by accepting a connection on the socket, the container confirms that it’s alive and the application confirms that it’s ready.

2. Specify the pod resource requirements

Each pod should specify the resources that its containers require. This enables the cluster to optimize the use of its worker nodes’ resources by placing pods to maximize container density, as well as to enforce quotas and to autoscale pods.

Requirements

When a cluster starts a pod on a node, the node needs enough resources for the pod’s containers to actually run. To tell the cluster what it needs, a pod can optionally specify the resources that the applications in its containers require to run properly. CPU and memory (RAM) are common resource types for specifying resource requirements. A pod specifies its size within the cluster as requests and limits:

  • Requests — The minimum or initial size that the pod’s application requires
  • Limits — The maximum size that the pod’s application can use efficiently; limits must be greater than or equal to the requests

Growing from the requests size to the limits size is vertical scaling. For a cloud-native application, you should keep the size small and depend on the application’s horizontal scaling. This helps the cluster optimize pod placement, scalability, and recovery.

When a new pod is requested, the scheduler decides which node to start the pod on. The scheduler uses the pod’s size settings to optimize:

  • Placement — To determine which node to start a pod on, the scheduler only considers nodes with sufficient available capacity for the pod’s requests and prefers nodes with capacity for the pod’s limits.
  • Quota enforcement — When a project specifies a resource quota, the scheduler only starts a pod in a project if it has sufficient resources remaining in its quota.
  • Autoscaling — The horizontal pod autoscaler (HPA) automatically scales the number of pod replicas when resource usage, as a percentage of the pod’s requests, goes beyond the specified threshold.

If a pod does not specify its size, the scheduler assumes the pod is very small. It may place the pod on a node or in a project with very little available capacity, and autoscaling cannot work.
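For example, here is a minimal sketch of a horizontal pod autoscaler (using the autoscaling/v2 API) that could scale a Deployment like the wasliberty example shown in the next section; the name, replica range, and threshold are illustrative. The target utilization is measured as a percentage of each pod’s CPU request, which is why autoscaling only works when the pod specifies its resources:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: wasliberty
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: wasliberty
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # add replicas when average CPU use exceeds 70% of the request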

Red Hat Container Certification requirement

Red Hat Container Certification does not require a pod to specify its resource requirements, nor does certification exclude the practice.

Solution for specifying a pod’s resource requirements

Developing Kubernetes-native apps: Resource requests and limits gives guidance for specifying a container’s resource requests and limits, like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: wasliberty
spec:
  replicas: 3
  template:
    spec:
      containers:
      - name: wasliberty
        image: icr.io/ibm/liberty:latest
        imagePullPolicy: Always
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"

Memory is specified in bytes (here with the Mi suffix for mebibytes), and CPU is specified in millicores (m). This pod’s initial request for its container is 512 mebibytes of memory, which can grow up to 1024 mebibytes, and 500 millicores (0.5 cores) of CPU, which can grow up to 1000 millicores (1 core).

Example of resource requirements in the Cloud-Native Toolkit

Resource requests and limits are not configured by default in the Cloud-Native Toolkit, but the Starter Kits can optionally configure them for their pods. All of the Starter Kits use a shared starter-kit Helm chart stored in the Toolkit charts repo. The chart’s Deployment template includes a section to specify resources.

For clarity, this excerpt shows the template variables filled in with the values for a sample stockbffnode microservice:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stockbffnode
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: stockbffnode
          image: stockbffnode:0.0.1
          imagePullPolicy: IfNotPresent
          resources:
            {{- include "starter-kit.resources" . | nindent 12 }}

The default value for starter-kit.resources is specified in the starter-kit chart’s values file:

resources: {}
  ## We usually recommend not to specify default resources and to leave this as a conscious
  ## choice for the user. This also increases chances charts run on environments with little
  ## resources, such as Minikube. If you do want to specify resources, uncomment the following
  ## lines, adjust them as necessary, and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

Each Starter Kit can override this in its Helm chart. For example, the values file in the Helm chart in the Node Typescript Starter Kit includes these lines, which can be uncommented and adjusted to specify resources:

  # resources: {}
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # limits:
    #   cpu: 100m
    #   memory: 128Mi
    # requests:
    #   cpu: 100m
    #   memory: 128Mi

3. Implement a health endpoint

Applications need to implement a health endpoint, a simple API that indicates whether the application is running properly. Monitoring can use a health endpoint to detect the status of the application, not just its runtime or the container it’s running in. Load balancers and service registries can use health endpoints to avoid application replicas that are not running properly.

Requirements

When an application makes its health status accessible, the status is typically exposed as a REST endpoint named /health. When a collection of applications or microservices all expose this endpoint, monitoring tools can easily track them all as members of a group.

A health endpoint’s answer can be as simple as a boolean or as complex as a detailed health status. A simple response indicates that the application is running and may be all that’s required for simple monitoring. Implement more sophisticated health metrics as needed to support more sophisticated monitoring.

Pattern: Health Check API describes this best practice for microservices and shows a skeletal implementation with a link to a full Microservices Example application.

Red Hat Container Certification requirement

Red Hat Container Certification does not take a stance either way on including health endpoints.

Solution for implementing a health endpoint

An application should implement a /health endpoint. How it does this depends on the language, library, and application logic.

For example, Java Spring Boot includes the Actuator library. As documented in Spring Boot Actuator: Production-ready Features — Writing Custom HealthIndicators, this library provides the Health class and HealthIndicator interface to implement your own health indicator, such as this example, MyHealthIndicator:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class MyHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        int errorCode = check(); // perform some specific health check
        if (errorCode != 0) {
            return Health.down().withDetail("Error Code", errorCode).build();
        }
        return Health.up().build();
    }

}

As you can see, MyHealthIndicator has to implement health(), which returns a Health object built with either Health.down() or Health.up(). The trick, then, is for MyHealthIndicator to implement check(), which returns an error code (0 for healthy). What exactly check() does and how it’s implemented is application-specific.

Example health endpoint implementation in the Cloud-Native Toolkit

The Starter Kits in the Cloud-Native Toolkit implement health endpoints in the sample applications. For example, the HealthController class in the Node Typescript Starter Kit implements a /health endpoint:

@Path('/health')
export class HealthController {

  @GET
  async healthCheck(): Promise<{status: string;}> {
    return {
      status: 'UP'
    };
  }
}

By default, it always returns the UP status. But that is enough to at least confirm that the app is running and is healthy enough to respond. The application developer may wish to customize this implementation to provide more detailed status.
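For example, here is a minimal sketch of a customized health check that also verifies a downstream dependency. The DatabaseService and its ping() method are hypothetical stand-ins for whatever your application actually depends on:

@Path('/health')
export class HealthController {

  // Hypothetical injected dependency; substitute your app's real services
  @Inject
  database: DatabaseService;

  @GET
  async healthCheck(): Promise<{status: string; checks: {database: string}}> {
    // Report DOWN if a required dependency is unreachable
    const databaseUp = await this.database.ping().then(() => true, () => false);
    return {
      status: databaseUp ? 'UP' : 'DOWN',
      checks: {
        database: databaseUp ? 'UP' : 'DOWN'
      }
    };
  }
}

Note that this sketch still answers with HTTP 200 even when it reports DOWN; a probe or monitor that only checks the HTTP status code would also need the endpoint to return a 5xx status when unhealthy, which depends on the REST framework you use.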

4. Log events

Each application should log a history of its important events. A log aggregation tool combines the logging output of multiple components and correlates them into a single timeline. Development and operations staff can use the aggregator to associate and filter events to understand how an application is working.

Requirements

A standard practice used to be for an application to write to its own log files. Now, modern applications write their log messages to stdout, which the environment (such as a cluster) can redirect to its preferred output, be it a screen, files, an aggregator, or the like.

Logging Best Practices: The 13 you should know captures a good list of practices. The first is to use a logging library. Logging libraries are language-dependent because the logging calls are built into the application code. Java has several logging libraries, such as Log4j and SLF4J. Node.js has Pino, Winston, and others.

While you typically want to avoid locking yourself into a particular vendor, the reality is that when you have thousands of lines of code invoking a specific logger API, you’re pretty locked into that API. Fortunately, most loggers have an API that is simply a method for each severity level:

logger.info("Message");
logger.error("Message");

You should always log meaningful events with descriptive messages for future reference. When the application does something important, especially something that might fail, log it. Log a message before the important part with the parameters, to record what was supposed to happen. Then log another message after the important part with the result, to record what did happen. And if an error occurs, definitely log that. Treat the log as an event stream, a running commentary on what the application is doing.
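Here is a minimal sketch of that event-stream style in Node.js using Pino, one of the libraries mentioned above. Pino writes structured JSON lines to stdout by default; the component and field names here are illustrative:

import pino from 'pino';

// The base logger writes JSON log lines to stdout, which the cluster can collect
const logger = pino({ level: process.env.LOG_LEVEL || 'info' });

// A child logger tags every message with the component that produced it
const orderLogger = logger.child({ component: 'OrderService' });

export async function processOrder(orderId: string, quantity: number): Promise<void> {
  // Log what is supposed to happen, with its parameters...
  orderLogger.info({ orderId, quantity }, 'Processing order');
  try {
    // ...process the order here...
    orderLogger.info({ orderId }, 'Finished processing order');
  } catch (err) {
    // ...and log the error, with context, if it fails
    orderLogger.error({ err, orderId }, 'Failed to process order');
    throw err;
  }
}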

Don’t just write logs — read them and see whether you can use them. The best way to determine if logging is working sufficiently is to use it. Looking at the logs, can you tell if the application is running correctly and what it’s doing? If it fails, do the logs show you what failed and provide some context as to why? Using an aggregator, can you cut through the chaff and quickly find the logs that are relevant to your current task?

Red Hat Container Certification requirement

Red Hat Container Certification does not require that applications implement logging, nor does certification exclude the practice.

OpenShift Container Platform-specific guidelines in the OpenShift docs specify that applications should write their log output to standard out so that the cluster can collect it.

Solution for logging events

Select a convenient logging library for the language you’re using.

Let’s look at a Java example using the Simple Logging Facade for Java (SLF4J) library. This example comes from SLF4J: 10 Reasons Why You Should Be Using It:

import java.util.List;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SimpleClass {

    Logger logger = LoggerFactory.getLogger(SimpleClass.class);

    public String processList(List<String> list) {
        logger.info("client requested process the following list: {}", list);

        try {
            logger.debug("Starting process");
            // ...processing list here...
            Thread.sleep(5000);
        } catch (RuntimeException | InterruptedException e) {
            logger.error("There was an issue processing the list.", e);
        } finally {
            logger.info("Finished processing");
        }
        return "done";
    }
}

Notice how this example does logging well:

  • Logging library — It uses a logging tool, SLF4J, rather than simply printing raw messages with System.out.println().
  • Machine parsable — SLF4J, like most logging tools, produces machine-parsable formats.
  • Human readable — The messages can be read by people browsing the logs.
  • Log category — The logger is initialized to indicate which component in the application is generating the messages, in this example SimpleClass.
  • Log level — Messages are logged at different levels: error, info, and debug.
  • Context — The messages include parameters, such as the list to be processed and the error that was caught.
  • Meaningful — The messages log key points in the list processing: when it starts, when it finishes, and when it hits an exception.

These are logs that an aggregator can sort through to find significant events.

Logging example in the Cloud-Native Toolkit

The Starter Kits in the Cloud-Native Toolkit implement logging in the sample applications. For example, the HelloWorldController class in the Node Typescript Starter Kit does some basic logging.

The starter implements its own logger, which is accessible through the LoggerApi class. HelloWorldController then uses it to log messages as it handles requests:

...
import {LoggerApi} from '../logger';

@Path('/hello')
export class HelloWorldController {

  ...
  @Inject
  _baseLogger: LoggerApi;

  get logger() {
    return this._baseLogger.child('HelloWorldController');
  }

  @GET
  async sayHelloToUnknownUser(): Promise<string> {
    this.logger.info('Saying hello to someone');
    return this.service.greeting();
  }

  @Path(':name')
  @GET
  async sayHello(@PathParam('name') name: string): Promise<string> {
    this.logger.info(`Saying hello to ${name}`);
    return this.service.greeting(name);
  }
}

Messages in the log specify that they were produced by the container process for this application, specifically by its component named “HelloWorldController.”

Summary

This article has described 4 best practices for making applications easier to manage while they’re running in a Kubernetes or OpenShift cluster:

  • Implement container status probes
  • Specify a pod’s resource requirements
  • Implement a health endpoint
  • Log events

These, along with the other best practices in this learning path for designing and building better images, will help make your application work much better in both Kubernetes and OpenShift.

If you would like to use the Cloud-Native Toolkit to build universal application images (UAIs), see the Use the Cloud-Native Toolkit to build universal application images from starter kits tutorial.