IBM Developer Blog

What to do when Kubernetes isn't invincible

If you’re a user of Kubernetes, you know it can solve many important issues – like making the best use of available resources to keep applications performing well. But even Kubernetes is not invincible. What happens when you’ve scheduled a pod that’s too large for any of the available nodes, and now the deployment is failing? Or someone deployed additional pods overnight, causing resources to be taken from other pods? What about a memory leak issue where a pod’s memory usage is spiking and causing the node to fail?
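
A common first line of defense against scenarios like these is to declare resource requests and limits on your pods, so the scheduler knows how much a pod actually needs and the kubelet can contain a runaway container. A minimal sketch, with a hypothetical pod name, image, and sizes chosen purely for illustration:

```yaml
# Hypothetical pod spec; the name, image, and values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app
spec:
  containers:
  - name: app
    image: registry.example.com/demo-app:1.0   # placeholder image
    resources:
      requests:
        cpu: "250m"       # scheduler only places the pod on a node with this much free
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "512Mi"   # a leaking container is OOM-killed here, rather than starving the node
```

With requests set, a pod that is too large for every node stays Pending with a clear scheduling message instead of failing opaquely; with limits set, a memory leak is confined to the offending container rather than taking the node down.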

Digging through Kubernetes logs to pinpoint the source of a problem can be time-consuming and often comes down to trial and error. That’s why IBM is introducing a Kubernetes-monitoring capability in our IBM Cloud App Management Advanced offering. Now, instead of combing through logs to correlate symptoms with a root cause, you can visualize the Kubernetes ecosystem and instantly see where the problem lies.

For instance, let’s take a scenario that happened to our development team recently. We had a pod that began consuming a massive amount of resources, which pushed the node itself toward capacity. This caused the kubelet to start evicting pods left and right – clearly not something we want to affect our application.

Using only the Kubernetes logs, we would see a DiskPressure condition on the node and the resulting pod evictions. But this doesn’t provide much context, and with numerous pods on the same node it would be difficult to pinpoint which one was causing the problem. Luckily for us, our dev team was using Cloud App Management when this problem occurred, and we were able to quickly see the Top 5 pods by CPU utilization and memory, which made it easy to identify the problem-causing pod. In the screenshot below, the lower right-hand side shows the Top 5 lists for CPU utilization and memory, clearly highlighting the pod that is consuming the most resources. While the node as a whole is okay on CPU utilization, the graph on the top right shows that our memory usage has exceeded its maximum.
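
For context, conditions like DiskPressure are driven by the kubelet’s eviction thresholds: when a node resource drops below a threshold, the kubelet sets the pressure condition and begins evicting pods. A sketch of the relevant kubelet configuration, with illustrative threshold values rather than anything specific to our cluster:

```yaml
# Hypothetical KubeletConfiguration excerpt; threshold values are illustrative.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "100Mi"   # below this, MemoryPressure is set and pods are evicted
  nodefs.available: "10%"     # below this, DiskPressure is set and pods are evicted
  imagefs.available: "15%"    # same, for the filesystem holding container images
```

Knowing which threshold tripped tells you *that* evictions happened, but not *which* pod drove the node there – which is exactly the gap the monitoring view fills.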


From this view we can also get information on the pods and containers within the node, with a straightforward visualization of the infrastructure on the left. All the pods that are being evicted are shown in red. We can also use the timeline at the top to scroll backwards in time and see how the deployments have changed. With all of this information readily accessible, pinpointing and troubleshooting Kubernetes issues becomes easy.