In his blog post What’s Going On (in my cluster)?, my colleague Harald Uebele describes how to connect Cloud Native Starter, a web app made up of several services running in a Kubernetes cluster, with LogDNA, an agent-based centralized log collector.
This tutorial expands on that work and describes how LogDNA helps developers and operations teams tackle some common production scenarios. The tech stack used in the examples in this tutorial includes the IBM Cloud Kubernetes Service and the IBM Cloud Log Analysis with LogDNA service.
Most of this tutorial is adapted from personal experience in building and troubleshooting the back end of the IBM Developer mobile app.
Provision a Kubernetes cluster and install the Cloud Native Starter web app: Follow the instructions at github.com/nheidloff/cloud-native-starter to set it up.
Install the Artillery load testing tool from artillery.io/docs/getting-started.
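If you already have Node.js on your workstation, Artillery can typically be installed globally from npm (check the artillery.io getting-started page for the current recommended method, which may change between releases):

```
$ npm install -g artillery
$ artillery dino    # prints an ASCII dinosaur, a quick sanity check that the install worked
```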
After a cluster is provisioned, you can complete all the steps in this tutorial (including setting up LogDNA initially) in about an hour.
Create filtered views
By default, LogDNA displays lines from all containers in a Kubernetes cluster. This display creates too much “noise,” because a Kubernetes cluster has many subsystems and internal pods that the LogDNA agent picks up and indexes. If you want to filter the view to show only the applications you’re interested in, there are several ways to do so: you can filter by specific containers or by search text.
Views from specific containers
LogDNA can separate its logs into buckets, and the type of bucket depends on the kind of cluster. In Kubernetes, the buckets correspond to containers, so you can select the containers representing your application to limit the data shown.
Prior to filtering, you might feel overwhelmed with output from Kubernetes system pods, as shown in the following screen capture:
From the drop-down list, you can select the apps that you want to view:
When you filter for just the containers you want, the noisy system info is removed and you just see events. You can save this filtered view to use later. After selecting just the containers you want to see, click Unsaved view in the upper left and then Save as new view/alert. Give the new view a unique name and click Save.
Tip: Filter the view by selecting a combination of “apps”. In Kubernetes, the apps are containers, which makes it easy to select just the ones you’re interested in.
Views from search text
You can also filter in LogDNA by entering a search query. This approach is especially helpful for creating alerts based on conditions in the log.
Click the saved view on the left side of the LogDNA screen, then enter a search term. For example, you can view Java virtual machine (JVM) container shutdowns by searching for the phrase “JVM is exiting”. Then, from the drop-down list for the view, select Save as new view/alert and give the view a name.
Now you have a new view in the list of views, one that includes both the initial app filtering and the new search-based filter.
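As a rough sketch of the kinds of queries LogDNA accepts (the field values here are illustrative, based on the apps in this tutorial; see the LogDNA search documentation for the full syntax):

```
app:authors JVM is exiting      lines from the authors container containing all three words
"JVM is exiting"                the exact phrase, in any container
-app:logdna-agent error         lines containing "error", excluding the agent's own logs
```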
Set up alerts from existing views
You can trigger alerts in LogDNA whenever log lines appear in a custom view. Now, based on the view created earlier, this section shows how to set up alerts so you know when a JVM container stops. Although this alert configuration sends an email for every new line, you can batch entries and send summary alerts to keep from being overwhelmed by alerts when many similar messages appear.
See the following example of an alert:
To trigger an alert based on a log, start by deleting one of your pods from the command line:
$ kubectl get pods
NAME                         READY   STATUS    RESTARTS   AGE
articles-dddf44c85-xqdqt     2/2     Running   0          5d23h
authors-7bc9995896-f8kv2     2/2     Running   0          8d
client-59b69f46c4-7k6j7      1/1     Running   0          15d
logdna-agent-4rwxl           1/1     Running   0          15d
logdna-agent-jspz8           1/1     Running   0          15d
logdna-agent-xtkfj           1/1     Running   0          15d
web-api-v1-cc9b5c5d5-9ws9z   2/2     Running   0          8d
web-app-f8f47cdfd-qr8rr      2/2     Running   0          8d

[Thu May 23, 02:58 PM]
$ kubectl delete pod articles-dddf44c85-xqdqt
pod "articles-dddf44c85-xqdqt" deleted
After you set up alerts for email, the alerts are sent as soon as the lines appear in the log, as shown in the following screen captures:
Create a board to chart how your API is used
This section takes a look at the LogDNA boards. I love this feature because it’s a great way to visualize a large quantity of data quickly, find trends in traffic, and even discover error conditions before they become serious.
Start by creating charts to graph specific author names on inbound requests. You create a graph that uses the authors app and filters it by name.
Create two graphs, one for “Niklas” and one for “Harald”, then drive traffic with Artillery and examine the histogram in the view showing the frequencies of the various requests.
The examples in this tutorial started requests about 5 minutes apart, requesting content with different parameters. This test is synthetic, but it shows the kind of visualizations that are simple to build in LogDNA.
Note: See the configurations starting with simtraffic_ in GitHub. This example uses the following configurations to drive traffic to the app with specific arguments:
[11:35 AM] # artillery run simtraffic_harald.yml
[11:41 AM] # artillery run simtraffic_niklas.yml
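The actual simtraffic_ files are in the GitHub repository; a minimal Artillery script of the same general shape might look like the following sketch (the target host and endpoint path here are placeholders, not values taken from the repository):

```yaml
# Hypothetical sketch of an Artillery scenario; the real simtraffic_ files
# in the repository define the actual target and endpoints.
config:
  target: "http://<cluster-public-ip>:<nodeport>"   # placeholder target
  phases:
    - duration: 120      # run for two minutes
      arrivalRate: 5     # five new virtual users per second
scenarios:
  - flow:
      - get:
          url: "/path/to/endpoint?author=Harald"    # placeholder endpoint
```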
The following screen capture shows the frequency spikes in two separate graphs that coincide precisely with the times our artillery test ran:
Discover errors and drill down into logs
To create an error, break the app as before by deleting a running pod, this time in the middle of a retrieval operation.
You can create graphs for the example app simply by filtering on “error”. You might want to create graphs for more specific conditions (for example, in a different app, I have filters specific to network errors). However, I like a broad error filter because you can drill down by selecting the time range when an event happened and then immediately jump into the log view, correlated by time stamps. These capabilities are a huge improvement over traditional workflows for event searching in text log files.
You see a spike in the graph showing when the error occurred, like the following screen capture:
If you zoom in to the logs at the time of the error and then go back to the view you set up, you see the log lines that correspond to the spike.
You can analyze the situation further by examining logs around the failure to get a clear picture of the root cause.
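Outside of LogDNA, you can corroborate what you see by pulling the logs of the failed container directly from Kubernetes. The pod name below is illustrative (taken from the earlier listing); substitute the pod involved in your failure:

```
# List pods to find the replacement pod and check restart counts
$ kubectl get pods

# Fetch logs from the previously terminated container instance,
# useful when a pod was restarted after a crash
$ kubectl logs web-api-v1-cc9b5c5d5-9ws9z --previous

# Show recent cluster events, including pod kills and restarts
$ kubectl get events --sort-by=.metadata.creationTimestamp
```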
This tutorial showed how to simulate web traffic with the Artillery load testing tool and how to use LogDNA with Kubernetes to filter logs, identify traffic patterns, detect error conditions, and analyze root causes.
Understanding behavior in a microservices-based application is always more challenging than in a traditional, monolithic service. Becoming comfortable with a tool like LogDNA can yield huge dividends in improved productivity when you work with microservices and Kubernetes.