Observability is a key aspect of providing a service. As envisioned in Mikey Dickerson’s hierarchy of service reliability, monitoring is the foundation on which all other needs of a service rely. To provide a reliable service, you must set up and define monitoring for it. Specifically, you need well-thought-out observability that leads to meaningful alerting (pages, Slack, email, phone) from meaningful data (measures, metrics).
The challenge is to do this for an ever-evolving deployment of services and infrastructure that necessitates various views of metrics and alerting, which sometimes overlap and sometimes do not. The first part involves deploying agents to listen and generate all kinds of data; the next part is turning that data into something meaningful. This is the classic problem of separating signal from noise.
IBM Cloud Monitoring provides the data collection and meaningful promotion of data for cloud service observability. Agents gather data as the deployed infrastructure and applications evolve, providing a continual level of observability that is accurate and up to date for your services. Visualization provides the ability to turn all of this data into meaningful observability through pre-defined, focused metrics dashboards for typical needs such as infrastructure, resource usage, and application overviews. In addition, you can use IBM Cloud Monitoring to build application-specific observability by defining the metrics and scopes that are tailored to your desired services.
A key part of observability is alerting when a service needs attention. This takes the form of an alert, sent to a notification channel or service, that a value fell outside of a check. You can create several alerts from pre-defined checks and verifications of a service to proactively alert about issues affecting service reliability. IBM Cloud Monitoring also provides the ability to alert for anomalies in an environment based on deviations from observed patterns of operation of the service.
- IBM Cloud account with access to the IBM Cloud Monitoring service.
- IBM Cloud Kubernetes Service cluster. If you don’t already have a cluster in your account, follow the steps of the Creating clusters tutorial to provision one.
- IBM Cloud CLI and IBM Cloud Kubernetes Service plug-in.
Completing this tutorial should take you less than 15 minutes.
Set up observability with IBM Cloud Monitoring
1. Create an IBM Cloud Monitoring instance
Open the IBM Cloud Monitoring service within the IBM Cloud catalog.
Select the geographic region closest to you from the Select a location list.
Within the Configure your resource section, click Enable to receive platform metrics.
Click Create to provision your monitoring instance. The Observability dashboard opens and shows details for your new monitoring instance similar to the following screen capture image:
2. Add a monitoring agent to your Kubernetes cluster
On the dashboard, go to your new monitoring instance and click Add sources.
The Monitoring Sources page opens with guidance for adding agents to your monitoring instance. For this tutorial, select the Kubernetes tab, copy the Public Endpoint command, and save it for later use in step 3, after you log in to your Kubernetes cluster.
Now, go to your Kubernetes clusters list and select your Kubernetes cluster.
On your cluster Overview page, click Actions and select Connect via CLI from the list.
Follow the instructions and custom commands on the Connect via CLI page to connect to your IBM Cloud cluster from your CLI (terminal). The commands and results will look similar to the following example:
$ ibmcloud login -a cloud.ibm.com -r us-south -g default --sso
API endpoint: https://cloud.ibm.com
Get a one-time code from https://identity-2.us-south.iam.cloud.ibm.com/identity/passcode to proceed.
Open the URL in the default browser? [Y/n] > y
One-time code >
Authenticating...
OK
Select an account:
1. MARC VELASCO's Account (-------)
Enter a number> 1
Targeted account MARC VELASCO's Account (----)
Targeted resource group default
Targeted region us-south
API endpoint: https://cloud.ibm.com
Region: us-south
User: -------
Account: MARC VELASCO's Account (---)
Resource group: default
CF API endpoint:
Org:
Space:
$ ibmcloud ks cluster config --cluster c2dg0cvd0sstp4gnc5sg
OK
The configuration for c2dg0cvd0sstp4gnc5sg was downloaded successfully.
Added context for c2dg0cvd0sstp4gnc5sg to the current kubeconfig file.
You can now execute 'kubectl' commands against your cluster. For example, run 'kubectl get nodes'.
If you are accessing the cluster for the first time, 'kubectl' commands might fail for a few seconds while RBAC synchronizes.
3. Install the monitoring agent
Deploy the monitoring agent by using the public endpoint command that you copied from the Monitoring Sources page within step 2. The command and results will look similar to the following example:
$ curl -sL https://ibm.biz/install-sysdig-k8s-agent | bash -s -- -a e85574b8-b784-472e-810d-e58510bb4580 -c ingest.us-south.monitoring.cloud.ibm.com -ac 'sysdig_capture_enabled: false'
* Detecting operating system
* Downloading Sysdig cluster role yaml
* Downloading Sysdig config map yaml
* Downloading Sysdig daemonset v2 yaml
* Downloading Sysdig kmod-thin-agent-slim daemonset
* Creating namespace: ibm-observe
* Creating sysdig-agent serviceaccount in namespace: ibm-observe
* Creating sysdig-agent clusterrole and binding
clusterrole.rbac.authorization.k8s.io/sysdig-agent created
* Creating sysdig-agent secret using the ACCESS_KEY provided
* Retreiving the IKS Cluster ID and Cluster Name
* Setting cluster name as sysdigcluster/c2dg0cvd0sstp4gnc5sg
* Setting ibm.containers-kubernetes.cluster.id c2dg0cvd0sstp4gnc5sg
* Updating agent configmap and applying to cluster
* Setting tags
* Setting collector endpoint
* Adding additional configuration to dragent.yaml
* Enabling Prometheus
Slim agent selected
Processing all-icr-io as all-icr-io
secret/all-icr-io created
configmap/sysdig-agent created
* Deploying the sysdig agent
daemonset.apps/sysdig-agent created
The list of agent pods deployed in the namespace "ibm-observe" are:
sysdig-agent-swmp6   0/1   Pending   0   0s
Make sure the above pods all turn to "Running" state before continuing
Should any pod not reach the "Running" state, further info can be obtained from logs as follows
'kubectl logs <agent-pod-name> -n ibm-observe'
After some time passes, you can check to see if your agent is running in the cluster with the following command:
kubectl get pods -n ibm-observe
You should see results similar to the following:
NAME                 READY   STATUS    RESTARTS   AGE
sysdig-agent-swmp6   0/1     Pending   0          28m
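If you want to script this readiness check rather than read the output by eye, the following Python sketch shows one possible approach. It only parses text in the default column layout that kubectl get pods prints. The function name pods_not_running is hypothetical and not part of any IBM or Kubernetes tooling.

```python
def pods_not_running(kubectl_output: str) -> list[str]:
    """Return the names of pods whose STATUS column is not "Running".

    Assumption: the text uses the default whitespace-separated columns
    printed by `kubectl get pods` (NAME READY STATUS RESTARTS AGE).
    """
    lines = [line for line in kubectl_output.strip().splitlines() if line.strip()]
    if not lines:
        return []
    # Locate the STATUS column from the header row instead of hard-coding it.
    status_index = lines[0].split().index("STATUS")
    waiting = []
    for line in lines[1:]:
        columns = line.split()
        if columns[status_index] != "Running":
            waiting.append(columns[0])
    return waiting


# Sample text copied from the tutorial output; in practice you would pass in
# the captured output of `kubectl get pods -n ibm-observe`.
sample = """NAME                 READY   STATUS    RESTARTS   AGE
sysdig-agent-swmp6   0/1     Pending   0          28m
"""
print(pods_not_running(sample))
```

An empty list means every agent pod reports Running. Any names that are returned are the pods to inspect with kubectl logs.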
Work with IBM Cloud Monitoring observed data
IBM Cloud Monitoring has a number of default dashboards. After your agent is configured and reports data, the pre-existing dashboards will show data.
To view the default dashboards, return to the Observability dashboard within IBM Cloud.
From your list of IBM Cloud Monitoring instances, click Open dashboard for the new instance that you created in step 1, as indicated within the following screen capture:
The default Explore view shows the entire infrastructure and all available metrics that can be utilized.
By clicking the Dashboards icon, you can find a number of prebuilt dashboards available. For example, the Container Resource Usage dashboard shows resource utilization by container for all of the containers that IBM Cloud Monitoring has data for, as demonstrated in the following screen capture:
Adjusting the widgets on the header can help you focus on a specific scope or range of containers.
Create new and custom dashboards
If you want to create a new dashboard from scratch, select the Dashboards tab and click the + Add Dashboard icon.
If you want to create a custom dashboard by using one of the pre-existing dashboards as a template, click the Create Custom Dashboard button located on that pre-existing dashboard. Then you can customize the panels and layout as needed within your new dashboard.
When you first create a dashboard, you have access to many metrics captured in your environment. These range from lower-level metrics, such as resource consumption on host hardware, to application-specific metrics.
Examples of resource consumption on host hardware metrics
Examples of application-specific metrics
Click the Explore tab to view the metrics and groups that are available for your deployed infrastructure.
Define alerts and anomaly detection
When working with IBM Cloud Monitoring data, you can generate alerts from the data in several ways. You can generate an alert when a specific metric is over, is under, or meets a defined threshold, such as CPU utilization at 100% for a prolonged period. You can create an alert when an event occurs a certain number of times, such as pod restarts or readiness probe failures. You can also create an anomaly alert when the observed condition of a service changes, such as when the level of CPU utilization suddenly rises from 50% to 100%.
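To make the event-count style of alert concrete, here is a minimal Python sketch of the idea: trigger when more than a set number of events, such as pod restarts, fall inside a time window. This is an illustration only and not how IBM Cloud Monitoring implements alerts. The function name event_count_alert, its parameters, and the sample timestamps are all assumptions.

```python
from collections import deque


def event_count_alert(event_times, max_events, window_seconds):
    """Return the first timestamp at which more than `max_events` events
    fall within a sliding window of `window_seconds`, or None if that
    never happens. Timestamps are plain numbers of seconds.
    """
    window = deque()
    for t in sorted(event_times):
        window.append(t)
        # Drop events that are older than the window relative to `t`.
        while t - window[0] > window_seconds:
            window.popleft()
        if len(window) > max_events:
            return t
    return None


# Hypothetical pod restart times, in seconds since some starting point.
restarts = [0, 30, 400, 410, 420, 430]
print(event_count_alert(restarts, max_events=3, window_seconds=60))
```

In this sample, the two early restarts are too far apart from the later burst to count together. The alert fires only when a fourth restart lands within 60 seconds of the burst.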
There are several metrics that signify a service failed or is about to fail. You can define alerts in IBM Cloud Monitoring at varying levels, whether across the whole scope of all deployed services and infrastructure or at a more granular scope. For example, you can specify an alert to occur when CPU utilization is greater than 50% across your whole observed stack of infrastructure and services. You can fine-tune this type of alert: trigger it when CPU utilization is greater than 50% for a specific number of minutes, hours, or days, or any time it occurs at all, and evaluate either an average or any single measured value of the metric.
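The difference between evaluating an average and evaluating any single measured value can be sketched in a few lines of Python. This is a simplified model and not the product's evaluation logic. The function threshold_alert, the mode names, and the sample CPU values are hypothetical.

```python
from statistics import mean


def threshold_alert(samples, threshold, window, mode="average"):
    """Evaluate the most recent `window` samples against `threshold`.

    mode="average": trigger when the mean of the window exceeds the threshold.
    mode="any":     trigger when any single sample in the window exceeds it.
    Returns False when fewer than `window` samples are available.
    """
    recent = samples[-window:]
    if len(recent) < window:
        return False
    if mode == "average":
        return mean(recent) > threshold
    if mode == "any":
        return max(recent) > threshold
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical CPU utilization percentages, one sample per minute.
cpu = [40, 45, 70, 42, 41]
print(threshold_alert(cpu, threshold=50, window=5, mode="average"))
print(threshold_alert(cpu, threshold=50, window=5, mode="any"))
```

With these samples, the single spike to 70% trips the "any" mode while the five-minute average stays under 50%. That is the trade-off between a sensitive alert and a quieter one.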
When triggered, the alert can be sent to any integrated notification channel, such as PagerDuty or Slack.
You can also enable Sysdig Capture mode, which saves the state and forensic data when an event occurs. In that mode, you can use the data in postmortem and follow-up reviews outside of production.
Underneath deployed services, all kinds of events happen that the services are unaware of, such as the recreation or movement of pods, or readiness probes that do not respond. Metrics tell a story of what’s happening in one small, quantifiable area. Events tell a story about all of the pieces that move as a result of external and internal forces, such as resource contention, network congestion, or failure. You can configure event alerts to look for events with certain tags or sources; this helps you pinpoint common events, such as resources or probes that fail, or pods that move to other hosts. Because these events are evidence of a current or future problem, you can create event alerts that match their criticality. Similar to other alerts, you can turn on the proper alert notification channels and capture when this type of alert occurs.
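Filtering events by source and tags can be pictured with the following Python sketch. The Event class, its field names, and the sample events are hypothetical and do not reflect the actual event schema used by IBM Cloud Monitoring. The sketch only shows the matching idea.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical event record used only for this illustration."""
    source: str
    name: str
    tags: dict = field(default_factory=dict)


def matches(event, source=None, required_tags=None):
    """Return True when the event comes from `source` (if given) and
    carries every key/value pair in `required_tags` (if given)."""
    if source is not None and event.source != source:
        return False
    for key, value in (required_tags or {}).items():
        if event.tags.get(key) != value:
            return False
    return True


events = [
    Event("kubernetes", "Readiness probe failed", {"namespace": "shop"}),
    Event("kubernetes", "Pod rescheduled", {"namespace": "billing"}),
    Event("docker", "Container killed", {"namespace": "shop"}),
]
selected = [e.name for e in events
            if matches(e, source="kubernetes", required_tags={"namespace": "shop"})]
print(selected)
```

Narrowing by both source and tag keeps the alert focused on the one service you care about and ignores unrelated events from the rest of the cluster.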
Anomaly detection is a powerful tool to alert you of deviations from observed norms. By utilizing observability algorithms to establish expected states of a service and infrastructure, you can generate an alert of configurable priority when an observation occurs outside of established normal ranges (configurable when you set up the alert). Similar to other alerts, you can turn on the proper alert notification channels and capture when this type of alert occurs.
Observability is a critical foundation upon which a reliable service is built. In this tutorial, you set up a monitoring agent to observe sample activity within a Kubernetes cluster running in a public cloud, reviewed the data via dashboards, and learned about several types of helpful monitoring alerts.
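The general idea of flagging deviations from an established norm can be illustrated with a simple standard-deviation check in Python. This is a swapped-in textbook technique chosen for illustration. The tutorial does not describe which algorithm IBM Cloud Monitoring actually uses. The function is_anomalous, the tolerance parameter, and the baseline values are assumptions.

```python
from statistics import mean, stdev


def is_anomalous(baseline, observation, tolerance=3.0):
    """Return True when `observation` lies more than `tolerance` standard
    deviations from the mean of `baseline` (needs at least two values)."""
    center = mean(baseline)
    spread = stdev(baseline)
    if spread == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return observation != center
    return abs(observation - center) > tolerance * spread


# Hypothetical CPU utilization history hovering around 50%.
history = [48, 50, 52, 49, 51, 50]
print(is_anomalous(history, 100))  # sudden jump to 100%
print(is_anomalous(history, 51))   # ordinary fluctuation
```

Unlike a fixed threshold, the acceptable range here is derived from observed behavior. The same rule adapts to a service that normally runs at 20% as well as one that normally runs at 80%.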
IBM Cloud Monitoring provides powerful tools and visualization to build observability for your services and infrastructure, from initial deployment to scaled production workloads. It also provides built-in observability and alerting for measures, events, and complex anomaly detection. Learn more about the features available with IBM Cloud Monitoring by following the Getting started tutorial and reading the recent blog post about integrated Sysdig Secure features.