IBM Cloud App Management (ICAM) is a container-based platform for monitoring the performance and availability of both traditional and modern microservices-based business applications that are deployed on public cloud and on premises.

Monitoring Golden Signals for application resiliency

The four ‘Golden Signals’ of the SRE discipline (latency, errors, saturation, and traffic) have become key indicators for monitoring distributed systems. These metrics are closely related to older methods such as USE metrics (utilization, saturation, and errors) and RED metrics (rate, errors, and duration). Monitoring the Golden Signals gives SREs visibility into the performance of their services and helps to maintain high availability.

ICAM simplifies troubleshooting by providing visibility into both microservices, through Golden Signals, and traditional resources, through USE metrics. You can use these signals as early warning signs that give you advance knowledge of service impacts, keeping your service downtime to a minimum. If the error rate or the latency exceeds the expected threshold, you receive a notification so that you can address the issue before it affects customers.
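
The threshold idea can be illustrated with a minimal Python sketch. It is not ICAM's API: the `GoldenSignals` class, the `breached_signals` function, and the threshold values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GoldenSignals:
    """One sample of the four signals for a service (hypothetical structure)."""
    latency_ms: float   # response time, for example a 95th percentile
    error_rate: float   # fraction of failed requests, 0.0 to 1.0
    saturation: float   # fraction of capacity in use, 0.0 to 1.0
    traffic_rps: float  # requests per second


# Assumed thresholds for illustration only; real values depend on the service.
THRESHOLDS = {"latency_ms": 500.0, "error_rate": 0.01, "saturation": 0.8}


def breached_signals(sample, thresholds=THRESHOLDS):
    """Return the names of the signals whose value exceeds its threshold."""
    return [name for name, limit in thresholds.items()
            if getattr(sample, name) > limit]


sample = GoldenSignals(latency_ms=620.0, error_rate=0.004,
                       saturation=0.85, traffic_rps=120.0)
print(breached_signals(sample))  # ['latency_ms', 'saturation']
```

Traffic has no threshold in this sketch because a traffic value is usually read alongside the other three signals rather than alerted on by itself.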

Solutions to your problems are one hop away

One of the big challenges that comes with the flexibility and growth of microservices is the tangle of dependencies that is often depicted as a ‘Death Star’ pattern. This tangle makes troubleshooting tedious and complicated. ICAM solves this problem by providing a one-hop topology, where you can look at the immediate upstream and downstream dependencies that are just one hop away. You can check the health of your dependencies in the topology to determine whether your service is affected by a dependent service that is causing a bottleneck.
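
As a rough illustration of the one-hop idea, the following Python sketch picks out the immediate neighbors of a service from a list of call edges. The edge format `(caller, callee)`, the service names, and the convention that callers are "upstream" and callees are "downstream" are assumptions made for this example; they do not describe how ICAM stores its topology.

```python
def one_hop(edges, service):
    """Return (upstream, downstream) services exactly one hop from `service`.

    `edges` is a list of (caller, callee) pairs. In this sketch, upstream
    means services that call `service`, and downstream means services
    that `service` calls.
    """
    upstream = sorted({src for src, dst in edges if dst == service})
    downstream = sorted({dst for src, dst in edges if src == service})
    return upstream, downstream


# Hypothetical call graph
edges = [
    ("web", "cart"),
    ("cart", "inventory"),
    ("cart", "payments"),
    ("payments", "bank-gateway"),
]

print(one_hop(edges, "cart"))  # (['web'], ['inventory', 'payments'])
```

Note that `bank-gateway` is two hops from `cart`, so it does not appear in the result. Limiting the view in this way is what keeps a large graph readable.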

The timeline on the service page also shows deployments and other events, which helps you to determine whether a recent code push is causing an issue.

Integrating with a myriad of external offerings

You can set up both incoming integrations from external sources into ICAM and outgoing integrations from ICAM to external tools. For example, you can set up an integration with Jenkins projects to receive notifications about job status or deployments. You can also integrate with Prometheus, Azure, New Relic, and many more offerings to receive event notifications. Through outgoing integrations, you can send incident details to Slack, ServiceNow, GitHub, and more.

Monitoring the availability of business-critical applications

Don’t wait until a problem in your critical business application becomes customer-impacting and consumes your error budget. Instead, monitor proactively by using Synthetics testing so that you are notified ahead of time.
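
The principle behind a synthetic test can be sketched in a few lines of Python: run a scripted probe on a schedule, and treat a failure or a slow response as a warning before real users notice. The `run_synthetic_check` function and the probe below are hypothetical stand-ins, not part of ICAM; a real probe would typically issue an HTTP request or replay a user transaction.

```python
import time


def run_synthetic_check(probe, max_latency_s):
    """Run `probe` once and report whether it passed within the time limit.

    `probe` is any zero-argument callable that returns a truthy value on
    success. An exception raised by the probe counts as a failure.
    """
    start = time.perf_counter()
    try:
        succeeded = bool(probe())
    except Exception:
        succeeded = False
    elapsed = time.perf_counter() - start
    return {"ok": succeeded and elapsed <= max_latency_s, "elapsed_s": elapsed}


def failing_probe():
    """Stand-in for a probe that cannot reach the application."""
    raise ConnectionError("endpoint unreachable")


result = run_synthetic_check(failing_probe, max_latency_s=1.0)
if not result["ok"]:
    print("synthetic check failed: notify the on-call SRE")
```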

Reducing the noise for the SRE

Receiving hundreds of alerts for a single underlying problem creates excessive noise and eats into the precious time that is needed to focus on the problem at hand. ICAM helps to aggregate all the events that are tied to an application or cluster into one incident. This aggregation helps the SRE to focus on quickly restoring the service.
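
The effect of this kind of aggregation can be shown with a small Python sketch that groups events by the application they belong to. The event fields (`application`, `summary`) and the grouping key are assumptions made for illustration; ICAM's actual correlation logic is not described here.

```python
from collections import defaultdict


def group_events(events):
    """Group event summaries into one incident per application."""
    incidents = defaultdict(list)
    for event in events:
        incidents[event["application"]].append(event["summary"])
    return dict(incidents)


# Hypothetical events raised by one underlying problem in "checkout"
events = [
    {"application": "checkout", "summary": "pod restart"},
    {"application": "checkout", "summary": "latency above threshold"},
    {"application": "checkout", "summary": "error rate above threshold"},
    {"application": "search", "summary": "disk usage high"},
]

incidents = group_events(events)
print(len(events), "events ->", len(incidents), "incidents")  # 4 events -> 2 incidents
```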

Resolving issues with historical knowledge

ICAM runbooks allow you to take action to resolve incidents, react directly to events, or perform scheduled or unscheduled changes in your data center.

Ensure you manage and monitor your infrastructure and service efficiently by using IBM Cloud App Management to deliver reliable services to your end users.


Writer: Meena Gopal
Editor: Carmel Burgess
