IBM Support

Reducing Stale Alerts in Ambari - Hadoop Dev

Technical Blog Post



This post offers some tips for reducing how often you are notified about Ambari Server Stale Alerts throughout the day when running a problem-free cluster.

What is an Ambari Server Stale Alert?

An Ambari Server Stale Alert is triggered when the server detects alerts that have not run in a timely manner. Because these alerts are always tagged as CRITICAL, you are notified by email each time one fires. This can become tedious, and you may want to reduce your email burden.

You can adjust the alert grace period and the intervals of all alerts, including the Ambari Server Stale Alerts. But you can't simply ignore or disable the Ambari Server Stale Alerts: it is still important to check the availability and performance of the affected services.

Decreasing the notification frequency by increasing the alert_grace_period

Some hosts in a cluster may be required to run many alerts, depending on the number of components installed. If the number of components is large, alert jobs may miss their scheduled intervals (see /var/log/ambari-agent/ambari-agent.log). The default alert_grace_period value is 1 second, which is rather aggressive. This setting is configurable in Ambari 2.2.0. If the cluster runs Ambari 2.2.0 (IOP 4.2), you can raise the value in /etc/ambari-agent/conf/ambari-agent.ini on the hosts that have misfired alerts.
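
As a rough illustration of the edit described above, the following Python sketch rewrites the text of an ambari-agent.ini file so that alert_grace_period has a new value. It assumes the setting belongs in the [agent] section, which you should confirm against your own file. The function name and the value of 10 seconds are made up for this example. A restart of the agent is typically needed before a changed ambari-agent.ini takes effect.

```python
import re


def set_alert_grace_period(ini_text: str, seconds: int) -> str:
    """Return ini_text with alert_grace_period set to `seconds`.

    Assumption: the key lives in the [agent] section. The text is edited
    line by line so that comments and other settings are left untouched.
    """
    new_line = f"alert_grace_period={seconds}"
    out = []
    in_agent = False
    done = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [agent] without having seen the key: add it first.
            if in_agent and not done:
                out.append(new_line)
                done = True
            in_agent = stripped.lower() == "[agent]"
        elif in_agent and re.match(r"alert_grace_period\s*=", stripped):
            line = new_line
            done = True
        out.append(line)
    if in_agent and not done:
        # [agent] was the last section and did not contain the key.
        out.append(new_line)
        done = True
    if not done:
        raise ValueError("no [agent] section found")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    path = "/etc/ambari-agent/conf/ambari-agent.ini"
    with open(path, encoding="utf-8") as f:
        updated = set_alert_grace_period(f.read(), 10)
    # Printed rather than written back, so the result can be reviewed first.
    print(updated)
```

The sketch prints the updated text instead of overwriting the file, so you can review the change and keep a backup before applying it on each affected host.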

Decreasing the notification frequency by increasing the alerts intervals

If you find the frequency of alert notifications distracting or unhelpful, you can increase the alert interval. You can choose any value that is acceptable given the response time of your Hadoop distributed system.

There are two ways to reduce how often you see the Ambari Server Stale Alerts: increase the interval of the Ambari Server Stale Alerts itself, or increase the interval of the other alerts that frequently show up in the Ambari Server Stale Alerts. The default interval values are fine for demo purposes, but in production clusters the values should match the response time of your distributed cluster.

Increasing the interval of the Ambari Server Stale Alerts

To increase the interval of the Ambari Server Stale Alerts:

  1. Click the Alerts tab from the Ambari Web UI
  2. Click the Groups button to get the list of Alert Groups
  3. Select the AMBARI Default item, then click Ambari Server Alerts
  4. Click Edit
  5. Increase the interval value to your preferred amount, for example 15 for a 15-minute interval
  6. Click Save
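
If you prefer to script this change rather than use the Web UI, the same interval can, as far as we know, also be updated through the Ambari REST API for alert definitions. The sketch below only builds the HTTP request and does not send it. The endpoint path, the AlertDefinition/interval field, and the X-Requested-By header reflect our understanding of that API and should be checked against the documentation for your Ambari version. The host, cluster name, and definition id used here are placeholders, and authentication is left out.

```python
import json
import urllib.request


def build_interval_update(base_url, cluster, definition_id, minutes):
    """Build (but do not send) a PUT request that sets an alert interval.

    Assumptions: the alert definition is addressed as
    /api/v1/clusters/<cluster>/alert_definitions/<id> and the interval is
    given in minutes under AlertDefinition/interval.
    """
    url = f"{base_url}/api/v1/clusters/{cluster}/alert_definitions/{definition_id}"
    body = json.dumps({"AlertDefinition": {"interval": minutes}}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    # Ambari is expected to reject modifying requests without this header.
    request.add_header("X-Requested-By", "ambari")
    return request


if __name__ == "__main__":
    # Placeholder values; replace with your own server, cluster, and id.
    req = build_interval_update("http://ambari.example.com:8080", "mycluster", 1, 15)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would additionally require credentials for an Ambari user who is allowed to edit alerts, for example by adding a basic authentication header before calling urllib.request.urlopen.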

Increasing the intervals of other alerts

Get the Ambari Server Stale Alerts information from the /var/log/ambari-server/ambari-alerts.log file by searching for [CRITICAL] [AMBARI] [ambari_server_stale_alerts] (Ambari Server Alerts). To modify the interval of the other alerts:

  1. Click the Alerts tab from the Ambari Web UI
  2. Either:
    • Click the Groups button to get the list of Alert Groups, then select the appropriate group for an alert
      or
    • Search for an alert by its definition name
  3. Click Edit
  4. Increase the interval value to your preferred value
  5. Click Save
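
To decide which alerts to adjust, it can help to pull the stale-alert entries out of the server's alert log first. The following sketch filters log lines for the search string given above. It makes no assumption about the rest of the line format, and the function names are made up for this example.

```python
# The search string for Ambari Server Stale Alerts entries.
MARKER = "[CRITICAL] [AMBARI] [ambari_server_stale_alerts] (Ambari Server Alerts)"


def stale_alert_entries(lines):
    """Return the lines that contain the stale-alert marker, without newlines."""
    return [line.rstrip("\n") for line in lines if MARKER in line]


def read_stale_alert_entries(path="/var/log/ambari-server/ambari-alerts.log"):
    """Read the alert log at `path` and return its stale-alert entries."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return stale_alert_entries(log)


if __name__ == "__main__":
    for entry in read_stale_alert_entries():
        print(entry)
```

Run on the Ambari server host, this prints every matching entry so you can see which alerts are being reported and look them up by definition name in the Web UI.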

If you have further questions, please call your support representative.

 


UID

ibm16260107