Abstract

Many organizations have strategic enterprise monitoring and alerting frameworks in place, such as IBM Tivoli Enterprise Monitoring, Novell Operations Center(TM), or BMC Patrol(TM). These frameworks provide centralized monitoring and operational support of systems throughout the enterprise. New systems delivered into production are typically required to integrate with this enterprise monitoring capability.

IBM Open Platform for Apache Hadoop and Apache Spark provides rich monitoring capabilities for services and components via Ambari Alerting. In addition, IBM Data Server Manager (DSM), which is part of the BigInsights value-add module, provides further monitoring and alerting capability for IBM Big SQL. Both Ambari and DSM can forward alerts to administrators over SMTP (email) or SNMP traps. However, alerts often must be posted through the specific API provided by the in-place enterprise monitoring framework rather than over SMTP or SNMP.

In this article, we’ll focus on how to use the Ambari custom alert dispatcher script capability to integrate with enterprise monitoring frameworks. Additionally, we’ll set up some custom Ambari alerts to detect the state of the Big SQL service and its components.

Ambari alerting

Ambari monitors the health of the cluster and can alert the administrator to problems. Ambari includes a number of predefined alerts across various services in the cluster. There are three basic alert states: WARNING, CRITICAL and OK. Alerts are generated whenever the state of a monitored component changes. For example, if an HBase RegionServer aborts and goes down, a CRITICAL alert is raised. When the administrator restarts the failed RegionServer, an OK alert is generated. Both of these alerts are important to the centralized alerting framework. The OK alert closes the loop by providing a trigger for automatically clearing the alert from the management console.
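
To make the "closing the loop" idea concrete, here is a minimal sketch of how a dispatcher might translate an Ambari alert state into an action on the central console. The function name and the action names "open", "clear" and "ignore" are illustrative assumptions; they are not part of Ambari or of any particular monitoring product.

```shell
#!/bin/bash
# Hypothetical helper: translate an Ambari alert state into the action a
# central console would take. The action names are illustrative only.
alert_action() {
    case "$1" in
        OK)               echo "clear" ;;   # closes the loop on an earlier alert
        WARNING|CRITICAL) echo "open" ;;    # raises or updates an alert
        *)                echo "ignore" ;;  # any other or unexpected value
    esac
}

alert_action CRITICAL   # prints: open
alert_action OK         # prints: clear
```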

Script based alert dispatcher

The Ambari script-based alert dispatcher provides a facility to automatically invoke a custom “notification dispatch” script, which is run every time an alert is fired by Ambari. This custom script can then invoke an API to post the alert event to the centralized enterprise management console. Ambari passes five parameters to the script: the alert definition name, the definition label, the service name, the alert state, and the alert text. You can read more about the specific details at this link: https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher
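
As a rough illustration of the five parameters, the sketch below gives each positional argument a name, in the order listed above, and prints them on one line. The variable names and the key=value output format are assumptions made for readability only.

```shell
#!/bin/bash
# Sketch: name the five positional parameters that Ambari passes to a
# dispatch script. Variable names and output format are illustrative.
describe_alert() {
    local definition_name="$1"
    local definition_label="$2"
    local service_name="$3"
    local alert_state="$4"
    local alert_text="$5"
    printf 'definition=%s label=%s service=%s state=%s text=%s\n' \
        "$definition_name" "$definition_label" "$service_name" \
        "$alert_state" "$alert_text"
}

# Sample values only; a real dispatcher would be called as: script "$1" ... "$5"
describe_alert "bigsql_worker" "BIGSQL Worker" "BIGSQL" "CRITICAL" "Connection failed"
```

Quoting each argument matters here because the label and the alert text usually contain spaces.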

Steps to configure custom alerts for Big SQL

Before we configure the custom dispatcher script, we will set up three custom alert definitions for the Big SQL service.

(1) First, export the list of the current alert definitions for future reference.
curl -u admin:password -X GET -H "X-Requested-By: ambari" http://ambari-hostname:port/api/v1/clusters/cluster-name/alert_definitions
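
Since the export is meant for future reference, it may be worth saving it to a file. The sketch below wraps the same request in a function that writes a timestamped backup. The function names and the backup file name are assumptions, and the host, port, cluster name and credentials are placeholders, as in the command above.

```shell
#!/bin/bash
# Sketch: save the exported alert definitions to a timestamped file.
# Function names and the output file name are hypothetical.
definitions_url() {
    # $1 = Ambari host, $2 = port, $3 = cluster name
    echo "http://$1:$2/api/v1/clusters/$3/alert_definitions"
}

export_definitions() {
    local out
    out="alert_definitions_$(date '+%Y%m%d_%H%M%S').json"
    curl -s -u admin:password -H "X-Requested-By: ambari" -X GET \
        -o "$out" "$(definitions_url "$1" "$2" "$3")" && echo "$out"
}

# Example (requires a reachable Ambari server; values are placeholders):
# export_definitions ambari-hostname 8080 prodcluster1
```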

(2) Next we’ll create three new alert definitions for Big SQL. Ambari supports alerts based on various triggers, including PORT, METRIC, SCRIPT and WEB. For this use case, detecting the basic alive or dead state of Big SQL is deemed sufficient. Therefore, we will create custom alert definitions based on the PORT trigger. Big SQL head and worker nodes listen by default on port 32051. Provided that the respective Big SQL node is up and alive, the node will be listening on this port. Also, on the Big SQL head node, there is an additional component, the scheduler, which listens on port 7053. Therefore, we create the following new alert definitions:

For the Big SQL Head Node (headnode.json)
{
  "AlertDefinition" : {
    "cluster_name" : "prodcluster1",
    "component_name" : "BIGSQL_HEAD",
    "description" : "This host-level alert is triggered if the BigSQL head processes, aka db2sysc, cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.",
    "enabled" : true,
    "ignore_host" : false,
    "interval" : 3,
    "label" : "BIGSQL Head",
    "name" : "bigsql_head",
    "scope" : "HOST",
    "service_name" : "BIGSQL",
    "source" : {
      "default_port" : 32051,
      "reporting" : {
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        },
        "warning" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}",
          "value" : 1.5
        },
        "critical" : {
          "text" : "Connection failed: {0} to {1}:{2}",
          "value" : 5.0
        }
      },
      "type" : "PORT",
      "uri" : "{{32051}}"
    }
  }
}

For the BigSQL Scheduler (scheduler.json):
{
  "AlertDefinition" : {
    "cluster_name" : "prodcluster1",
    "component_name" : "BIGSQL_HEAD",
    "description" : "This host-level alert is triggered if the BigSQL Scheduler head processes, aka scheduler, cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.",
    "enabled" : true,
    "ignore_host" : false,
    "interval" : 3,
    "label" : "BIGSQL Scheduler",
    "name" : "bigsql_scheduler",
    "scope" : "HOST",
    "service_name" : "BIGSQL",
    "source" : {
      "default_port" : 7053,
      "reporting" : {
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        },
        "warning" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}",
          "value" : 1.5
        },
        "critical" : {
          "text" : "Connection failed: {0} to {1}:{2}",
          "value" : 5.0
        }
      },
      "type" : "PORT",
      "uri" : "{{7053}}"
    }
  }
}

For the Big SQL Worker Node (worker.json):
{
  "AlertDefinition" : {
    "cluster_name" : "prodcluster1",
    "component_name" : "BIGSQL_WORKER",
    "description" : "This host-level alert is triggered if the BigSQL worker processes, aka db2sysc, cannot be confirmed to be up and listening on the network for the configured critical threshold, given in seconds.",
    "enabled" : true,
    "ignore_host" : false,
    "interval" : 3,
    "label" : "BIGSQL Worker",
    "name" : "bigsql_worker",
    "scope" : "HOST",
    "service_name" : "BIGSQL",
    "source" : {
      "default_port" : 32051,
      "reporting" : {
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        },
        "warning" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}",
          "value" : 1.5
        },
        "critical" : {
          "text" : "Connection failed: {0} to {1}:{2}",
          "value" : 5.0
        }
      },
      "type" : "PORT",
      "uri" : "{{32051}}"
    }
  }
}

Note: if a secondary head node is enabled on this cluster, alert definitions should also be posted for the secondary head node and secondary scheduler, similar to those for the primary head node above.
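
The definitions above rely on ports 32051 and 7053 accepting TCP connections. Before posting them, it can help to check the same condition by hand. The sketch below is one way to do so using bash's /dev/tcp pseudo-device and the coreutils timeout command. The function names, the 5-second limit and the host names are assumptions, and the messages only imitate alert states; this is not how Ambari itself performs the check.

```shell
#!/bin/bash
# Sketch: check by hand whether a TCP port accepts connections.
# Function names, timeout value and host names are illustrative.
port_is_open() {
    # $1 = host, $2 = port; returns 0 if a connection succeeds within 5s
    timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

check_port() {
    if port_is_open "$1" "$2"; then
        echo "OK: $1:$2 is listening"
    else
        echo "CRITICAL: $1:$2 is not reachable"
    fi
}

# Example (replace the placeholder with a real Big SQL head node):
# check_port bigsql-head-hostname 32051
# check_port bigsql-head-hostname 7053
```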

(3) Next, we post the new definitions from these three JSON payload files. That is, run the following for each file:
curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d @headnode.json http://ambari-hostname:port/api/v1/clusters/cluster-name/alert_definitions/
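
The three posts can also be done in one loop. The sketch below repeats the command above for each payload file. DRY_RUN is a hypothetical switch, added here so the commands can be reviewed before anything is sent. Host, port, cluster name and credentials remain placeholders.

```shell
#!/bin/bash
# Sketch: post all payload files in one loop. DRY_RUN=1 (a hypothetical
# switch) prints each curl command instead of running it.
AMBARI_URL="http://ambari-hostname:port/api/v1/clusters/cluster-name/alert_definitions/"

post_definitions() {
    local f
    for f in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d @$f $AMBARI_URL"
        else
            curl -u admin:password -i -H 'X-Requested-By: ambari' \
                -X POST -d "@$f" "$AMBARI_URL"
        fi
    done
}

# Print the three commands without contacting the server.
DRY_RUN=1 post_definitions headnode.json scheduler.json worker.json
```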

(4) Now, we can create the custom dispatcher script, for example:
/var/lib/ambari-server/resources/host_scripts/custom_alert.sh

Let’s assume that the central alerting system provides a Java executable, “postemsg” (post event message), which monitored systems can run from the command line or a shell script to send alerts to the centralized alerting system. We can therefore wrap postemsg in the custom_alert.sh script and pass it the required parameters, including the five parameters that Ambari provides. The script would look something like:

#!/bin/bash
# Custom Ambari alert dispatcher: forwards each alert to the central alerting system.
postMsgJar=/var/lib/ambari-server/resources/host_scripts/postemsg.jar
alertServer=noctest.svl.ibm.com
alertServerPort=9056
thisScript=$(basename "$0")
thisLogFile=/var/log/$thisScript.log
thisHostName=$(hostname)

dt=$(date '+%d/%m/%Y_%H:%M:%S')
# Ambari passes: $1 definition name, $2 definition label, $3 service name, $4 alert state, $5 alert text
echo "$dt : Calling postemsg for parameter 1 = $1 , parameter 2 = $2 , parameter 3 = $3 , parameter 4 = $4 , parameter 5 = $5" >> "$thisLogFile"

# "etc, etc" stands in for any further key=value pairs that postemsg requires
java -jar "$postMsgJar" "$alertServer" "$alertServerPort" \
    "hostname=$thisHostName" \
    "eventtime=$dt" \
    "eventname=$2" \
    "componentname=$3" \
    "severity=$4" \
    "eventdescription=$5" \
    "etc, etc" >> "$thisLogFile"

retCode=$?

if [ $retCode -ne 0 ]
then
    echo "$dt : an error occurred when posting the Event Message" >> "$thisLogFile"
fi
exit $retCode

(5) Add the following line to the /etc/ambari-server/conf/ambari.properties file:
notification.dispatch.alert.script=/var/lib/ambari-server/resources/host_scripts/custom_alert.sh
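
A script that lacks execute permission is one reason a dispatcher may never run, so it can be worth reading the property back and checking the file it names. The sketch below is one way to do that. The helper names are hypothetical; the property key is the one shown above.

```shell
#!/bin/bash
# Sketch: read the dispatcher property back from ambari.properties and
# confirm that the script it names is executable. Helper names are
# hypothetical.
dispatcher_script_path() {
    # $1 = path to ambari.properties; prints the configured script path
    sed -n 's/^notification\.dispatch\.alert\.script=//p' "$1" | tail -n 1
}

check_dispatcher() {
    local script
    script=$(dispatcher_script_path "$1")
    if [ -z "$script" ]; then
        echo "property not set in $1"
        return 1
    elif [ ! -x "$script" ]; then
        echo "not executable: $script"
        return 1
    fi
    echo "ok: $script"
}

# Example:
# check_dispatcher /etc/ambari-server/conf/ambari.properties
```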

(6) Restart ambari-server to pick up this new custom dispatcher script.

(7) Create an alert target (dispatcher) of type ALERT_SCRIPT so that Ambari sends notifications through this alert script:

curl -i -u admin:password -H 'X-Requested-By: ambari' -X POST -d '
{
"AlertTarget": {
"name": "syslogger",
"description": "Syslog Target",
"notification_type": "ALERT_SCRIPT",
"global": true
}
}
' http://ambari-hostname:port/api/v1/alert_targets
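
To confirm that the target was registered, the list of alert targets can be read back from the same endpoint. The sketch below is a rough check that assumes the response contains a "notification_type" field for each target when all fields are requested with fields=*. The function names are hypothetical, and host, port and credentials are placeholders.

```shell
#!/bin/bash
# Sketch: list the registered alert targets and look for an ALERT_SCRIPT
# entry. Function names are hypothetical; the response layout is assumed.
list_alert_targets() {
    curl -s -u admin:password -H 'X-Requested-By: ambari' \
        "http://ambari-hostname:port/api/v1/alert_targets?fields=*"
}

has_script_target() {
    # Reads the JSON response on stdin; succeeds if any target is ALERT_SCRIPT.
    grep -q '"notification_type" *: *"ALERT_SCRIPT"'
}

# Example (requires a reachable Ambari server):
# list_alert_targets | has_script_target && echo "script target registered"
```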

(8) Perform some end-to-end tests. Kill a worker node and observe a CRITICAL alert being generated on the Ambari console, followed shortly afterwards by the alert being dispatched to the central alert monitoring system by the custom_alert.sh script. Next, restart the worker node and observe the OK alert being generated on the Ambari console, followed shortly afterwards by the alert being dispatched to the central monitoring system, which then clears the associated CRITICAL alert from its console. Repeat this procedure for the head node and scheduler components, and finally do the same for some of the open source services such as HBase and Hive.
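
Before killing real processes, the path from the dispatcher to the central system can be exercised by calling the dispatcher by hand with a CRITICAL alert followed by an OK alert. The sketch below does this with sample values that mimic the worker alert definition. The function name and the alert texts are made up for testing.

```shell
#!/bin/bash
# Sketch: call the dispatcher by hand with a CRITICAL/OK pair. The
# function name and alert texts are hypothetical test values.
simulate_alert_pair() {
    # $1 = path to the dispatcher script
    "$1" "bigsql_worker" "BIGSQL Worker" "BIGSQL" "CRITICAL" \
        "Test alert: simulated connection failure" || return 1
    "$1" "bigsql_worker" "BIGSQL Worker" "BIGSQL" "OK" \
        "Test alert: simulated recovery"
}

# Example:
# simulate_alert_pair /var/lib/ambari-server/resources/host_scripts/custom_alert.sh
```

If the central console shows the test alert and then clears it, the dispatcher side works, and any remaining problem lies between Ambari and the script.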

(9) Additionally, the central alerting system may require a heartbeat from the BigInsights cluster. For example, it may be required to send a special heartbeat alert every 10 minutes to notify the central system that the Ambari cluster is alive. This needs to occur independently of the Ambari alerting service. Therefore, we use cron to schedule a script that runs every ten minutes on the host where the Ambari master resides. The script simply checks the status of the ambari-server and reports a CRITICAL alert if the ambari-server is down and an OK alert if it is running. The script calls the dispatcher script directly and provides the five parameters normally supplied by Ambari. For example, the script might look like the following:

#!/bin/bash
# Heartbeat: report the ambari-server state through the dispatcher script.

dispatchScript=/var/lib/ambari-server/resources/host_scripts/custom_alert.sh
thisHostName=$(hostname)

# grep -q sets the exit status without printing the matching line
ambari-server status | grep -q "Ambari Server running"
retCode=$?

if [ $retCode -ne 0 ]
then
    "$dispatchScript" "heartbeat" "Ambari Server Heartbeat" "AMBARI_SERVER" "CRITICAL" "The Ambari-master is down on hostname $thisHostName"
else
    "$dispatchScript" "heartbeat" "Ambari Server Heartbeat" "AMBARI_SERVER" "OK" "The Ambari-master is alive on hostname $thisHostName"
fi
exit $retCode

And we add the following crontab entry to schedule this heartbeat script:

## Crontab entry: run the above script every 10 minutes
*/10 * * * * /var/lib/ambari-server/resources/host_scripts/ambari_heartbeat.sh

Further reading and related articles

More information on Ambari Alerting in the BigInsights Knowledge Center:
http://www.ibm.com/support/knowledgecenter/SSPT3X_4.2.0/com.ibm.swg.im.infosphere.biginsights.admin.doc/doc/admin_ambari_alert_mon.html

Apache.org wiki page on Creating a script based dispatcher:
https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher

2 comments on "Enterprise alerting for Big SQL using Ambari custom dispatcher"

  1. Hi. I’m trying to send a notification when Ambari alerts fire, using the script-based dispatcher, but the script is not invoked.

    Here are my steps:

    1. make a script ‘/var/lib/ambari-server/resources/scripts/custom_script.sh’
    2. add property to ‘etc/ambari-server/conf/ambari.properties’ with ‘notification.dispatch.alert.script=/var/lib/ambari-server/resources/scripts/custom_script.sh’

    3. ambari-server restart

    4. add alert target with

    curl -i -u admin:admin -H 'X-Requested-By: ambari' -X POST -d '{"AlertTarget":{"name":"testName","description": "testDesc","notification_type": "ALERT_SCRIPT","global": true}}' http://10.109.22.62:8080/api/v1/alert_targets

    5. Bring the ambari-agent down and check the ambari-agent heartbeat alert.

    6. The ambari-agent heartbeat alert fires, but custom_script.sh is not invoked.

    Is there anything missing?

    Thanks.

    • Hi. Sorry for delayed response.

      A few things to check:

      1/ Does the custom_script.sh have execute permission set?

      2/ Are there any errors relating to the script-dispatcher around the time the alert should have fired? For example, the following is observed when the script does not have execute permission set:

      14 May 2017 16:51:43,442 WARN [script-dispatcher-1] AlertScriptDispatcher$AlertScriptRunnable:319 - Unable to dispatch ALERT_SCRIPT notification because /var/lib/ambari-server/resources/scripts/custom_script.sh terminated with exit code 126

      3/ Are the alerts being posted to the Ambari UI?

      Thanks.
