Prerequisites

Monitoring server availability and alerting on availability incidents is one of the primary responsibilities of the system administrators and development IT teams that manage server infrastructure. The ping utility in Linux can poll any IT equipment that supports Internet Control Message Protocol (ICMP) packets and has a valid Internet Protocol (IP) address, and report its current availability status. This utility can be used to build automation scripts that help system administrators and development IT teams.

The script provided here for availability monitoring and alerting is written in the Hypertext Preprocessor (PHP) scripting language and can be run from a server. The script polls (pings) a set of IP addresses at regular intervals (say, every 15, 30, or 60 minutes, as required). The ping responses can be captured in a database (DB). In the event of a ping failure, the administrators must be notified by email, SMS, or a Slack message. To enable this alerting, you’ll need:

  • A Linux server, a virtual machine (VM), or a Docker container
  • The latest version of PHP installed
  • A database instance (for example, MariaDB)
  • A database client

The polling frequency and the list of recipients can be changed easily at any time.

Understanding the custom script-based alerting solution

The key component of this solution is an alerting server, which can be a physical server, a VM, or a Docker container. This server runs the alerting script as a cron job (say, every 15 minutes). Refer to the block diagram in Figure 1 for details. The list of target resources, with IP address, name, type, and administrator details, can be maintained as a local array within the script, in a file, or in a DB table, and serves as the input to the alerting script. The script then pings (using the Linux OS command ping) the target IT resources that support ICMP packets every 15 minutes and reports their status. The status is marked as Pinging if there is a response or as Not pinging if there is no response.

Figure 1. High-level block diagram describing the proposed solution
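
If the target list is kept in a file rather than inside the script, a loader along the following lines could feed the alerting script. This is only a minimal sketch: the function name load_targets, the comma-separated layout (IP address, name, type, administrator email), and the sample values are assumptions made for illustration and are not part of the original script.

```php
<?php
# Hypothetical helper: parses text with one target per line in the assumed
# order ip,name,type,admin_email and returns an array of target records.
function load_targets(string $csv_text): array {
    $targets = array();
    foreach (preg_split('/\r\n|\r|\n/', $csv_text) as $line) {
        $line = trim($line);
        # Skip blank lines and comment lines starting with '#'
        if ($line === '' || $line[0] === '#') {
            continue;
        }
        # Plain comma split; assumes that no field contains a comma
        $fields = array_map('trim', explode(',', $line));
        # Ignore malformed rows instead of polling a half-defined target
        if (count($fields) < 4 || filter_var($fields[0], FILTER_VALIDATE_IP) === false) {
            continue;
        }
        $targets[] = array(
            'ip'    => $fields[0],
            'name'  => $fields[1],
            'type'  => $fields[2],
            'admin' => $fields[3],
        );
    }
    return $targets;
}

# Example input using a documentation-only address (192.0.2.0/24 is reserved for examples)
$sample  = "# ip,name,type,admin_email\n";
$sample .= "192.0.2.10,web01,VM,admin@example.com\n";
$sample .= "not-an-ip,bad-row,VM,admin@example.com\n";
$targets = load_targets($sample);
echo count($targets), " target(s) loaded\n";
```

In practice, the text would come from file_get_contents() on a file of your choice, and the resulting records would replace the local array in the script.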

Creating the ping status check table

The ping status is captured in a DB table as events for each IP address. Each event has a self-generated, auto-incremented event ID, the IP address, server name (optional), server type (optional), ping status, a mail check parity that records whether an alert was sent, a count of the occurrences of the event, the last (most recent) occurrence, and the first occurrence of the event.

A ping status check table with the columns shown in Figure 2 is recommended. It can be customized as needed.

Figure 2. ping_status_check table description diagram
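
As a rough illustration, the table could be created with a statement like the one below. The table and column names are the ones that the alerting script in this article queries (ping_status_chk, privateip, mailchk, and so on). The column types, the defaults, and the index are assumptions for a MariaDB instance, and the sketch keeps a single time_stamp column because that is the only time column the script references.

```php
<?php
# Assumed MariaDB column types; only the table and column names come from the script's queries.
$create_table_sql = <<<SQL
CREATE TABLE IF NOT EXISTS ping_status_chk (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- self-generated event ID
    privateip   VARCHAR(45)  NOT NULL,                 -- IP address of the target
    hostname    VARCHAR(255) NULL,                     -- server name (optional)
    type        VARCHAR(64)  NULL,                     -- server type (optional)
    status      VARCHAR(16)  NOT NULL,                 -- 'Pinging' or 'Not Pinging'
    mailchk     VARCHAR(2)   NOT NULL DEFAULT 'NA',    -- 'Y' once an alert was sent
    event_count INT UNSIGNED NOT NULL DEFAULT 1,       -- occurrences of the event
    time_stamp  DATETIME     NOT NULL,                 -- when the event row was recorded
    PRIMARY KEY (id),
    KEY idx_ip_time (privateip, time_stamp)            -- assumed index for the latest-row lookup
)
SQL;

# Print the statement so that it can be reviewed or run from a database client
echo $create_table_sql, "\n";
```

The assumed index on (privateip, time_stamp) is meant to support the script's lookup of the most recent event row for an IP address.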

Understanding the workflow of the alerting script

At each interval (say, 15 minutes), the script records the ping result for each IP address. A new event, that is, a change in ping status or the first result for an IP address, is inserted as a new row. In case of a repeated occurrence, the most recent event row of the same IP address is updated with an incremented event count and the recent time stamp.

When there is a ping failure, the event is notified by SMS, by email (using the PHP mail function), or as a Slack message (using the Incoming Webhooks app) to the system administrator responsible for the IP address, and the mail check parity is set to Yes. A consecutive ping failure is assumed to belong to the same scenario. Because the mail check parity is already set to Yes in this case, only the event count is incremented and no redundant message is triggered. At the next successful ping of the same IP address, an informational message is sent to the system administrators to say that the system is back online. Refer to the flowchart in Figure 3 for a better understanding.

Figure 3. Flowchart describing the workflow of the alerting script
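
The workflow can also be read as a small decision function. The sketch below is one possible reading of the description above, not the author's implementation: the function name decide_action and the action labels are hypothetical, while the status values ('Pinging', 'Not Pinging') and the parity values ('Y', 'NA') are the ones used by the alerting script.

```php
<?php
# Hypothetical helper: maps the current ping result and the previous event row
# to one of five actions. $prev_status is null when the IP address has no row yet.
function decide_action(bool $up, ?string $prev_status, ?string $prev_mailchk): string {
    if (!$up) {
        if ($prev_status === 'Not Pinging') {
            # Ongoing outage: alert only if no alert was sent for this event yet
            return $prev_mailchk === 'Y' ? 'increment' : 'alert_and_increment';
        }
        # First failure after a success, or first event ever for this IP address
        return 'alert_and_insert';
    }
    if ($prev_status === 'Not Pinging') {
        return 'recovery_and_insert';   # back online: inform and start a new event row
    }
    if ($prev_status === 'Pinging') {
        return 'increment';             # still up: no message, only the count changes
    }
    return 'insert';                    # first event ever and the IP address is up
}

# Walk through an outage: first seen up, goes down, stays down, comes back
$history = array(
    array(true,  null,          null),
    array(false, 'Pinging',     'NA'),
    array(false, 'Not Pinging', 'Y'),
    array(true,  'Not Pinging', 'Y'),
);
foreach ($history as list($up, $status, $mailchk)) {
    echo decide_action($up, $status, $mailchk), "\n";
}
```

Keeping the decision separate from the mail, Slack, and DB calls makes each branch of the flowchart easy to check without a live server.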

Understanding the alerting script

Refer to the following PHP code, which extracts and processes the server availability status. Comments are added to make the code self-explanatory.


<?php
# Target resources. This is a local array; use PHP functions to read from a file or DB.
$server_ipadd    = array("X.X.X.X", "Y.Y.Y.Y", "Z.Z.Z.Z");
$server_hostname = array("<hostname-X>", "<hostname-Y>", "<hostname-Z>");
$server_type     = "<server-type>";
$record_time     = date("Y-m-d H:i:s", time());

ping_list($server_ipadd, $server_hostname, $server_type);

# Sends the message as an email and as a Slack message (Incoming Webhooks app)
function notify($message) {
    $body = "*****Please do not reply back*****";
    mail("<email-id>", $message, $body);
    # json_encode() and escapeshellarg() keep quotes in the message from breaking the command
    $payload = json_encode(array("text" => $message));
    $cmd = "curl -X POST -H 'Content-type: application/json' --data " . escapeshellarg($payload) . " '<your-incoming-webhooks-configuration-url>'";
    $tmp = shell_exec($cmd);
}

function ping_list($server_ipadd, $hostname, $type) {
    include '<DB-connection-code-script>';
    # To connect to alerting DB; the included script is expected to define $conn
    global $record_time;
    $sql_values = array();
    for ($i = 0; $i < sizeof($server_ipadd); $i++) {
        $ip   = $server_ipadd[$i];
        $name = $hostname[$i];
        $up   = ping($ip);
        # ping() is a user-defined function using the 'ping' utility of Linux OS

        # Reset the previous-event values so that they do not carry over from the previous IP address
        $id = null;
        $prev_status = null;
        $prev_mailchk = null;
        $event_count = 0;
        $select_prev_status_query = "select id,status,mailchk,event_count from ping_status_chk where privateip='$ip' order by time_stamp desc limit 1";
        $result = $conn->query($select_prev_status_query);
        foreach ($result as $row) {
            $id = trim($row['id']);
            $prev_status = trim($row['status']);
            $event_count = trim($row['event_count']);
            $prev_mailchk = trim($row['mailchk']);
        }

        if (!$up) {
            # IP address is currently not pinging
            if ($prev_status == 'Not Pinging') {
                # IP address was not pinging in previous occurrence of event also
                if ($prev_mailchk == 'Y') {
                    # Alert was already triggered, so just update the event_count
                    $event_count = $event_count + 1;
                    $update_query = "update ping_status_chk SET event_count = '$event_count' where id = '$id'";
                    $conn->query($update_query);
                }
                elseif ($prev_mailchk == 'NA') {
                    # Alert was not already triggered; so send now
                    notify("ALERT: $ip [$name] is not reachable/down. Please do the needful");
                    $event_count = $event_count + 1;
                    $update_query = "update ping_status_chk SET mailchk = 'Y',event_count = '$event_count' where id = '$id'";
                    $conn->query($update_query);
                }
            }
            elseif ($prev_status == 'Pinging') {
                # IP address was pinging in previous occurrence of event, so trigger an alert
                notify("ALERT: $ip [$name] is not reachable/down. Please do the needful");
                $event_count = 1;
                $mailchk = 'Y';
                $sql_values[] = "('$ip','$name','$type','Not Pinging','$mailchk','$event_count','$record_time')";
            }
            else {
                # IP address is found not to be pinging as first occurrence, so trigger an alert
                notify("ALERT: $ip [$name] is not reachable/down. Please do the needful");
                $event_count = 1;
                $mailchk = 'Y';
                $sql_values[] = "('$ip','$name','$type','Not Pinging','$mailchk','$event_count','$record_time')";
            }
        }
        else {
            # IP address is currently pinging
            if ($prev_status == 'Not Pinging') {
                # IP address was not pinging in previous occurrence of event, so inform that it is back online
                notify("FYI: $ip [$name] is back online now");
                $event_count = 1;
                $mailchk = 'NA';
                $sql_values[] = "('$ip','$name','$type','Pinging','$mailchk','$event_count','$record_time')";
            }
            elseif ($prev_status == 'Pinging') {
                # IP address was pinging in previous occurrence of event also, so just update the event_count
                $event_count = $event_count + 1;
                $update_query = "update ping_status_chk SET event_count = '$event_count' where id = '$id'";
                $conn->query($update_query);
            }
            else {
                # IP address is found to be pinging as first occurrence
                $event_count = 1;
                $mailchk = 'NA';
                $sql_values[] = "('$ip','$name','$type','Pinging','$mailchk','$event_count','$record_time')";
            }
        }
    }
    # The 'for' loop ends above; insert all new event rows in one query
    if (!empty($sql_values)) {
        $sql_finalQuery = "insert into ping_status_chk(privateip,hostname,type,status,mailchk,event_count,time_stamp) values " . implode(", ", $sql_values);
        $result = $conn->query($sql_finalQuery);
    }
    $conn = null;
    # This disconnects from alerting DB
}

function ping($host) {
    # Sends one ICMP echo request and waits up to 5 seconds; exit code 0 means a reply was received
    exec(sprintf('ping -c 1 -W 5 %s', escapeshellarg($host)), $res, $rval);
    return $rval === 0;
}
?>

Conclusion

In this article, you have learned how to plan and set up an alerting server for monitoring the availability of server resources in your environment. You can refer to this architecture and create a new one or modify this to suit your business requirements. You can also customize the given code to suit any other similar monitoring requirements in your environment.