Introduction
Error logging is a facility that an operating system module or a user application can use to log any errors it detects. These messages identify the failing component, the reason for the failure, and any additional information. This information helps you understand why a component failed or behaved unexpectedly. However, note that you cannot rely on it alone, because error logging is only a first failure data capture mechanism. For example, an error-log entry reporting that a connection to a disk has failed explains why writes might be failing for an application. An error-log entry for low paging space is very common; it tells the user to increase paging space, because otherwise the system might behave unexpectedly.
Because error logging is a serviceability mechanism, place error-log entries wisely and include the required information so that each entry clearly indicates what is intended. At the same time, take care to avoid a sudden flood of entries, which makes it harder to search for a particular error entry.
Logging an entry
Users can log an error entry using the following two mechanisms:
- Using a function: From user applications, you can use the errlog function, and from kernel extensions, you can use the errsave function to log an error entry.
  Syntax:
      int errlog(ErrorStructure, Length)
      void *ErrorStructure;
      unsigned int Length;
- Using a command: You can log an entry using the errlogger command.
  Syntax:
      errlogger Message
Reading the logged entries
The framework provides an error report tool, errpt. This tool provides various ways to view and filter a report. You can find more information about this in the Resources section. As we have seen earlier, users can write anything in the error-log entries, including structures and data buffers, that can help them in debugging. However, the errpt tool can only dump this information in standard forms such as hexadecimal and American Standard Code for Information Interchange (ASCII). In the later sections, we will see how to write C code that fetches the error-log entries, rebuilds the dumped structures and buffers, and presents the data in a more meaningful way for efficient and effective debugging.
Basics to reading error-log entries
An error-log entry consists of various attribute and value pairs. Some of the attributes are the error-log identifier, label, probable cause, and detailed data. The detailed data attribute lets users dump the data required to service the failed component. However, if a user dumps structures containing vital information, the errpt tool cannot interpret them, and the onus is on the owner of the error-log entry to convert the raw data into an easily understandable format by mapping it to the corresponding structures. To achieve this, the error-logging framework provides a set of application programming interfaces (APIs) and constructs.
The error-logging framework writes the entries in binary format, sorted by time. A defined structure and construct are required to read these entries and derive meaning from them. You can use the following approach to search for and read entries from the error-log file, and then map the detailed data section to user-defined structures to derive more meaning from the entries.
Finding the location of the error-logging file
Use the following command to get the location of the error-logging file.
# /usr/lib/errdemon -l
Error Log Attributes
--------------------------------------------
Log File                 /var/adm/ras/errlog
Log Size                 1048576 bytes
Memory Buffer Size       32768 bytes
Duplicate Removal        true
Duplicate Interval       10000 milliseconds
Duplicate Error Maximum  1000
The file that is listed against the Log File tag is the complete path to the error-logging file.
Functions to read error-log entries
errlog_open
: This API opens the error-log file so that you can start reading the entries logged so far.
Syntax:
    int errlog_open(path, mode, magic, handle)
Table 1: Parameter details of the errlog_open function
Argument | Type | Remarks |
---|---|---|
path | char * | Contains the absolute path of the error-log file. |
mode | int | Is the same as the modes used in the open system subroutine. |
magic | unsigned int | Determines the version of the errlog_entry_t structure to use. The value of this parameter must be set to LE_MAGIC. The sys/errlog.h file contains the definition of LE_MAGIC that is used for the current version of IBM® AIX®. |
handle | errlog_handle_t * | Acts as a return value, and contains the handle of the opened error-log file if successful. |
Table 2: Return value details of the errlog_open function
Return value | Meaning |
---|---|
0 | Successful. |
LE_ERR_INVARG | A parameter passed was invalid. |
LE_ERR_NOFILE | The error-log file does not exist. |
LE_ERR_NOMEM | Could not allocate the required memory. |
LE_ERR_IO | An I/O error occurred. |
LE_ERR_INVFILE | The file is not a valid error-log file. |
errlog_close
: This function closes the error-log file whose handle is passed as an argument. This handle must be the same as the one returned by the errlog_open API.
Syntax:
    int errlog_close(handle)
    errlog_handle_t handle;
A return value of 0 indicates that the error-log file was closed successfully. In case of an error, LE_ERR_INVARG is returned, indicating that the argument passed is invalid.
errlog_find_first
: This subroutine finds the first entry that matches the given filtering criteria.
Syntax:
    int errlog_find_first(handle, filter, result)
Table 3: Parameter details of the errlog_find_first function
Argument | Type | Remarks |
---|---|---|
handle | errlog_handle_t | This is the handle returned by the errlog_open subroutine. |
filter | errlog_match_t * | This defines the filter used to search the entries. |
result | errlog_entry_t * | When an entry matching the filter is found, the memory area pointed to by this parameter is filled with that error-log entry. |
Table 4: Return value details of the errlog_find_first function
Return value | Meaning |
---|---|
0 | Successful. |
LE_ERR_INVARG | A parameter passed was invalid. |
LE_ERR_DONE | Reached the end of the error-log file while searching. In other words, no match was found after the previous invocation of this API. If this was the first invocation, there are no entries matching the criteria. |
LE_ERR_NOMEM | Could not allocate the required memory. |
LE_ERR_IO | An I/O error occurred. |
errlog_find_next
: The meaning of the parameters remains the same as that of the errlog_find_first API.
Syntax:
    int errlog_find_next(handle, result)
The meaning of the return values remains the same as that of the errlog_find_first API.
errlog_find_sequence
: This subroutine finds the entry that has the specified sequence number.
Syntax:
    int errlog_find_sequence(handle, sequence, result)
Table 5: Parameter details of the errlog_find_sequence function
Argument | Type | Remarks |
---|---|---|
handle | errlog_handle_t | This is the handle returned by the errlog_open subroutine. |
sequence | int | This parameter specifies the sequence number of the error-log entry. |
result | errlog_entry_t * | When an entry with the specified sequence number is found, the memory area pointed to by this parameter is filled with that error-log entry. |
The meaning of the return values remains the same as that of the errlog_find_first API.
errlog_set_direction
: This subroutine sets the direction in which the error-log file is searched.
Syntax:
    int errlog_set_direction(handle, direction)
Table 6: Parameter details of the errlog_set_direction function
Argument | Type | Remarks |
---|---|---|
handle | errlog_handle_t | This is the handle returned by the errlog_open subroutine. |
direction | int | This parameter specifies the direction in which to search for the entries. Possible values are LE_FORWARD (to search in the forward direction) and LE_REVERSE (to search in the reverse direction). |
A return value of 0 indicates that the search direction was set successfully. In case of an error, LE_ERR_INVARG is returned, indicating that the argument passed is invalid.
Structures used
In this section, we look at the structures used to read the error-log entries and to build the search/filter criteria.
- Structure for errlog_entry as defined in the /usr/include/sys/errlog.h file
  When an entry matching the filter criteria is found, it is returned in the following structure. Using the members of this structure, users can access all the details of the error-log entry.
      typedef struct errlog_entry {
          unsigned int   el_magic;
          unsigned int   el_sequence;
          char           el_label[LE_LABEL_MAX];
          unsigned int   el_timestamp;
          /* few of them skipped */
          char           el_machineid[LE_MACHINE_ID_MAX];
          char           el_nodeid[LE_NODE_ID_MAX];
          char           el_class[LE_CLASS_MAX];
          char           el_type[LE_TYPE_MAX];
          char           el_resource[LE_RESOURCE_MAX];
          char           el_rclass[LE_RCLASS_MAX];
          char           el_rtype[LE_RTYPE_MAX];
          /* few of them skipped */
          unsigned short el_detail_length;
          char           el_detail_data[LE_DETAIL_MAX]; /* this is important */
          /* few of them skipped */
      } errlog_entry_t;
  Most of the fields above el_detail_data can be used for searching an entry in the error log. el_detail_data stores the data that is passed with the struct err_rec0 structure in the errlog or errsave APIs. If we map it to the structure that was used to write the entry, we should be able to get the required data.
- Structure used to specify the searching criteria
      typedef struct errlog_match {
          unsigned int em_op;
          union {
              struct errlog_match *emu_left;
              unsigned int         emu_field;
          } emu1;
          union {
              struct errlog_match *emu_right;
              unsigned int         emu_intvalue;
              unsigned char       *emu_strvalue;
          } emu2;
      } errlog_match_t;
  The above structure is used to specify the filtering/searching criteria in the errlog_find_first API. In the emu2 union, the two fields emu_intvalue and emu_strvalue are used to specify integral or string values. The field specified in emu_field selects the value from the error-log entry, and the operator specified in em_op is applied to that value and to emu_intvalue or emu_strvalue, depending on the type of the field selected with emu_field. The sys/errlog.h file contains some predefined values to make these inner fields easily accessible; refer to it for more details.
Building the search criteria
Search criteria can be visualized as a binary tree of the following form.
    L2:                   Operator [em_op]
                         /                \
                [emu_left]                [emu_right]
    L1:         operator1                  operator2
                 [em_op]                    [em_op]
                /       \                  /       \
    Leaf:  emu_field  emu_intvalue    emu_field  emu_strvalue
The following table specifies the type of operators that can be used at each level depicted in the above binary tree.
Table 7: Various node levels and their significance
Node level | Remarks |
---|---|
Leaf level nodes | These nodes contain only the error-log field and value as operands to the L1 node operator. |
L1 level nodes | These nodes specify the relational operators such as greater than, equal to, less than, and so on. |
L2 level node | At this level, this node will have logical operators such as AND, OR, and so on. |
In summary, only relational operators can be used with the error-log entries and values. The result of these relational operators can be combined to form complex search criteria using logical operators. Let’s look at the various operators that can be used.
- Relational operators:
  The following table shows some of the relational operators that can be used to build the searching/filtering criteria. These are specified in the em_op field of errlog_match_t. These operators work on the leaf nodes only.
Table 8: Relational operators and their meaning
Operator | Meaning |
---|---|
LE_OP_EQUAL | Check if the left leaf node (error-log entry field value) is equal to the right leaf node value. |
LE_OP_NE | Check if the left leaf node (error-log entry field value) is not equal to the right leaf node value. |
LE_OP_SUBSTR | Check if the left leaf node (error-log entry field value) contains the substring specified in the right leaf node value. |
LE_OP_LT | Check if the left leaf node (error-log entry field value) is less than the right leaf node value. |
LE_OP_LE | Check if the left leaf node (error-log entry field value) is less than or equal to the right leaf node value. |
LE_OP_GT | Check if the left leaf node (error-log entry field value) is greater than the right leaf node value. |
LE_OP_GE | Check if the left leaf node (error-log entry field value) is greater than or equal to the right leaf node value. |
- Logical operators:
  The following set of logical operators works on non-leaf nodes only.
Table 9: Logical operators and their meaning
Operator | Meaning |
---|---|
LE_OP_AND | Applies the logical AND operator on the left and the right nodes. |
LE_OP_OR | Applies the logical OR operator on the left and the right nodes. |
LE_OP_XOR | Applies the logical XOR operator on the left and the right nodes. |
LE_OP_NOT | Applies the logical NOT operator only on the left node. |
You can use the following tags for specifying the error-log entry field that can be used as the left operand to relational operators.
Table 10: Tags representing their corresponding error-log entry fields
emu_field values | Meaning |
---|---|
LE_MATCH_SEQUENCE | To use the error-log entry’s Sequence field as the operand. |
LE_MATCH_LABEL | To use the error-log entry’s Label field as the operand. |
LE_MATCH_TIMESTAMP | To use the error-log entry’s Timestamp field as the operand. |
LE_MATCH_MACHINEID | To use the error-log entry’s MachineID field as the operand. |
LE_MATCH_NODEID | To use the error-log entry’s NodeID field as the operand. |
LE_MATCH_CLASS | To use the error-log entry’s Class field as the operand. |
LE_MATCH_TYPE | To use the error-log entry’s Type field as the operand. |
LE_MATCH_RESOURCE | To use the error-log entry’s Resource field as the operand. |
LE_MATCH_RCLASS | To use the error-log entry’s Rclass (resource class) field as the operand. |
LE_MATCH_RTYPE | To use the error-log entry’s Rtype (resource type) field as the operand. |
Example
The following high-level approach can be used to read the error-log entries.
- Open the error-log file.
- Build a filter to search the entries you are interested in.
- Search for error-log entries using the filter built in the previous step. If a match is found, the search returns the error-log entry; otherwise, it returns a failure code.
- After analyzing the required error-log entries, close the error-log file.
Entry logged:
errlogger "I am from IBM"
The “C” program to read the detailed data, “I am from IBM”:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/errlog.h>

int main(void)
{
    /* error log file handle */
    errlog_handle_t my_errlog_hndl;
    /* mode to open the error file */
    int mode = O_RDONLY;
    /* LE_MAGIC selects the current version of errlog_entry_t */
    unsigned int magic = LE_MAGIC;
    /* path of error log file */
    char path[] = "/var/adm/ras/errlog";
    int rc = 0;
    /* error log entry matching/finding criteria */
    errlog_match_t match_resource_name;
    /* error log entry details of matched entry */
    errlog_entry_t matched_errlog_entry;
    /* This example looks for entries logged by OPERATOR type resource */
    char resource_name[] = "OPERATOR";

    /* opening error log file */
    rc = errlog_open(path, mode, magic, &my_errlog_hndl);
    if (rc)
    {
        printf(" Failed to open error log file error : %d\n", rc);
        exit(1);
    }

    /*
     * building matching criteria
     * criteria is :
     * if, el_resource field in errlog_entry_t structure is equal to OPERATOR value
     */
    match_resource_name.em_op = LE_OP_EQUAL;
    match_resource_name.emu1.emu_field = LE_MATCH_RESOURCE;
    /* emu_strvalue is declared as unsigned char *, hence the cast */
    match_resource_name.emu2.emu_strvalue = (unsigned char *)resource_name;

    /* find the first entry */
    rc = errlog_find_first(my_errlog_hndl, &match_resource_name, &matched_errlog_entry);
    if (rc == LE_ERR_DONE)
    {
        printf(" Did not find any entry matching the criteria.\n");
    }
    else if (rc)
    {
        printf(" Failed to find error log entry : %d\n", rc);
    }
    else
    {
        /* keep looking for all entries, break when done or error occurs */
        while (!rc)
        {
            /* print the detailed data, bounded by el_detail_length in case
               the data is not NUL-terminated */
            /* One can print other details of error log entry too */
            /* Even the detail_data pointer could be typecasted to actual
               structure and print the values as structure fields */
            printf("error log entries detail data is : %.*s\n",
                   (int)matched_errlog_entry.el_detail_length,
                   matched_errlog_entry.el_detail_data);
            /* find next entries after first has been found */
            rc = errlog_find_next(my_errlog_hndl, &matched_errlog_entry);
        }
        if (rc == LE_ERR_DONE)
        {
            printf(" No more entries found.\n");
        }
        else
        {
            printf(" Failed to find error log entry : %d\n", rc);
        }
    }

    /* close the error log file */
    rc = errlog_close(my_errlog_hndl);
    if (rc)
        printf(" Failed to close error log file error : %d\n", rc);
    return 0;
}
You can compile the program using the following command:
cc read_errlog_entries.c -lerrlog