When an SQL statement fails in Big SQL, the returned error message might contain a log_entry_identifier that you can use to find more details about the error from a specific diagnostic log file. It is sometimes challenging to find and use the log files to understand what is causing the underlying error.

A table function (SYSHADOOP.LOG_ENTRY) is available to help you retrieve this key information. Let’s use an example to illustrate the types of error that might occur, and how you can use the SYSHADOOP.LOG_ENTRY function to more easily troubleshoot these errors.

Example 1: Suppose we have two different tables that reference the same file; the ALLOW_NULL table allows null values and the DISALLOW_NULL table does not. Now suppose that we insert a null value into the ALLOW_NULL table and then try to select the data from the DISALLOW_NULL table. This results in error SQL5104N, which returns a log-entry-identifier:

 

create hadoop table allow_null (
  c1 int, c2 int)
  location '/user/bigsql/allow_null'
DB20000I  The SQL command completed successfully.

create hadoop table disallow_null (
  c1 int, c2 int not null)
  location '/user/bigsql/allow_null'
DB20000I  The SQL command completed successfully.

insert into allow_null values (1,1),(3,null)
DB20000I  The SQL command completed successfully.

select * from allow_null

C1          C2
----------- -----------
          1           1
          3           -
  2 record(s) selected.


select * from disallow_null

C1          C2
----------- -----------
SQL5104N  The query failed because the data in HDFS cannot be mapped to the Hadoop table definition.  Hadoop table name: "DISALLOW_NULL".  Column index: "1".  Log entry identifier: "[NRL-003-231e58038]".

You can find additional information about this error in the log file that is identified by the log-entry-identifier value NRL-003-231d120e0. First, you would need to locate the appropriate log file and then you would need to search the file for the entry that corresponds to this particular error. A much better way is to use the SYSHADOOP.LOG_ENTRY function.

For example:

 

select cast (line as varchar(200))
  from table (syshadoop.log_entry('NRL-003-231e58038'))
  where line like '%5104%'
1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
E0212 10:36:48.568426 13280 bi-dfs-reader.cc:1142] [NRL-003-231e58038] SQL CODE -5104: Found NULL value for a not-null column. Counter= 1 Index= 1 Type =496 Nullable =0 Length= 4 (return code: -1)
E0212 10:36:48.568426 13280 bi-dfs-reader.cc:1142] [NRL-003-231e58038] SQL CODE -5104: Found NULL value for a not-null column. Counter= 1 Index= 1 Type =496 Nullable =0 Length= 4 (return code: -1)
E0212 09:51:32.259104 20434 bi-dfs-reader.cc:1142] [NRL-003-231bc0da2] SQL CODE -5104: Found NULL value for a not-null column. Counter= 1 Index= 1 Type =496 Nullable =0 Length= 4 (return code: -1)
3 record(s) selected.

Based on this additional information, we can see that a null value was found for a column that does not allow null values.

Log files and log-entry-identifier

Several different log files are used for problem determination in Big SQL. In most configurations, the log files are located in one of two places:

  • $BIGSQL_DIST_VAR/logs
  • $BIGSQL_DIST_VAR/diag

The $BIGSQL_DIST_VAR variable is generally set to /var/ibm/bigsql.

It is important to remember that these log files are “rotated”, which means that when a log file reaches a defined size it is renamed and a new log file is created. This log file size limit is configurable and can be changed in the $BIGSQL_HOME/bigsql/conf/*log4j* configuration files.

A log-entry-identifier is a three-part name of the form LLL-NNN-XXYYZZ.

  • LLL identifies which diagnostic log file contains the error:
    • BSL: The Big SQL log file, bigsql.log, is where most of the diagnostic information for Big SQL runtime execution is written ($BIGSQL_DIST_VAR/logs/bigsql.log).
    • NRL: The native reader log file is where the diagnostic information related to the C++ reader is written (/$BIGSQL_DIST_VAR/logs/ndfsio*.log).
    • SCL: The scheduler log file, bigsql-sched.log, is where the diagnostic information related to the Big SQL scheduler is written. It exists only on the node on which the scheduler is running, which is normally the Big SQL head node ($BIGSQL_DIST_VAR/logs/bigsql-sched.log).
    • DDL or DB2: The diagnostic information for these identifiers can be found in the $BIGSQL_DIST_VAR/diag directory in most configurations, but the location is configurable.
  • NNN identifies the Big SQL cluster node on which the error originated. To map this value to the name of the host where the log file resides, look at the /home/bigsql/sqllib/db2nodes.cfg file.
  • For example:

     
     
    =========================
    0  host1.foobar.com  0
    1  host2.foobar.com  0
    2  host3.foobar.com  0
    =========================
    
    

    The first field in each record represents the node number and the second field represents the host name.

  • XXXYYYZZZ represents a unique message ID that is used to search the log file.

If the LLL portion of the log-entry-identifier is “DDL” or “DB2”, the error was detected by the Big SQL main engine processor. In this case, the SYSHADOOP.LOG_ENTRY function cannot be used to get additional information about the error. For example, if for some reason the Big SQL scheduler has stopped running, an error is returned when an SQL statement is executed:

 

select * from t1
SQL5197N  The statement failed because of a communication error with a Big SQL component. Big SQL component name: "SCHEDULER". Reason code: "1". Log entry identifier: "DB2".  SQLSTATE=57066

In this case, additional information about the error message is logged in the DB2 log file on the head node (/var/ibm/bigsql/diag/DIAG0000/db2diag.0.log).

Searching this file for “SQL05197” reveals the following information:

 

2017-02-12-13.24.25.403566-480 I10020655E983         LEVEL: Error
PID     : 18007                TID : 139921314539264 PROC : db2sysc 0
INSTANCE: bigsql               NODE : 000            DB   : BIGSQL
APPHDL  : 0-6580               APPID: *N0.bigsql.170212212421
AUTHID  : BIGSQL               HOSTNAME: paged1.fyre.ibm.com
EDUID   : 103                  EDUNAME: db2agent (BIGSQL) 0
FUNCTION: DB2 UDB, base sys utilities, sqeBigSqlSchedulerInternal::requestScanMetadata, probe:2803
MESSAGE : ZRC=0xFFFFEBB3=-5197
          SQL5197N  The statement failed because of a communication error with a Big SQL component. Big SQL component name: "". Reason code: "". Log entry identifier: "".

DATA #1 : String, 140 bytes
Transport Exception occurred. The BigSql Scheduler service may not be running or the scheduler client request timeout may not be sufficient.

As can be seen from the diagnostic information, the error was generated because the Big SQL scheduler service was not running.

You can retrieve additional information about the following Big SQL error messages from the Big SQL log files:

  • SQL5104
  • SQL5105
  • SQL5111
  • SQL5197
  • SQL5198
  • SQL20557

3 comments on"Diagnosing errors in Big SQL"

  1. Proquotient June 19, 2017

    There can be numerous errors popping up when you make a transition from SQL to BigSQL. This Guide mentions some of the common errors and steps to overcome the error.

  2. […] For reference, see also some more information about Diagnosing Big SQL Errors […]

  3. […] 要获得相关参考资料,另请参阅有关诊断 Big SQL 错误的更多信息。 […]

Join The Discussion

Your email address will not be published. Required fields are marked *