IBM Support

Diagnosing errors in Db2 Big SQL (LOG_ENTRY function)

Technical Blog Post


Abstract

You can use the LOG_ENTRY table function to help diagnose errors in Db2 Big SQL.

Body

When an SQL statement fails in Db2 Big SQL, the returned error message might contain a log_entry_identifier that you can use to find more details about the error from a specific diagnostic log file. It is sometimes challenging to find and use the log files to understand what is causing the underlying error.

A table function (SYSHADOOP.LOG_ENTRY) is available to help you retrieve Db2 Big SQL-specific diagnostic log information. We will use examples to illustrate the types of error that might occur, and how you can use the LOG_ENTRY function to more easily troubleshoot these errors.

Log files and log-entry-identifier

Several different log files are used for problem determination in Db2 Big SQL. In most configurations, the log files are located in one of two places:

  • $BIGSQL_DIST_VAR/logs
  • $BIGSQL_DIST_VAR/diag

The $BIGSQL_DIST_VAR variable is generally set to /var/ibm/bigsql.

It is important to remember that these log files are “rotated”, which means that when a log file reaches a defined size it is renamed and a new log file is created. This log file size limit is configurable and can be changed in the $BIGSQL_HOME/bigsql/conf/*log4j* configuration files.

A log-entry-identifier is a three-part name of the form LLL-NNN-XXYYZZ.

For example, given a cluster that has the following nodes:

      ======================
      0  host1.foobar.com  0
      1  host2.foobar.com  0
      2  host3.foobar.com  0
      ======================

The first field in each record represents the node number and the second field represents the host name.

  • LLL identifies the diagnostic log file that contains the error:
    • BSL: The Db2 Big SQL log file, bigsql.log, is where most of the diagnostic information for Db2 Big SQL runtime execution is written ($BIGSQL_DIST_VAR/logs/bigsql.log).
    • SCL: The scheduler log file, bigsql-sched.log, is where the diagnostic information related to the Db2 Big SQL scheduler is written. It exists only on the node on which the scheduler is running, which is normally the Db2 Big SQL head node ($BIGSQL_DIST_VAR/logs/bigsql-sched.log).
    • DDL or DB2: The diagnostic information for these identifiers can be found in the $BIGSQL_DIST_VAR/diag directory in most configurations, but the location is configurable.
  • NNN identifies the Db2 Big SQL cluster node from which the error originated. To map this value to the name of the host where the log file resides, look at the /home/bigsql/sqllib/db2nodes.cfg file.
  • XXXYYYZZZ represents a unique message ID that is used to search the log file.

If the LLL portion of the log-entry-identifier is 'DDL' or 'DB2', the error was detected by the Db2 Big SQL main engine processor. In this case, the SYSHADOOP.LOG_ENTRY function cannot be used to get additional information about the error.

You can retrieve additional information about the following Db2 Big SQL error messages from the Db2 Big SQL log files:

  • SQL5104
  • SQL5105
  • SQL5111
  • SQL5197
  • SQL5198
  • SQL20557

Examples

  • Example 1: If for some reason, a file that is associated with a table has been removed from HDFS, you might see the following error when running a SELECT statement against that table.  Using the LOG_ENTRY function, we can see that the error is caused by a missing file. Note: Truncation warnings have been removed.
   db2 "select * from mytest"
   C1         
   -----------
   SQL5105N  The statement failed because a Big SQL component encountered an
   error.  Component receiving the error: "BigSQL IO".  Component returning the
   error: "UNKNOWN".  Log entry identifier: "[BSL-2-dae7e113]".  Reason: "File
   does not exist: /warehous".  SQLSTATE=58040
   db2 "select varchar(line, 200) from table(syshadoop.log_entry('BSL-2-dae7e113', 0, 10))"
1                                                                                                                                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  2021-04-16 06:38:52,820 ERROR com.ibm.biginsights.bigsql.dfsrw.reader.DfsBaseReader   [Master-2-S:32.1001.1.0.0.3285] : [BSL-2-dae7e113] Exception raised by Reader at node: 2 Scan ID: S:32.1001.1.0.0.32
Exception Label: UNMAPPED(java.io.FileNotFoundException: File does not exist: /warehouse/tablespace/external/hive/bigsql.db/MYTEST/i_1618580106562_-647776100_20210416063524760_2.0                     
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:86)                                                                                                                         
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:76)                                                                                                                         
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:156)                                                                                      
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1993)                                                                                                       
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:742)                                                                                              
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:448)                                             
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)                                                   
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)                                                                                                  
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)                                                                                                                                                
  11 record(s) selected with 1 warning messages printed.
  • Example 2:  If for some reason, Hive is not active on the cluster, you might see the following error when running a SELECT statement.  Using the LOG_ENTRY function, we can see that the error is caused by an inability to access the Hive metastore. Note: Truncation warnings have been removed.
   db2 "select * from mytest"
   SQL5105N  The statement failed because a Big SQL component encountered an
   error.  Component receiving the error: "SCHEDULER".  Component returning the
   error: "FRONT-END".  Log entry identifier: "[SCL-0-d72c4c88]".  Reason: "".
                                         
   db2 "select varchar(line, 100) from table(syshadoop.log_entry('SCL-0-d72c4c88', 0, 10))"
   1
   --------------------------------------------------------------------------------------------------
     2021-04-15 13:15:07,912 ERROR com.ibm.biginsights.bigsql.scheduler.server.cache.DescriptorTableCache

 
com.thirdparty.cimp.catalog.TableLoadingException: Failed to load metadata for table: bigsql.mytest.
         at com.thirdparty.cimp.catalog.TableLoader.load(TableLoader.java:107)                              
         at com.thirdparty.cimp.catalog.TableLoadingMgr$3.call(TableLoadingMgr.java:296)                    
         at com.thirdparty.cimp.catalog.TableLoadingMgr$3.call(TableLoadingMgr.java:267)                    
         at java.util.concurrent.FutureTask.run(FutureTask.java:277)                                        
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1160)                 
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)                 
         at java.lang.Thread.run(Thread.java:820)                                                           
Caused by: MetaException(message:Could not connect to meta store using any of the URIs provided. Mos
         at org.apache.thrift.transport.TSocket.open(TSocket.java:226)                                      
       11 record(s) selected with 2 warning messages printed.
  • Example 3:  If for some reason, the Db2 Big SQL scheduler has stopped running, an error is returned when an SQL statement is run.
   db2 "select * from t1"

   SQL5197N  The statement failed because of a communication error with a Big SQL component. 
   Big SQL component name: "SCHEDULER". 
   Reason code: "1". Log entry identifier: 
   "DB2".  
   SQLSTATE=57066 
       Additional information about this error can be returned by using the Db2 Big SQL command line processor (CLP): 

   db2 "? SQL5197"
   SQL5197N  The statement failed because of a communication error with a
   Big SQL component. Db2 Big SQL component name: "<component-name>".
   Reason code: "<reason-code>". Log entry identifier:
     "<log-entry-identifier>".
    Explanation:
    The reason code indicates the type of problem:
    1  Connection refused.
    2  Connection timed out.
    3  A row being inserted into or read from an external table exceeds
       the size of an external table communication buffer. Reduce the
       amount of data being inserted into or read from the external
       table.

[{"Business Unit":{"code":"BU059","label":"IBM Software w\/o TPS"},"Product":{"code":"SSCRJT","label":"IBM Db2 Big SQL"},"Component":"","Platform":[{"code":"PF025","label":"Platform Independent"}],"Version":"","Edition":"","Line of Business":{"code":"LOB10","label":"Data and AI"}}]

UID

ibm16259853