Big SQL Automatic Catalog Synchronization (Part 3 - Problem Determination)

Introduction
This blog is part of a series outlining all you need to know to start working with Big SQL’s Automatic Catalog Synchronization (Auto-Sync). Part 1 of the series provided an introduction to Auto-Sync, discussing its significance, the problem it addresses and how it can be enabled/disabled in Ambari. Part 2 presented a high-level view of the Auto-Sync architecture and provided details on the feature’s main configuration parameters.

In this third and final blog of the series, we’ll take a look at problem determination and explain what you can do if you are experiencing issues related to Big SQL’s Auto-Sync feature.

Fig.1 – Big SQL Auto-Sync Architecture

Problem Determination
If you are experiencing unexpected behavior that you suspect may be related to the Big SQL Auto-Sync feature, there are a couple of places you should look first:

Note: Big SQL 5.0.1’s Auto-Sync Error Handling
Big SQL 5.0.1 changes how Auto-Sync handles errors encountered during DDL event synchronization. As in previous versions of Big SQL, when Auto-Sync fails to synchronize an event, all the relevant information is written to bigsql.log. In Big SQL 5.0.1, however, the associated event-files are also moved out of the events-directory and into an “errors” sub-directory, referred to as the “errors-directory” (by default: /user/bigsql/sync/errors). This change in behaviour brings a number of significant benefits to Big SQL users. For more information, take a look at another of my blog posts: Big SQL Automatic Catalog Synchronization – Error Handling.

1. The Big SQL Log File – bigsql.log
One place to look is the bigsql.log file (by default: /var/ibm/bigsql/logs/bigsql.log). This is where Auto-Sync writes all of its debug and error messages while processing DDL event-files. If Big SQL encounters a problem while processing an event-file and synchronizing the associated DDL event, bigsql.log will contain an ERROR entry with detailed information specific to that error.

Note: The bigsql.log file may “roll over”. That is, the Auto-Sync log messages you are looking for may be in a ‘bigsql.log.n’ log file (e.g. bigsql.log.1) rather than in bigsql.log.
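To check bigsql.log and any rolled-over bigsql.log.n files in a single pass, a short script can help. The Python sketch below is illustrative only: it assumes the default log directory mentioned above and the usual log4j rolling scheme, in which bigsql.log is the live file and bigsql.log.1, bigsql.log.2 and so on are progressively older backups. The function names are hypothetical.

```python
import glob
import os
import re

# Default log directory from the text above; adjust if your installation differs.
LOG_DIR = "/var/ibm/bigsql/logs"


def log_files(log_dir=LOG_DIR):
    """Return bigsql.log followed by any bigsql.log.n backups, newest first.

    Assumes log4j-style rolling: a higher n means an older file.
    """
    rolled = []
    for path in glob.glob(os.path.join(log_dir, "bigsql.log.*")):
        suffix = path.rsplit(".", 1)[1]
        if suffix.isdigit():  # skip anything that is not bigsql.log.<number>
            rolled.append((int(suffix), path))
    rolled.sort()
    current = os.path.join(log_dir, "bigsql.log")
    files = [current] if os.path.exists(current) else []
    return files + [path for _, path in rolled]


def error_lines(log_dir=LOG_DIR, level="ERROR"):
    """Yield (path, line_number, text) for each line containing the level token."""
    pattern = re.compile(r"\b" + re.escape(level) + r"\b")
    for path in log_files(log_dir):
        with open(path, encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if pattern.search(line):
                    yield path, lineno, line.rstrip("\n")


if __name__ == "__main__":
    for path, lineno, text in error_lines():
        print(f"{path}:{lineno}: {text}")
```

Run it on the head node as a user that can read the log directory. Each ERROR entry is printed with its file name and line number, so you can open the right file and read the surrounding messages.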

Tip: Enable DEBUG Logging
With debug logging enabled, more information will be written to the bigsql.log file, making it easier to troubleshoot an Auto-Sync issue.

To enable debug logging, do the following:

Append (or uncomment) the following in the head node’s $BIGSQL_HOME/conf/log4j.properties file:
log4j.logger.com.ibm.biginsights.biga=DEBUG
log4j.logger.com.ibm.biginsights.bigsql=DEBUG
log4j.logger.com.ibm.biginsights.catalog=DEBUG
Save the file
Restart Big SQL
Wait for any event-files to be re-processed by Auto-Sync

You will then get all relevant DEBUG information written to the bigsql.log file.

Note: Don’t forget to revert to the original values in the log4j.properties file when finished troubleshooting.
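If you switch DEBUG logging on and off often, the edit-and-revert steps above can be scripted. The following Python sketch is one possible approach, not an IBM-provided tool. It assumes $BIGSQL_HOME is set in the environment, backs up log4j.properties to a hypothetical log4j.properties.orig file, and removes any existing (including commented-out) setting for the three loggers before appending the DEBUG lines. Restarting Big SQL is left to you, for example through Ambari.

```python
import os
import shutil

# The three logger settings from the steps above.
DEBUG_LINES = [
    "log4j.logger.com.ibm.biginsights.biga=DEBUG",
    "log4j.logger.com.ibm.biginsights.bigsql=DEBUG",
    "log4j.logger.com.ibm.biginsights.catalog=DEBUG",
]
BACKUP_SUFFIX = ".orig"  # hypothetical backup file name suffix


def properties_path():
    """Path of log4j.properties on the head node (requires $BIGSQL_HOME)."""
    return os.path.join(os.environ["BIGSQL_HOME"], "conf", "log4j.properties")


def _key(line):
    """Property name of a line, ignoring a leading '#' comment marker."""
    return line.strip().lstrip("#").split("=", 1)[0].strip()


def enable_debug(path):
    """Back up the file (once), then replace or append the three DEBUG loggers."""
    backup = path + BACKUP_SUFFIX
    if not os.path.exists(backup):
        shutil.copy2(path, backup)
    keys = {_key(line) for line in DEBUG_LINES}
    with open(path, encoding="utf-8") as fh:
        kept = [line for line in fh.read().splitlines() if _key(line) not in keys]
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(kept + DEBUG_LINES) + "\n")


def revert_debug(path):
    """Restore the original file saved by enable_debug()."""
    os.replace(path + BACKUP_SUFFIX, path)
```

A typical session: call enable_debug(properties_path()), restart Big SQL, wait for Auto-Sync to re-process the event-files, collect bigsql.log, then call revert_debug(properties_path()) and restart Big SQL again so the original logging levels take effect. Keeping a single backup (created only on the first call) means calling enable_debug twice never overwrites the original settings.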

2. The Events-Directory
Another place to look is the events-directory (by default: /user/bigsql/sync). If Auto-Sync encounters a synchronization error, the associated event-files are left in the events-directory (in Big SQL 5.0.1, moved to the errors-directory) so that Auto-Sync can re-process them once the underlying issue has been resolved. The presence of such files therefore indicates a potential Auto-Sync problem that should be investigated. Because the events-directory is on HDFS, you can view these event-files with hdfs dfs -cat to extract the details of the DDL event that failed to synchronize, as shown in Figure 2, and then use those details to find the related error in the bigsql.log file. This is the approach taken in the example outlined below.
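Listing leftover event-files and decoding their names can also be scripted. The Python sketch below shells out to the standard hdfs dfs -ls and hdfs dfs -cat commands. The file-name pattern DDL-<yyyyMMddHHmmssSSS>-<UUID>.json is an assumption inferred from the example event-file name later in this post, not a documented format, and the time zone of the embedded timestamp is unknown. On Big SQL 5.0.1 you could pass the errors-directory (/user/bigsql/sync/errors) instead of the default events-directory.

```python
import re
import subprocess
from datetime import datetime

EVENTS_DIR = "/user/bigsql/sync"  # default events-directory from the text

# Assumed naming scheme, inferred from this post's example event-file:
#   DDL-<yyyyMMddHHmmssSSS>-<uuid>.json
EVENT_NAME = re.compile(
    r"^DDL-(?P<ts>\d{17})-"
    r"(?P<uuid>[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})\.json$"
)


def parse_event_name(name):
    """Return (timestamp, uuid) for an event-file name, or None if it doesn't match."""
    m = EVENT_NAME.match(name)
    if m is None:
        return None
    digits = m.group("ts")
    ts = datetime.strptime(digits[:14], "%Y%m%d%H%M%S")
    ts = ts.replace(microsecond=int(digits[14:]) * 1000)  # last 3 digits = ms
    return ts, m.group("uuid")


def event_files_from_listing(listing):
    """Pick event-file paths out of `hdfs dfs -ls` output (path = last column)."""
    paths = []
    for line in listing.splitlines():
        fields = line.split()
        if fields and parse_event_name(fields[-1].rsplit("/", 1)[-1]):
            paths.append(fields[-1])
    return sorted(paths)


def list_event_files(directory=EVENTS_DIR):
    """Return the event-files currently sitting in an HDFS directory."""
    result = subprocess.run(["hdfs", "dfs", "-ls", directory],
                            capture_output=True, text=True, check=True)
    return event_files_from_listing(result.stdout)


def show_event_file(path):
    """Return the JSON contents of one event-file via `hdfs dfs -cat`."""
    result = subprocess.run(["hdfs", "dfs", "-cat", path],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For each path returned by list_event_files(), parse_event_name() gives a rough time window to look at in bigsql.log, and show_event_file() prints the DDL details. Parsing the listing in a separate function keeps the name-matching logic testable without a cluster.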

3. Problem Resolution
In most cases, the ERROR entries in bigsql.log, together with the contents of the event-files, provide the information a Big SQL user needs to determine the root cause of the synchronization issue and resolve it. Once the issue is resolved, any related event-files remaining in the events-directory will be processed successfully by Auto-Sync and the Big SQL catalog will be synchronized as expected.

Note: Delay in Processing New event-files
If Auto-Sync is already busy processing DDL events and synchronizing the Big SQL catalog, there may be a slight lag before new event-files written to the events-directory are processed. This is especially true when the events-directory already contained a large number of event-files. This behaviour can make it appear as though there is a problem and Auto-Sync isn’t doing its job. In these circumstances, however, synchronization has only been delayed, and all new event-files will be processed during the next execution of Auto-Sync.
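One rough way to tell a delay from a genuine problem is to take two snapshots of the events-directory some minutes apart. If some files from the first snapshot have disappeared, Auto-Sync is working through its backlog; if none have, it is worth checking bigsql.log. The Python sketch below is a heuristic, not part of Big SQL: list_files is any caller-supplied function returning the current event-file names (for example, a wrapper around hdfs dfs -ls), and the 600-second default is a hypothetical value that should be comfortably longer than how often Auto-Sync runs on your cluster.

```python
import time
from dataclasses import dataclass


@dataclass
class SyncProgress:
    processed: set  # in the first snapshot but gone from the second
    pending: set    # present in both snapshots
    arrived: set    # new since the first snapshot

    @property
    def verdict(self):
        if not self.pending:
            return "ok"           # nothing from the first snapshot is still waiting
        if self.processed:
            return "delayed"      # the backlog is draining; give Auto-Sync time
        return "investigate"      # nothing drained; check bigsql.log


def compare_snapshots(before, after):
    """Compare two collections of event-file names taken some time apart."""
    before, after = set(before), set(after)
    return SyncProgress(processed=before - after,
                        pending=before & after,
                        arrived=after - before)


def watch(list_files, interval_s=600):
    """Snapshot, wait interval_s seconds, snapshot again, and compare.

    list_files is any zero-argument callable returning event-file names,
    e.g. a wrapper around `hdfs dfs -ls`.
    """
    first = list_files()
    time.sleep(interval_s)
    return compare_snapshots(first, list_files())
```

An "investigate" verdict is a prompt to look at bigsql.log, not proof of failure: if Auto-Sync did not run during the interval, or is still working through a large backlog, the same files can legitimately persist.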

Example
Say we have an event-file, DDL-20170823033725787-f05de07d-6aa0-4732-a2c8-c71cef3eaac1.json, in the events-directory, as shown in Figure 2. This event-file was written as a result of a CREATE TABLE DDL statement executed in Hive.

Fig.2 – Auto-Sync event-file in events-directory

If this DDL event fails to synchronize with the Big SQL catalog the next time Auto-Sync executes, the relevant messages will be written to bigsql.log, as shown in Figure 3, and the associated event-file, DDL-20170823033725787-f05de07d-6aa0-4732-a2c8-c71cef3eaac1.json, will be left in the events-directory to be re-processed on a subsequent Auto-Sync execution.

Fig.3 – Auto-Sync event-file left in events-directory & bigsql.log entry

Once the user has resolved the underlying issue, the DDL event will be synchronized successfully and the associated event-file removed.
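To pull the bigsql.log entries for a specific failed event, you can search for a string taken from the event-file, such as the file name itself or the table name shown by hdfs dfs -cat, and print a few lines after each match so that any stack trace is included. The Python sketch below assumes the log entry actually contains that string, which may not hold for every message; find_mentions and search_logs are hypothetical helper names.

```python
def find_mentions(lines, needle, after=5):
    """Return blocks of log lines: each line containing `needle` plus up to
    `after` following lines (e.g. a stack trace). Overlapping blocks are merged."""
    lines = [line.rstrip("\n") for line in lines]
    ranges = []  # (start, stop) index pairs, stop exclusive
    for i, line in enumerate(lines):
        if needle in line:
            stop = min(len(lines), i + after + 1)
            if ranges and i < ranges[-1][1]:
                ranges[-1] = (ranges[-1][0], stop)  # inside previous block: extend it
            else:
                ranges.append((i, stop))
    return [lines[start:stop] for start, stop in ranges]


def search_logs(paths, needle, after=5):
    """Print every block mentioning `needle` in the given log files."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as fh:
            for block in find_mentions(fh, needle, after):
                print(f"== {path}")
                print("\n".join(block))
```

For the example above, search_logs(["/var/ibm/bigsql/logs/bigsql.log"], "DDL-20170823033725787-f05de07d-6aa0-4732-a2c8-c71cef3eaac1.json") would print each matching entry with its trailing context; add any bigsql.log.n files to the list if the log has rolled over.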

Summary
This was the third and final blog in a series outlining all you need to know to start working with Big SQL’s Automatic Catalog Synchronization (Auto-Sync), the Big SQL feature providing automatic synchronization of metadata between the Hive metastore and the Big SQL catalog.

Throughout the series we looked at various aspects of Auto-Sync. In part 1 we considered the significance of the feature, outlined the problem it addresses and saw how to enable/disable Auto-Sync via the Ambari interface. In part 2 of the series we presented a high-level view of the Auto-Sync architecture. We saw how JSON-formatted event-files are written to the events-directory on HDFS (by default: /user/bigsql/sync) and processed to ensure that the Big SQL catalog and Hive metastore stay synchronized at all times. In part 2 we also presented the main configuration parameters available for controlling Big SQL’s Auto-Sync behaviour.

Finally, in this blog we closed out the series by looking at problem determination for the Auto-Sync feature and explaining what you can do if you experience issues related to Big SQL’s Auto-Sync.


Big SQL Automatic Catalog Synchronization (Part 3 - Problem Determination) - Hadoop Dev
