Big SQL 5.0.1 includes a significant update to how the Automatic Catalog Synchronization (Auto-Sync) feature handles errors encountered during DDL event synchronization.
Auto-Sync was introduced in Big SQL v4.2. It automatically synchronizes Hive metastore changes into the Big SQL catalog, so that any DDL operation (CREATE, ALTER, DROP) that updates the metastore is automatically reflected in the Big SQL catalog. For more details, see my previous blog series on Big SQL Automatic Catalog Synchronization.
In this blog we provide the details of the latest update to Auto-Sync’s error handling, describe the problems it solves, outline the benefits to Big SQL users, and show a simple example of what the behaviour looks like in practice.
Auto-Sync Error Handling – Prior to Big SQL 5.0.1
In previous Big SQL releases, if an Auto-Sync synchronization error was encountered, all relevant information was written to the Big SQL log-file (bigsql.log) and the associated event-file was left in the events-directory (by default: /user/bigsql/sync). Once the issue causing the error was resolved, Auto-Sync would re-process the event-file on its next invocation and synchronize any associated DDL events. However, if the issue was not resolved, Auto-Sync would attempt to re-process the same event-file(s) every ‘n’ seconds, potentially impacting Auto-Sync’s performance and flooding bigsql.log with the same error messages each time. (Here ‘n’ is the time, in seconds, that Big SQL waits between processing event-files, set via the ‘bigsql.catalog.sync.sleep’ parameter. See my previous blog on Auto-Sync Architecture for more details.)
Auto-Sync Error Handling – Big SQL 5.0.1
As depicted in Figure 1, when Auto-Sync encounters a synchronization failure in Big SQL 5.0.1, all the relevant information is still written to bigsql.log. However, the associated event-file is now moved out of the events-directory and into an “errors” sub-directory (by default: /user/bigsql/sync/errors), referred to as the “errors-directory” throughout this blog.
This minor change in Auto-Sync behaviour has a number of significant benefits for Big SQL users, as outlined below:
A more efficient Auto-Sync execution and a cleaner, more readable bigsql.log
By moving any event-files associated with Auto-Sync synchronization failures out of the events-directory, Auto-Sync will no longer attempt to re-process event-files known to have issues until the user deems it appropriate to do so. As a result, Auto-Sync runs more efficiently, and each synchronization failure is written to bigsql.log once instead of on every Auto-Sync invocation.
Easier monitoring of Auto-Sync errors
It is now very easy (and also good practice) to monitor the errors-directory for any synchronization errors that may have occurred. If there are errors, more details can then be found in bigsql.log.
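As a minimal sketch of such monitoring, the Python snippet below lists any event-files sitting in the errors-directory. It assumes the events-directory lives on HDFS and that the standard `hdfs dfs -ls` command is available on the path; the function names are my own, for illustration only, and are not part of Big SQL.

```python
import subprocess

# Default errors-directory, as described in this blog.
ERRORS_DIR = "/user/bigsql/sync/errors"


def parse_ls_output(ls_output):
    """Return the paths of .json event-files found in `hdfs dfs -ls` output."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Entry lines have 8 columns, the last being the path.
        # The "Found N items" header and blank lines have fewer, so they are skipped.
        if len(fields) == 8 and fields[7].endswith(".json"):
            paths.append(fields[7])
    return paths


def list_error_event_files(errors_dir=ERRORS_DIR):
    """Run `hdfs dfs -ls` on the errors-directory and return any event-files."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", errors_dir],
        capture_output=True, text=True, check=False,
    )
    # If the directory does not exist yet, stdout is empty and the list is empty.
    return parse_ls_output(result.stdout)
```

Calling `list_error_event_files()` from a scheduled job and alerting when the returned list is non-empty would be one way to notice failures early; the details of each failure are still in bigsql.log.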
The ability to re-process any DDL error event-files on demand
When the issues causing synchronization errors have been addressed, and where synchronization of the related object(s) is still required, the associated event-files can simply be moved back into the events-directory (by default: /user/bigsql/sync) and Auto-Sync will synchronize the related DDL events on its next invocation.
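Moving an event-file back could be scripted along the following lines. This is only a sketch: it assumes the directories are on HDFS, that `hdfs dfs -mv` is run by a user with write access to both directories, and the helper names are hypothetical.

```python
import posixpath
import subprocess

# Default locations, as described in this blog.
EVENTS_DIR = "/user/bigsql/sync"
ERRORS_DIR = "/user/bigsql/sync/errors"


def build_requeue_command(event_file_name, events_dir=EVENTS_DIR, errors_dir=ERRORS_DIR):
    """Build the `hdfs dfs -mv` command that moves an event-file back for re-processing."""
    source = posixpath.join(errors_dir, event_file_name)
    destination = posixpath.join(events_dir, event_file_name)
    return ["hdfs", "dfs", "-mv", source, destination]


def requeue_event_file(event_file_name):
    """Move one event-file from the errors-directory back to the events-directory."""
    # check=True raises CalledProcessError if the move fails.
    subprocess.run(build_requeue_command(event_file_name), check=True)
```

Building the command separately from running it makes it easy to print and review what will be moved before anything is changed.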
Note: Pre-Existing “errors” File or Sub-directory in the Events-directory?
- If you already have an errors sub-directory in the events-directory and you encounter an Auto-Sync synchronization error, the event-file associated with this synchronization error will be moved into the pre-existing “errors” sub-directory.
- If you happen to already have a file named “errors” in the events-directory and you encounter an Auto-Sync synchronization error, the file will be renamed to “errors.bak”, an “errors” sub-directory will be created and the event-file associated with the synchronization error will be moved into the newly created errors-directory.
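The two cases above can be illustrated with a small simulation on a local filesystem. This is not Big SQL’s actual implementation, which I am not reproducing here; it is only a sketch of the described behaviour, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def move_to_errors_dir(events_dir, event_file_name):
    """Simulate, on a local filesystem, moving a failed event-file aside."""
    events_dir = Path(events_dir)
    errors_path = events_dir / "errors"

    if errors_path.is_file():
        # A plain file named "errors" is in the way: keep it as errors.bak.
        errors_path.rename(events_dir / "errors.bak")

    # Create the errors sub-directory, or reuse a pre-existing one.
    errors_path.mkdir(exist_ok=True)

    target = errors_path / event_file_name
    shutil.move(str(events_dir / event_file_name), str(target))
    return target
```

Either way, the failed event-file ends up under the “errors” sub-directory and nothing that was already in the events-directory is lost.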
Say we have an event-file, DDL-20170823033725787-f05de07d-6aa0-4732-a2c8-c71cef3eaac1.json, in the events-directory, as shown in Figure 2. This event-file is the result of a CREATE TABLE DDL statement executed in Hive.
If, when Auto-Sync executes next, this event fails to synchronize with the Big SQL catalog, the relevant messages will be written to bigsql.log and the event-file, DDL-20170823033725787-f05de07d-6aa0-4732-a2c8-c71cef3eaac1.json, will be moved to the Auto-Sync errors-directory (by default: /user/bigsql/sync/errors), as shown in Figure 3.
In this blog we introduced the latest update to how Big SQL Automatic Catalog Synchronization (Auto-Sync) handles errors encountered during DDL event synchronization. We described the problems it solves and the benefits of this update to Big SQL users, and presented a simple example of what the behaviour looks like in practice.
- Big SQL Automatic Catalog Synchronization (Part 1 – Introduction)
- Big SQL Automatic Catalog Synchronization (Part 2 – Architecture)
- Big SQL Automatic Catalog Synchronization (Part 3 – Problem Determination)
- Automatic Hive catalog syncing to the Big SQL catalog
- Accessing tables created in Hive and files added to HDFS from Big SQL
- Hive and Big SQL catalogs are inconsistent