This blog outlines one procedure for backing up the Big SQL catalog while the database is activated. The restore is intended to be done only in case of emergencies. The steps outlined here are for restoration on the same cluster. This is just one of the Big SQL recovery options; more options can be found in Big SQL Best Practice and Guidelines – Recovery.

For cases where there is no Big SQL Disaster Recovery or High Availability implemented, it is recommended to take regular backups of the Big SQL database. The steps outlined here are for taking regular Big SQL online backups. The frequency of the backups is up to the administrator, but aim to keep at least three backups at any given point in time.

Online backups for Big SQL involve taking backups of the Big SQL head node (catalog node). Online backups of all the nodes are not supported. If there is data that needs to be preserved from the worker nodes, consider using an offline backup of the Big SQL database. Only performance data, if applicable (such as data captured from event monitors), is stored on the worker nodes; an online backup of the head node will not back up this data. If point-in-time recovery is required, then offline backups should be chosen over online backups. However, online backups are convenient because the database does not need to be deactivated before the backup is taken. In case of failures, the latest online backup should be restored.

Important – Since Big SQL works with other components such as Hive, HDFS and HBase, backing up these components is also required. It is recommended to take backups of the other components at the same time that the Big SQL catalog is backed up or restored. Some suggestions on how to back up and restore the other components are given below, with links for convenience. However, please consult the vendor documentation for those components to determine the most up-to-date instructions for anything outside of Big SQL.

Starting with Big SQL 5.0.2, a Big SQL Backup and Restore utility (bar.py) is available for online backup and restore on the same cluster. Along with using this utility, backups of the Hive meta-store and HDFS snapshots need to be taken, using the following sequence of steps:


1. Create backup directories
mkdir /bigsql_backup/backup_offline
mkdir /bigsql_backup/backup_online
2. Take an initial offline backup of the Big SQL database on the Big SQL head node and turn on log archiving.  
bar.py -m backup --offline --logarchmeth -d /bigsql_backup/backup_offline
Note that by default logarchmeth is set to LOGRETAIN. To use an actual log archive path, the --logarchmeth parameter can be combined with the -w parameter.
3. Take an online backup of the Big SQL database on the Big SQL head node and prune unnecessary log files
bar.py -m backup --prune -d /bigsql_backup/backup_online
4. Take Hive meta-store backups and HDFS snapshots (see sections 3.3 and 3.4 below)
5. Repeat steps 3 and 4 as determined by the backup frequency (a cron sketch for scheduling this is shown after this list)
6. Restore the latest Big SQL backup in case of emergency
bar.py -m restore -d /bigsql_backup/backup_online
7. Take a new Hive meta-store backup in case of emergency (section 4.6 below)
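
For step 5, the repeating online backup can be scheduled with cron. The entry below is a minimal sketch; the schedule, the assumption that bar.py is on the bigsql user's PATH, and the log file location are all hypothetical:


# bigsql user's crontab: nightly online backup that also prunes old log files
0 1 * * * bar.py -m backup --prune -d /bigsql_backup/backup_online >> /tmp/bigsql_bar.log 2>&1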

If you are not using the utility, the following steps are required for taking an online backup:

1. Preparation for Backup

1.1 Create backup and log archive directories

When taking a backup, first choose paths that are different from the bigsql_db_path and the bigsql_data_directories. The head node can be determined by looking at the ~/sqllib/db2nodes.cfg file; the host name with a 0 on either side of it is the head node in the cluster. Use the mkdir command to create the backup and archive log directories on the head node in the cluster as the bigsql user:


mkdir /bigsql_backup/backup_offline
mkdir /bigsql_backup/backup_online
mkdir /bigsql_log/active
mkdir /bigsql_log/archive
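
For reference, the following is a hypothetical db2nodes.cfg (the host names are placeholders); the entry with node number 0 and logical port 0 identifies the head node:


cat ~/sqllib/db2nodes.cfg
0 headnode.example.com 0
1 worker1.example.com 0
2 worker2.example.com 0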

1.2 Update Big SQL Configuration (optional)

The default log path is under /var, and this path often has very little space. The log path can be altered by the bigsql database administrator. Using the bigsql id on the head node, issue the following command:


db2 "update db cfg for bigsql using NEWLOGPATH ‘/bigsql_log/active’"

2. Turn on log archiving on the head node

By default, Big SQL uses circular logging. This means that after a specified number of log files have been written, log files start getting overwritten. Online backups require that archive logging be enabled. With archive logging, log files are not overwritten. One side effect is that these log files can fill up the log path, so a log pruning strategy is required.
After turning on log archiving, an offline backup is required. If HA is already configured on the system, this step is not necessary because log archiving is already enabled. Issue the following commands, which require DB2 database administrator (DBADM) privileges, as the bigsql user from the Big SQL head node:


db2 "update db cfg for bigsql member 0 using logarchmeth1 'DISK:/bigsql_log/archive'"
-- alternatively, logarchmeth1 can be set to LOGRETAIN instead of a disk path:
db2 "update db cfg for bigsql member 0 using logarchmeth1 LOGRETAIN"
db2 force applications all
db2 terminate
db2 deactivate db bigsql
db2 "backup db bigsql on dbpartitionnums (0) to /bigsql_backup/backup_offline"
db2 activate db bigsql  -- and continue with day to day activity
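
To confirm that archive logging is now in effect, check the database configuration; the first log archive method should report DISK:/bigsql_log/archive (or LOGRETAIN):


db2 get db cfg for bigsql | grep -i logarchmeth1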

3. Take an online backup

At the intended time interval, take an online backup and then continue with day-to-day activities. When taking a backup of the Big SQL head node, a backup of the Hive meta-store is also recommended.

3.1 Backup the Big SQL catalog node

Since the backup is online, there can be in-flight transactions while the backup is being taken. For example, tables can be created while the backup image is being taken. The logs for these in-flight transactions are stored in the backup image. Use the include logs option to keep the log files in the backup image. On the Big SQL head node, issue the following command to take an online backup:


db2 "backup db bigsql on dbpartitionnums (0) online to /bigsql_backup/backup_online include logs"

3.2 Prune log files

Since archive logging is enabled, over time the archive log path can become full. Since Big SQL 4.2.5, automatic log pruning is enabled by default when HA is enabled, so this step is not required in that case. The prune history command can be used to delete archive log files on and before a specified timestamp. Note that if automatic backup and log maintenance is enabled via the AUTO_DEL_REC_OBJ database configuration parameter, then the backup image itself can be deleted if the timestamp specified by the prune history command is the same as the timestamp of the backup image. Therefore, as a best practice, choose a timestamp for the prune history command prior to that of an existing online backup image. For example, if the backup image is 20171015161107, then the prune command can be issued with the 2017101516 timestamp.


db2 prune history 2017101516 and delete

Running the prune command is not necessary after every online backup, but it can help remove unneeded log files that fill up the log path. If the log path has become full, take a new online backup using the command in the previous step and run the prune command with a timestamp prior to that backup image.
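
To gauge whether pruning is needed, the space consumed by the archive log path can be checked first with standard shell commands:


du -sh /bigsql_log/archive
ls -ltr /bigsql_log/archive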

3.3 Take a Hive meta-store backup

More information on Hive backup and restore can be found in the HDP documentation on backup. For convenience, some commands are given for the default Hive MySQL database.
First, log in to the Ambari UI, go to Hive –> Config –> Advanced, and check the "Database Type" property – make sure it is MySQL and note the database name, e.g. hive. Then log in via shell to the meta-store host as the root user and issue the following commands:


mkdir /hive_backup
mysqldump hive > /hive_backup/hive.sql
ls -ltr /hive_backup/hive.sql
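
Since keeping several backups is recommended, the dump file can be timestamped so that earlier copies are not overwritten; a minimal variation of the command above:


mysqldump hive > /hive_backup/hive_$(date +%Y%m%d%H%M%S).sql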

3.4 Take HDFS/HBase Snapshots

HDFS and HBase snapshots can be taken at points in time; some tips are given in offline backup of the Big SQL database.
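
For HDFS, a snapshot can be taken with the standard hdfs commands. The sketch below assumes the Hive warehouse is at the default /apps/hive/warehouse path and that the commands are run as the hdfs user; adjust the path and snapshot name for your cluster:


# allow snapshots on the directory (a one-time step)
hdfs dfsadmin -allowSnapshot /apps/hive/warehouse
# create a named snapshot of the current state
hdfs dfs -createSnapshot /apps/hive/warehouse bigsql_backup_20171015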

Repeat Steps 3.1 – 3.4

Repeat steps 3.1 to 3.4, depending on how frequently backups should be created.

4. Restore (ONLY in case of emergencies)

A restore operation should be done only in case of an emergency, for example when the database is not accessible. If the database is not accessible, it is best to first open a support ticket before trying to restore, to determine whether this is the correct course of action. The latest online backup should be chosen when restoring. The instructions below are for restoring an online backup; instructions on restoring an offline backup are also available.
Big SQL does not support point-in-time recovery for online backups. After a restore operation is done, a rollforward to the end of logs is required. After the rollforward operation has completed, do not restore the Hive meta-store; instead, take a new Hive meta-store backup.

4.1 Disable automatic maintenance utilities

The administrative task scheduler (ATS) should be disabled prior to the restore and rollforward operations. Big SQL uses ATS for automatic statistics collection and automatic syncing of the Big SQL catalog. On the head node, as the bigsql user, issue the following command and then restart the Big SQL service:


db2set DB2_ATS_ENABLE=NO
//restart Big SQL service

4.2 Determine the timestamp of the backup image

When the online backup is taken, a backup image file with a timestamp in its name is created on the Big SQL head node. An example of a valid timestamp is 20170927071532. Go to the backup directory specified in the backup step, for example /bigsql_backup/backup_online, and determine the timestamp of the latest backup.
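
The timestamp is embedded in the backup image file name, so the most recent image can be found by listing the backup directory (the file name shown is hypothetical):


ls -ltr /bigsql_backup/backup_online
# e.g. BIGSQL.0.bigsql.DBPART000.20170927071532.001 -- the timestamp is 20170927071532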

4.3 Restore the Big SQL catalog

Restore the Big SQL catalog with the timestamp noted in the previous step using the following commands as the bigsql user on the Big SQL head node:


db2 "restore db bigsql from /bigsql_backup/backup_online taken at 20171003122834 replace existing without prompting" 
// The following warning is normal:
// SQL2540W  Restore is successful, however a warning "2539" was encountered
// during Database Restore while processing in No Interrupt mode.

4.4 Rollforward to end of logs

A rollforward operation through to the end of the logs is required after the restore has completed. The database becomes accessible only after the rollforward command has run. Issue the following command as the bigsql user on the Big SQL head node:


db2 "rollforward database bigsql to end of logs on dbpartitionnum (0) and stop"

4.5 Re-enable automatic maintenance utilities

Re-enable ATS after the restore is done. On the Big SQL head node, as the bigsql user, issue the following command and then restart the Big SQL service:


db2set DB2_ATS_ENABLE=YES
//restart Big SQL service

4.6 Backup the Hive meta-store and take HDFS/HBase Snapshots

Note that the Hive meta-store does not need to be restored after the Big SQL catalog node is restored, because online backups are not for point-in-time recovery. Instead of restoring the Hive meta-store, take a new backup of it. This backup would be used only if there is a later failure on the Hive side. Refer to sections 3.3 and 3.4 for details on taking a Hive meta-store backup and HDFS snapshots.

Troubleshooting

If the following error is encountered during rollforward, it means that some of the log files needed for recovery are missing. Restore a previous online backup image in this case.
SQL1273N An operation reading the logs on database "BIGSQL" cannot continue
because of a missing log file "S0000023.LOG" on database partition "0" and log
stream "0".

Summary

This blog outlined a Big SQL backup and restore procedure using DB2 commands. The backup procedure outlined is an online backup of the Big SQL catalog. Frequent backups of the Big SQL database are recommended, especially after configuration changes have been made or tables have been added. The restore procedure outlined is intended for restoration on the same cluster in case of emergencies. When the Big SQL catalog is backed up using the online backup/restore procedure, it is recommended that backups of the Hive meta-store and HDFS/HBase snapshots be taken as well. Consult Big SQL support first before restoring the Big SQL database.
