SummaryBig SQL High Availability (HA) refers to providing high availability for the meta data of the Big SQL database (the Big SQL Metastore ) and for the Big SQL Scheduler component. The Big SQL database is used by Big SQL applications to access Hadoop tables.
IntroductionIn this article we will look at steps to recover BigSQL HA in scenarios where resource group is showing up in offline state and HA is not being established. For example, when lssam command is run, db2_bigsql_bigsql_BIGSQL-rg resource group shows up as offline,
lssam Online IBM.ResourceGroup:BigSQLScheduler_bigsql-rg Nominal=Online '- Online IBM.Application:BigSQLScheduler_bigsql-rs |- Online IBM.Application:BigSQLScheduler_bigsql-rs:wine2 '- Online IBM.Application:BigSQLScheduler_bigsql-rs:wine3 Offline IBM.ResourceGroup:db2_bigsql_bigsql_BIGSQL-rg Request=Lock Nominal=Online '- Offline IBM.Application:db2_bigsql_bigsql_BIGSQL-rs Control=StartInhibitedBecauseSuspended |- Offline IBM.Application:db2_bigsql_bigsql_BIGSQL-rs:wine2 '- Offline IBM.Application:db2_bigsql_bigsql_BIGSQL-rs:wine3
Running db2pd on the head node to list the hadr status shows up as not active.
db2pd -db bigsql -hadr Database Member 0 -- Database BIGSQL -- Active -- Up 0 days 04:55:53 -- Date 2018-01-30-03.06.34.763327 HADR Information: HADR is not active.
Users can connect to individual BigSQL head nodes and run the queries.
Pre-requisites:Following assumptions are made while writing this blog, users should only proceed if the following conditions are satisfied
- The HADR config files created by installer is present on each of the head nodes, in order to verify if the xml config files are present run the following command on each of the head node,
ls /var/lib/ambari-agent/cache/extensions/IBM-Big_SQL/18.104.22.168/services/BIGSQL/package/scripts/*.xml /var/lib/ambari-agent/cache/extensions/IBM-Big_SQL/22.214.171.124/services/BIGSQL/package/scripts/
- There has been no changes made to the network configuration such as firewall, IP address of network interfaces , gateway address on both the head nodes post installation.
- Users are expected to be familiar with HA terminologies and db2 cli commands
Steps to recover HA.
The following steps should be run as bigsql instance user which by default is bigsql , unless otherwise stated.
1. Remove the HA cluster configuraiton by running db2haicu -delete on both the head nodes.
db2haicu -delete Welcome to the DB2 High Availability Instance Configuration Utility (db2haicu). You can find detailed diagnostic information in the DB2 server diagnostic log file called db2diag.log. Also, you can use the utility called db2pd to query the status of the cluster domains you create. For more information about configuring your clustered environment using db2haicu, see the topic called 'DB2 High Availability Instance Configuration Utility (db2haicu)' in the DB2 Information Center. db2haicu determined the current DB2 database manager instance is 'bigsql'. The cluster configuration that follows will apply to this instance. When you use db2haicu to configure your clustered environment, you create cluster domains. For more information, see the topic 'Creating a cluster domain with db2haicu' in the DB2 Information Center. db2haicu is searching the current machine for an existing active cluster domain ... db2haicu found a cluster domain called 'domain-1' on this machine. The cluster configuration that follows will apply to this domain. Deleting the domain 'domain-1' from the cluster ... Deleting the domain 'domain-1' from the cluster was successful. All cluster configurations have been completed successfully. db2haicu exiting ...
2. Once removed, run the lsrpdomain command , it should not list any domains on both the head nodes
3. Verify if HADR configuration variables are still intact on both the head nodes, by running following command,
db2 get db cfg for bigsql| grep -i HADR HADR database role = PRIMARY HADR local host name (HADR_LOCAL_HOST) = headhost1.ibm.com HADR local service name (HADR_LOCAL_SVC) = 20008 HADR remote host name (HADR_REMOTE_HOST) = headhost2.ibm.com HADR remote service name (HADR_REMOTE_SVC) = 20008 HADR instance name of remote server (HADR_REMOTE_INST) = bigsql HADR timeout value (HADR_TIMEOUT) = 120 HADR target list (HADR_TARGET_LIST) = HADR log write synchronization mode (HADR_SYNCMODE) = NEARSYNC HADR spool log data limit (4KB) (HADR_SPOOL_LIMIT) = AUTOMATIC(384000) HADR log replay delay (seconds) (HADR_REPLAY_DELAY) = 0 HADR peer window duration (seconds) (HADR_PEER_WINDOW) = 600 HADR SSL certificate label (HADR_SSL_LABEL) =
4. On the standby head node check if the database is in rollforward pending state ,
db2 get db cfg for bigsql|grep -i rollforward Rollforward pending = NO
If the output says NO then proceed to step 5, if the output says DATABASE proceed to step 7.
5. Take a backup of database from primary head node by running following sequence of commands,
a. db2 terminate b. db2 deactivate database bigsql c. db2 backup db BIGSQL to ~/backup_dir
6. Copy the backup image to the standby head node and restore it by using following steps
a. Make sure no applications are connected to the standby head node db2 list applications SQL1611W No data was returned by Database System Monitor b. Check if BigSQL database is currently active db2 list active databases c. If BigSQL database is listed as active , deactivate it first before restoring the backup db2 deactivate db bigsql d. From the backup_dir run the restore command, when prompted for overwrite select Y db2 restore database bigsql taken at 20180228152811 e. Now check again if rollforward is pending on database, by running following command db2 get db cfg for bigsql|grep -i rollforward Rollforward pending = DATABASE
7. Start HADR on Standby node
db2 start hadr on database bigsql as standby
8. Start HADR on primary head node
db2 start hadr on database bigsql as primary
9. Now restore the HA configurations using following sequence of steps
a. As the root user run prepnode to prepare the hosts for HA cluster configuration. Run the following command by replacing the corresponding hostnames preprpnode <headnode1_host_name> <headnode2_host_name> b. Run the following command as bigsql user on each of the head node , db2haicu -f /var/lib/ambari-agent/cache/extensions/IBM-Big_SQL/126.96.36.199/services/BIGSQL/package/scripts/<hostname>-hadr.xml
If all the steps succeeded , BigSQL HA should be online and working . To confirm list the resources groups using lssam to verify if all the resources are online.
The output should be similar to following excerpt,
lssam Online IBM.ResourceGroup:BigSQLScheduler_bigsql-rg Nominal=Online '- Online IBM.Application:BigSQLScheduler_bigsql-rs |- Online IBM.Application:BigSQLScheduler_bigsql-rs:wine2 '- Online IBM.Application:BigSQLScheduler_bigsql-rs:wine3 Online IBM.ResourceGroup:db2_bigsql_bigsql_BIGSQL-rg Nominal=Online '- Online IBM.Application:db2_bigsql_bigsql_BIGSQL-rs |- Online IBM.Application:db2_bigsql_bigsql_BIGSQL-rs:wine2 '- Offline IBM.Application:db2_bigsql_bigsql_BIGSQL-rs:wine3If any of the above steps fail, please reach out to IBM BigSQL Support for assistance .