IBM Support

Big SQL v4.1.0.2: Big SQL HA manual procedure to recover from a dead "Big SQL Head" host - Hadoop Dev

Technical Blog Post


Introduction

The procedure documented below is for Big SQL v4.1.0.2 only.

In a Big SQL High Availability (HA) enabled cluster, when the “Big SQL Head” goes down, the “Big SQL Secondary Head” node automatically takes over as the active head. If the old primary head node comes back up, it assumes the role of the secondary head and High Availability is restored.

This article shows how to replace a permanently dead “Big SQL Head” node that cannot be brought back up, for example because of a hardware failure. The purpose of this procedure is to restore a working Big SQL HA enabled cluster.

Preparation

Before starting the manual steps, identify the hosts involved and map them to H1, H2 and H3 as follows:
Ambari server – Host where Ambari server is running
Host H1 – original “Big SQL Head” component – dead node
Host H2 – original “Big SQL Secondary Head” component
Host H3 – replacement head node for H1

At the end of this manual procedure, Big SQL HA is restored with the following hosts as head nodes:
Host H2 – will be the new “Big SQL Head” component
Host H3 – will be the new “Big SQL Secondary Head” component

In the example screenshots for the manual steps, the specific hosts are:
bdavm639 – Ambari server, Big SQL worker
bdavm756 – H3
bdavm757 – H1
bdavm823 – H2

Steps:

  1. On H2: Verify H2 is now the “primary” Big SQL head node
    Before proceeding with the next steps, verify that H2 is now the “PRIMARY” Big SQL head node.
    As user “bigsql”, run “db2pd -hadr -db bigsql | grep -i hadr” and verify that the output shows “HADR_ROLE = PRIMARY”.
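The check in step 1 can also be scripted. The following is a minimal Python sketch and not part of the original procedure: the helper names `hadr_role` and `current_hadr_role` are hypothetical, and the sample text only imitates the `HADR_ROLE = PRIMARY` line that the step looks for. Real `db2pd` output contains many more fields.

```python
import re
import subprocess


def hadr_role(db2pd_output):
    """Return the value of the first HADR_ROLE field in db2pd output, or None."""
    match = re.search(r"HADR_ROLE\s*=\s*(\w+)", db2pd_output)
    return match.group(1).upper() if match else None


def current_hadr_role(database="bigsql"):
    """Run db2pd (as the bigsql user, on H2) and return the reported HADR role."""
    proc = subprocess.Popen(
        ["db2pd", "-hadr", "-db", database],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    output, _ = proc.communicate()
    return hadr_role(output)


# Illustrative text only; it was not captured from a real cluster.
sample = "                            HADR_ROLE = PRIMARY\n"
print(hadr_role(sample))
```

Calling `current_hadr_role()` on H2 as user “bigsql” should return `PRIMARY` before you continue; any other value, or `None`, means the takeover has not completed.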

  3. On Ambari server host: Modify scripts
    As user “root”, run the following:

    • (a) On the Ambari Server machine, turn the Head Node install function in bigsql-head.py into a no-op
      by adding a “return” at the top of the function, as marked below. This is needed for step 6 below (adding the “Big SQL Head” component on H2), because everything is already installed on that host.
      In this file:
      /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/BIGSQL/package/scripts/bigsql-head.py

          class BigsqlHead(Script):
              def install(self, env):
                  import params
                  import os
                  return        <<-- Add this line.
    • (b) On the Ambari Server machine, change the secondary head status function in bigsql-secondary-head.py
      so that it calls “exit(1)”. This makes the status check fail.
      The goal is to have the component reported as stopped so that it can be deleted.
      In this file:
      /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/BIGSQL/package/scripts/bigsql-secondary-head.py

          def status(self, env):
              import status_params
              import os
              exit(1)        <<-- Add this line.
    • (c) Restart Ambari Server so that all the files under
      /var/lib/ambari-agent/cache/stacks/BigInsights/4.1/services/BIGSQL/package/scripts
      on the other hosts are updated to reflect the above changes.
  4. On host H2: Delete “Big SQL Secondary Head” component from Ambari UI
    Delete the “Big SQL Secondary Head” component from H2 (the component, not the host) without decommissioning it,
    i.e. Hosts Tab -> “Big SQL Secondary Head” component dropdown -> Delete

  5. On Ambari server host: Modify the Big SQL metainfo.xml file
    On the Ambari server machine, change metainfo.xml so that the head node cardinality is “1+”.

    • (a) In this file:
      /var/lib/ambari-server/resources/stacks/BigInsights/4.1/services/BIGSQL/metainfo.xml

          <component>
              <name>BIGSQL_HEAD</name>
              <displayName>Big SQL Head</displayName>
              <category>MASTER</category>
              <cardinality>1+</cardinality>        <<-- Change 1 to 1+
    • (b) Restart Ambari server
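Before restarting Ambari server, it can help to confirm that the cardinality edit took effect. The following is a minimal Python sketch under stated assumptions: the helper name `component_cardinality` is hypothetical, and the sample is a trimmed illustrative fragment rather than the full Big SQL metainfo.xml.

```python
import xml.etree.ElementTree as ET


def component_cardinality(metainfo_xml, component_name):
    """Return the <cardinality> text of the named <component>, or None."""
    root = ET.fromstring(metainfo_xml)
    # iter() searches at any depth, so the surrounding nesting does not matter.
    for component in root.iter("component"):
        if component.findtext("name") == component_name:
            return component.findtext("cardinality")
    return None


# Trimmed illustrative fragment, not the real file contents.
sample = """<components>
  <component>
    <name>BIGSQL_HEAD</name>
    <displayName>Big SQL Head</displayName>
    <category>MASTER</category>
    <cardinality>1+</cardinality>
  </component>
</components>"""
print(component_cardinality(sample, "BIGSQL_HEAD"))
```

To check the real file, read the metainfo.xml path from step (a) into a string and pass it to the function. Make sure the `<<--` annotation shown above is not left in the file, because it is not valid XML.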
  6. On host H2: Add “Big SQL Head” component from Ambari UI
    Install a (second) head node component on the old secondary head node host (H2),
    i.e. Hosts Tab -> Components -> +Add


  7. On host H1: Delete dead host H1 from Ambari UI
    Remove the old Head Node host (H1) from Ambari: Hosts Tab -> Host Actions -> Delete Host.
    The whole host must be deleted because Ambari does not allow deleting only the “Big SQL Head” component while its status is unknown.


  8. On Ambari server host: Undo changes from steps 3 and 5
    As user “root”, run the following:

    • (a) Undo the changes from steps 3 and 5 (the script modifications and the metainfo.xml cardinality change).
    • (b) On the Ambari Server machine, reinstall the BI-Analyst* packages.
      Run this command to reset the HA section so that HA can be enabled again:

          yum reinstall BI-Analyst*  
    • (c) Restart Ambari Server
  9. On host H2: Clean up HADR and TSA. Restore database role to standard.
    Run the following commands on the host H2 as user “root”:

    • (a) cd /var/lib/ambari-agent/cache/stacks/BigInsights/4.1/services/BIGSQL/package/scripts/
    • (b) ./bigsql-HA-cleanup.sh -P H2 -S H1 -U bigsql
      !! WARNING !!: Do not specify “-P H1 -S H2”; otherwise the bigsql database will be dropped.
    • (c) Ignore the following script error from step (b) above.
      [Screenshot: script error output from step (b)]

    • (d) Afterwards, run the following command on H2 as user “bigsql”:
      “db2 get db cfg for bigsql | grep -i hadr”. Verify that the output shows “HADR database role = STANDARD”.
    • (e) Ensure that lssam shows no resources. If any TSA resources are left, use the manual cleanup
      procedure to remove them. As a brute-force method, delete the domain.
      As the “bigsql” user on H2, run:

      • (i) lsrpdomain ( will show name )
      • (ii) stoprpdomain -f domain-1
      • (iii) rmrpdomain -f domain-1
        Rerun the command if you get this error:

          2632-039 The domain domain-1 cannot be removed.  The domain may be in a transition pending state, or was just removed by another command.  
      • (iv) to check: lssam
      • (v) Then you MUST run “db2haicu -delete” again on H2.
        Afterwards, the “Cluster manager” entry in the dbm cfg must be empty:

          db2 get dbm cfg | grep -i clus
          Cluster manager =

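The two verifications in step 9, the database role in (d) and the cluster manager entry in (e)(v), can be combined into one scripted check. The following is a minimal Python sketch; the helper names `cfg_value` and `cleanup_complete` are hypothetical, and the functions expect the text printed by the two `db2 get ... cfg` commands shown above.

```python
def cfg_value(cfg_output, label):
    """Return the text after '=' on the first line whose name contains label.

    The match on the name is case-insensitive. Returns None if no line matches.
    """
    for line in cfg_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and label.lower() in name.lower():
            return value.strip()
    return None


def cleanup_complete(db_cfg_output, dbm_cfg_output):
    """True when the database role is STANDARD and no cluster manager is set."""
    role = cfg_value(db_cfg_output, "HADR database role")
    manager = cfg_value(dbm_cfg_output, "Cluster manager")
    return role == "STANDARD" and manager == ""


# Illustrative lines only; they were not captured from a real cluster.
db_cfg = " HADR database role = STANDARD\n"
dbm_cfg = " Cluster manager =\n"
print(cleanup_complete(db_cfg, dbm_cfg))
```

If the function returns `False`, repeat the TSA cleanup in (e) before adding the new secondary head.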

  10. On host H3: Add “Big SQL Secondary Head” component from Ambari UI
    Set up the new secondary head by installing the “Big SQL Secondary Head” component on host H3.
    Hosts Tab -> Components -> Add

    • (a) On H3, perform Hosts -> Components -> +Add “Big SQL Secondary Head”.
      Before adding the “Big SQL Secondary Head” component to H3,
      ensure that the /etc/bigsql/bigsqlHAHostList file on H2 has been removed.


    • (b) Verify new Host H3 Big SQL installation
      - Db2 is installed here: /usr/ibmpacks/bigsql/4.1/db2
      - This step must install the Db2 binaries. To check, use “yum list installed | grep -i db2”.
    • (c) Start the new “Big SQL Secondary Head” component from Ambari UI
  11. On host H3: Enable Big SQL HA from Ambari UI
    • (a) Ensure the Big SQL service is started. Enable HA as normal using the new host (H3).
    • (b) Run the Big SQL service check to verify that Big SQL is installed correctly.

  12. This completes the manual procedure. Big SQL HA is now restored with the following hosts as head nodes:
    Host H2 – the new “Big SQL Head” component
    Host H3 – the new “Big SQL Secondary Head” component

** This article was a joint effort by Wen-Yi Chua, Emad Boctor, Diego Santesteban, and Brian Cahill.


UID

ibm16259917