Overview

Skill Level: Any Skill Level

For SAP and HANA Admins

The main purpose of the high availability solution is to reduce the downtime of an SAP HANA System Replication setup in case of software or hardware failures. It also helps to avoid operator errors and to manage the complexity of the setup.

Ingredients

  • SAP HANA is installed and configured
  • Initial replication tests are completed in both directions

Step-by-step

  1. Terminology

    SAP HANA takeover

    A takeover makes sure that an SAP HANA System Replication secondary instance can operate as SAP HANA System Replication primary instance.

    System Automation failover

    Automatic movement of applications from one node to another node within the same cluster. System Automation uses the term node for one specific operating system image.

    SAP host versus SAP system

    SAP uses the term host for one specific operating system image and SAP system in general for one or multiple hosts by using the same SAP System ID (SAPSID).

    The high availability solution for SAP HANA System Replication uses System Automation for Multiplatforms to automate all SAP components. System Automation for Multiplatforms detects failed components and restarts them or initiates a failover. This setup helps to reduce the operational complexity of an SAP environment and to avoid operator errors, which result from this complexity.

    TSA (TSAMP)

    Tivoli System Automation (for Multiplatforms)

    To access the TSAMP product, refer to the Announcement Letter:

    https://www-01.ibm.com/common/ssi/cgi-bin/ssialias?infotype=an&subtype=ca&appname=gpateam&supplier=897&letternum=ENUS214-039

     

  2. The standard two-node SAP HANA setup

    The minimum hardware setup consists of a two-node TSA domain. The two nodes are either two physical computers or two LPARs that run on different physical computers.

    TSA (System Automation) can automate the takeover of the SAP HANA System Replication setup.

    HANA-setup

    System Automation is installed and a high-availability cluster is set up on the primary and secondary host. If the primary host has an outage, then System Automation triggers the failover by using the SAP HANA command hdbnsutil -sr_takeover on the secondary host. Then, the IP address is moved to the new primary host after the old secondary host becomes the new primary host.

    When the old primary host comes back online, System Automation integrates the old primary host as new secondary host again by using the SAP HANA command hdbnsutil -sr_register. The log replication starts from the new primary host to the new secondary host.

  3. Performance issue workaround (mandatory)

    On both nodes under <SID>adm_user

    Edit $HOME/.cshrc

    add the following lines

    # workaround for the slow performance of TSAMP monitoring

    setenv LD_LIBRARY_PATH /lib:/lib64:/usr/lib:/usr/lib64:/usr/libexec:$LD_LIBRARY_PATH

     

    For detailed info refer to: How to perform SAP HANA System Replication

  4. SAP HANA profiles

    ·         SAPSID and Instance numbers for primary and secondary instance must be the same.

    ·         Primary instance needs to have a full backup before you can set up a secondary instance system replication.

    ·         Disable autostart of all SAP HANA instances in all their profiles by commenting out the Autostart line in the profile or by setting it to 0:

    On both nodes under <SID>adm_user

    # /usr/sap/<SID>/SYS/profile/DEFAULT.PFL

    or

    # /usr/sap/<SID>/SYS/profile/<SID>_HDB<instancenr>_<vhost>

    e.g. /usr/sap/RR1/SYS/profile/DEFAULT.PFL

     # Autostart = 1

    or

    Autostart = 0

    ·         Define the cluster awareness library in the SAP HANA default profile. Add the following lines to the SAP HANA default profile /usr/sap/<SID>/SYS/profile/DEFAULT.PFL:

    service/halib = /usr/sap/<SID>/SYS/exe/uc/<your platform>/saphascriptco.so
    service/halib_cluster_connector = /usr/sbin/rsct/sapolicies/sap/bin/sap_tsamp_cluster_connector

        e.g. on CMA server mrsrl00241

    service/halib = /usr/sap/RR1/SYS/exe/hdb/saphascriptco.so
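The Autostart setting described in this step can be checked with a small script. This is a minimal sketch, assuming the profile uses the `Autostart = 1` form shown above; the function name `autostart_enabled` is hypothetical and not part of SAP or TSA.

```shell
# Hypothetical helper: succeeds (exit 0) if the given profile still has an
# active, uncommented "Autostart = 1" line.
autostart_enabled() {
    # A line that starts with '#' is commented out and does not match.
    grep -Eq '^[[:space:]]*Autostart[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}
```

For example, `autostart_enabled /usr/sap/<SID>/SYS/profile/DEFAULT.PFL && echo "Autostart is still active"` could be run on both nodes before continuing.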

     

  5. Installation of saphascriptco.so

    If the file  /usr/sap/<SAPSID>/SYS/exe/hdb/saphascriptco.so is not included in the HANA installation, extract the file from a [Power] LINUX SAP Kernel Part I package manually:

     

    1.   Identify the SAP Kernel version (00 is the default HANA instance number)

     On both nodes under <SID>adm_user

    # sapcontrol -nr <instance number> -function GetVersionInfo

    e.g. sapcontrol -nr 60 -function GetVersionInfo

    28.11.2017 16:23:07

    GetVersionInfo

    OK

    Filename, VersionInfo, Time

    /usr/sap/<SID>/HDB60/exe/sapstartsrv, 745, patch 400, changelist 1734487, RKS compatibility level 0, opt (Jan 23 2017, 15:41:23), linuxx86_64, 2017 02 16 10:38:12

    On both nodes under <SID>adm_user

     

    2. Download SAP Kernel Part I package. Use the highest patch level

     SAP Support Portal. https://support.sap.com/en/my-support/software-downloads/support-package-stacks.html

    Support Package SAP KERNEL 7.45 64-BIT UNICODE Linux on x86_64 64bit #Database

     # SAPEXE_<patch_level>-<number>.SAR

     e.g. (kernel release 745, patch_level=400, number=80000699)

    SAPEXE_400-80000699.SAR

     

    3.   Extract saphascriptco.so from SAR file, enter

    # SAPCAR -xvf SAPEXE_<patch_level>-<number>.SAR saphascriptco.so

    e.g. SAPCAR -xvf SAPEXE_400-80000699.SAR saphascriptco.so

     

    4.   Copy saphascriptco.so to HANA installation, enter

    cp saphascriptco.so /usr/sap/<SID>/SYS/exe/hdb/saphascriptco.so

    e.g. cp saphascriptco.so /usr/sap/RR1/SYS/exe/hdb/saphascriptco.so
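The kernel release and patch level needed for the download in substep 2 can be read from the GetVersionInfo output of substep 1. The following is a minimal sketch, assuming the output has the comma-separated layout shown in the example above; the function name `kernel_version` is hypothetical.

```shell
# Hypothetical helper: reads GetVersionInfo output on stdin and prints
# "<kernel release> <patch level>" taken from the sapstartsrv line.
kernel_version() {
    awk -F', ' '/sapstartsrv/ {
        sub(/^patch /, "", $3)   # "patch 400" -> "400"
        print $2, $3
        exit
    }'
}
```

Possible usage: `sapcontrol -nr 60 -function GetVersionInfo | kernel_version`. The package number in the SAR file name is not part of this output and still has to be taken from the SAP Support Portal.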

  6. Replication HANA DB sync setting

    Log Retention After Takeover

    After a takeover, the new primary has to keep its log until a new secondary site is registered and has synced the missing log. Because syncing can take some time, this behavior has to be turned on explicitly in global.ini.

    On both nodes under <SID>adm_user

    In /usr/sap/<SID>/SYS/global/hdb/custom/config/global.ini, in the [system_replication] section:

    enable_log_retention = on
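To confirm the setting on both nodes, the file can be checked with a short script. This is a sketch only, assuming the parameter is placed in the `[system_replication]` section of global.ini; the function name `log_retention_on` is hypothetical.

```shell
# Hypothetical helper: succeeds if enable_log_retention is "on" inside the
# [system_replication] section of the given global.ini.
log_retention_on() {
    awk '
        # Track whether the current section header is [system_replication].
        /^[[:space:]]*\[/ { in_sr = ($0 ~ /^[[:space:]]*\[system_replication\]/) }
        in_sr && /^[[:space:]]*enable_log_retention[[:space:]]*=[[:space:]]*on[[:space:]]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}
```

Possible usage: `log_retention_on /usr/sap/<SID>/SYS/global/hdb/custom/config/global.ini || echo "log retention is not enabled"`.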

  7. Perform a HANA DB takeover using SAP command line tool

    Before beginning the TSA high availability solution implementation, make sure that the HANA DB takeover works properly in both directions.

    To view the system replication topology and status, run the following on both systems:

    On both nodes under <SID>adm_user

    # hdbnsutil -sr_state

    e.g. hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 1
    site name: Primary

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [Primary] db-pre1-rry
    db-pre1-rry -> [Secondary] db-pre2-rry

    done.
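When the state is checked from a script, the role of a node can be read from the `mode:` line of the output above. A minimal sketch, assuming the output layout shown in the example; the function name `sr_mode` is hypothetical.

```shell
# Hypothetical helper: reads "hdbnsutil -sr_state" output on stdin and prints
# the value of the first "mode:" line (for example primary or syncmem).
sr_mode() {
    sed -n 's/^[[:space:]]*mode:[[:space:]]*//p' | head -n 1
}
```

Possible usage under the <SID>adm user: `hdbnsutil -sr_state | sr_mode`.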

     

    Important Note:

    Even though the replication mode on the new secondary is shown as syncmem, the replication from the new primary to the new secondary may still be in progress. Do not fail back before the replication is complete!

    To see the replication status, run the following command on the primary:

    # python /usr/sap/RR1/HDB60/exe/python_support/systemReplicationStatus.py

    e.g. /usr/sap/RR1/HDB60> python /usr/sap/RR1/HDB60/exe/python_support/systemReplicationStatus.py

    HANA-Replication-in-progress

    Wait until the Replication Status becomes ACTIVE

    HANA-Replication-complete

    The replication status must be ACTIVE, which means that the replication is established, before you begin the takeover.
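Waiting for the ACTIVE status can be scripted as a polling loop. The sketch below is generic: it re-runs any command until it exits with a wanted return code. The function name `wait_for_rc` is hypothetical. Using it with systemReplicationStatus.py relies on the assumption that the script exits with return code 15 when the overall status is ACTIVE; verify this against the SAP documentation for your HANA revision before relying on it.

```shell
# Hypothetical helper: re-runs a command until it exits with the wanted
# return code, or gives up after max_tries attempts.
# usage: wait_for_rc <wanted_rc> <interval_seconds> <max_tries> <command...>
wait_for_rc() {
    wanted=$1; interval=$2; max_tries=$3
    shift 3
    try=1
    while [ "$try" -le "$max_tries" ]; do
        rc=0
        "$@" >/dev/null 2>&1 || rc=$?
        [ "$rc" -eq "$wanted" ] && return 0
        try=$((try + 1))
        sleep "$interval"
    done
    return 1
}
```

Possible usage on the primary, under the assumption stated above: `wait_for_rc 15 30 60 python /usr/sap/RR1/HDB60/exe/python_support/systemReplicationStatus.py`.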

    under <SID>adm_user

     

    1.    Takeover on <node2> (the node with the current secondary role)

    # hdbnsutil -sr_takeover

     

    2.    Stop HANA DB on <node1> (the former primary node)

    # HDB stop

    hdbdaemon will wait maximal 300 seconds for NewDB services finishing.

    Stopping instance using: /usr/sap/RR1/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 60 -function Stop 400

    15.11.2017 17:51:38

    Stop

    OK

    Waiting for stopped instance using: /usr/sap/RR1/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 60 -function WaitforStopped 600 2

    15.11.2017 17:54:12

    WaitforStopped

    OK

    hdbdaemon is stopped.

     

    3.    Register <node1> as the new secondary
     

    # hdbnsutil -sr_register --name=<SID>Primary --remoteHost=<node2_vhost> --remoteInstance=<instancenr> --replicationMode=syncmem --operationMode=logreplay

    e.g. hdbnsutil -sr_register --name=RRYPrimary --remoteHost=db-pre2-rry --remoteInstance=60 --replicationMode=syncmem --operationMode=logreplay

    adding site …

    checking for inactive nameserver …

    nameserver db-pre1-rry:36001 not responding.

    collecting information …

    registered at 10.0.45.4 (db-pre2-rry)

    updating local ini files …

    done.

    https://help.sap.com/doc/6b94445c94ae495c83a19646e7c3fd56/2.0.02/en-US/2dd26de6360046309e1579accbd9e527.html

     

    4.    Start HANA DB on <node1> (new Secondary)

    HDB start

    e.g. HDB start

    StartService

    Impromptu CCC initialization by ‘rscpCInit’.

      See SAP note 1266393.

    OK

    OK

    Starting instance using: /usr/sap/RR1/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 60 -function StartWait 2700 2

     15.11.2017 17:59:46

    Start

    OK

     15.11.2017 18:01:50

    StartWait

    OK
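The four manual steps of this test can be printed as a checklist before they are run by hand. This is a sketch that only prints the commands and does not execute them; the function name `takeover_plan` and the `node1`/`node2` labels are illustrative.

```shell
# Hypothetical helper: prints the manual takeover sequence for review.
# usage: takeover_plan <site_name> <remote_vhost> <instance_nr>
takeover_plan() {
    printf 'node2: hdbnsutil -sr_takeover\n'
    printf 'node1: HDB stop\n'
    printf 'node1: hdbnsutil -sr_register --name=%s --remoteHost=%s --remoteInstance=%s --replicationMode=syncmem --operationMode=logreplay\n' "$1" "$2" "$3"
    printf 'node1: HDB start\n'
}
```

For the example landscape, `takeover_plan RRYPrimary db-pre2-rry 60` prints the commands used in this step, with node2 as the current secondary and node1 as the former primary.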

  8. Client connection recovery

    Performing the takeover on the SAP HANA system alone is, in most cases, not enough. The client or application server must be able to reach the SAP HANA system continuously, no matter which site is currently the primary. TSA handles this through ServiceIP redirection: a ServiceIP address is assigned to the virtual host name. During a takeover, the ServiceIP is unbound from the network adapter of the old primary system and bound to the adapter of the new primary system.

    The ServiceIP has to be configured in the SAP application servers for communication with the HANA DB.
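Whether the ServiceIP is currently bound on a node can be checked from the output of the Linux `ip` command. A minimal sketch, assuming the one-line-per-address layout printed by `ip -o -4 addr show`; the function name `ip_is_bound` is hypothetical.

```shell
# Hypothetical helper: reads "ip -o -4 addr show" output on stdin and succeeds
# if the given IPv4 address is bound to any interface.
ip_is_bound() {
    awk -v want="$1" '
        # Field 4 has the form <address>/<prefix length>.
        { split($4, a, "/"); if (a[1] == want) found = 1 }
        END { exit !found }
    '
}
```

Possible usage: `ip -o -4 addr show | ip_is_bound <ServiceIP> && echo "ServiceIP is on this node"`. After a takeover the check should succeed only on the new primary.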

  9. Test case scenarios

    The test phase verifies that KPIs are met and that the landscape behaves as configured. The test cases suggested below are a guideline and should be extended to cover your specific requirements. Run the tests with realistic data load and size.

    Note: the test scenarios should be performed after the TSA high availability solution is implemented.

    Test-scenarios-table

  10. TSA Installation

    1     Check prerequisites
     

    On both nodes under root user

    # /sapcd/sap/SAPHIR_LANDSCAPE/TSAMP/FP3_LINUX/SAM4103MPLinux64/prereqSAM prereqSAM

    Error: Prerequisite checking for the ITSAMP installation failed:  RHEL 7.2 x86_64

    prereqSAM: One or more required packages are not installed: perl-Sys-Syslog

    prereqSAM: For details, refer to the 'Error:' entries in the log file: /tmp/prereqSAM.1.log

    If the check fails (see the example above), install the missing package:

    # yum install perl-Sys-Syslog
    ……………..
    completed

     

    # /sapcd/sap/SAPHIR_LANDSCAPE/TSAMP/FP3_LINUX/SAM4103MPLinux64/prereqSAM prereqSAM

    prereqSAM: All prerequisites for the ITSAMP installation are met on operating system:

    Red Hat Enterprise Linux Server release 7.2 (Maipo)
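The missing package names can be pulled out of captured prereqSAM output instead of being read by eye. This is a sketch that assumes the message wording shown in the failed run above; how several packages are separated on that line is not shown in this recipe, so the result should be reviewed before it is passed to yum. The function name `missing_packages` is hypothetical.

```shell
# Hypothetical helper: reads captured prereqSAM output on stdin and prints
# whatever follows "required packages are not installed:".
missing_packages() {
    sed -n 's/^.*required packages are not installed:[[:space:]]*//p'
}
```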

     

    2     Install TSA 4.1.0.3

    On both nodes under root user

     

    # cd /sapcd/sap/SAPHIR_LANDSCAPE/TSAMP/FP3_LINUX/SAM4103MPLinux64/

    # ./installSAM

     

    3    Connector for the SAP Management Console

    On both nodes under root user

    # cd /sapcd/sap/SAPHIR_LANDSCAPE/TSAMP/FP3_LINUX/4.1.0.3-TIV-ITSAMP-SAPMAINTMODE

    # ./install.sh

    IBM Tivoli System Automation for Multiplatforms – efix installer v1.14

    Installing efix SAPMAINTMODE for SAMP 4.1.0.3 on x86_64_linux_2

    => /usr/sbin/rsct/sapolicies/sap/bin/sap_tsamp_cluster_connector

    Finished installing efix. To remove the efix use uninstall.sh from this deliverable.

     
    4     Install Perl module for samlog on both nodes
     

    On both nodes under root user

    # yum install perl-Thread-Queue

     

    5   Setting CT_MANAGEMENT_SCOPE variable
     

    Setting the environment variable CT_MANAGEMENT_SCOPE to 2 sets the scope for TSA (RSCT) commands to the peer domain, that is, a domain consisting of multiple nodes. This value must be set in all root and non-root sessions, so it is recommended to make it persistent, e.g. in /etc/profile.d:

    On both nodes under root user

    Edit /etc/profile.d/tsamp.sh

    with content

    export CT_MANAGEMENT_SCOPE=2
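If the file is created by a script rather than an editor, an idempotent append avoids duplicate lines when the script is run again. A minimal sketch; the function name `ensure_line` is hypothetical.

```shell
# Hypothetical helper: appends a line to a file only if that exact line is
# not already present, so repeated runs do not create duplicates.
# usage: ensure_line <file> <line>
ensure_line() {
    file=$1; line=$2
    [ -f "$file" ] || : > "$file"
    grep -Fxq -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

Possible usage as root on both nodes: `ensure_line /etc/profile.d/tsamp.sh 'export CT_MANAGEMENT_SCOPE=2'`.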

     

    6   Configure netmon.cf
     

    If you are running a single-node or two-node cluster, more configuration is required to detect network interface failures.

    The cluster software periodically tries to contact each network interface in the cluster. If the attempt to contact an interface fails on one node of a two node cluster, the corresponding interface on the other node is also flagged as offline. It is flagged as offline, because it does not receive a response from its peer.

    To avoid this behavior, the cluster software must be configured to contact a network instance outside of the cluster. You may use the gateway of the sub-net the interface is in.

    gateway for PRE : 10.0.236.1

    gateway for PROD : 10.0.234.1

    On both nodes under root user

    Edit /var/ct/cfg/netmon.cf

    Each line of this file contains the system name or IP address of the external network instance. IP addresses can be specified in dotted decimal format.

    e.g.

    #This is default gateway for all interfaces in the subnets
    10.0.236.1
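To compare the configured targets on both nodes, the active entries of the file can be listed without comments and blank lines. A minimal sketch; the function name `netmon_targets` is hypothetical.

```shell
# Hypothetical helper: prints the entries of a netmon.cf file, skipping
# comment lines and empty lines.
netmon_targets() {
    grep -Ev '^[[:space:]]*(#|$)' "$1"
}
```

Possible usage: `netmon_targets /var/ct/cfg/netmon.cf`, which should print the gateway address entered above.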

     

  11. Create TSA cluster

    Note: since HANA is configured using vhost (e.g. db-pre1-rry, db-pre2-rry) rather than using local hostnames (e.g. mrsrl00241, mrsrl00242), the TSA cluster must be created with the vhost nodes. These nodes are used in TSA HA policy for procedures like registration of the new Secondary role.

    On both nodes under root user

    # preprpnode <vhost1> <vhost2>

    e.g. preprpnode db-pre1-rry db-pre2-rry

     

    On either node under root user

    # mkrpdomain hana_hrs_domain <vhost1> <vhost2>

    e.g. mkrpdomain hana_hrs_domain db-pre1-rry db-pre2-rry

     

    # startrpdomain hana_hrs_domain

    # lsrpdomain

    # lsrpnode
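The state of the new domain can be checked from the lsrpdomain output. The sketch below assumes a tabular layout with the domain name in the first column and the operational state (for example Online) in the second; this layout is not shown in this recipe, so compare it with the real output first. The function name `domain_state` is hypothetical.

```shell
# Hypothetical helper: reads lsrpdomain output on stdin and prints the second
# column (assumed to be OpState) of the row for the named domain.
domain_state() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Possible usage: `lsrpdomain | domain_state hana_hrs_domain`.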

     

     

  12. Install SAP HA license on both nodes

    On both nodes under root user

    in /sapcd/sap/SAPHIR_LANDSCAPE/TSAMP/FP3_LINUX/SAM4103MPLinux64/license

    # samlicm -i sam41SAP.lic
    # samlicm -s

    Product: IBM Tivoli System Automation for Multiplatforms 4.1.0.0
    Creation date: Fri 16 Aug 2013 12:00:01 AM CEST
    Expiration date: Thu 31 Dec 2037 12:00:01 AM CET

    Product Annotation: SA for MP – SAP HA policy
    Creation date: Fri 06 Dec 2013 12:00:01 AM CET
    Expiration date: Thu 31 Dec 2037 12:00:01 AM CET

  13. Setting up adm user in TSA command line interface

    On either node under root user

    # /usr/sbin/rsct/sapolicies/sap/bin/addsaoperator -v <SID>adm

    e.g. /usr/sbin/rsct/sapolicies/sap/bin/addsaoperator -v rr1adm

  14. Set a Network Tiebreaker

    Set the gateway as the Tiebreaker component :

     ·         gateway for PROD : 10.0.234.1

    Define the Tiebreaker resource:

    # mkrsrc IBM.TieBreaker Name="IPTB" Type="EXEC" DeviceInfo='PATHNAME=/usr/sbin/rsct/bin/samtb_net Address=<Tiebreaker> Log=1' PostReserveWaitTime=30

    e.g. mkrsrc IBM.TieBreaker Name="IPTB" Type="EXEC" DeviceInfo='PATHNAME=/usr/sbin/rsct/bin/samtb_net Address=10.0.236.1 Log=1' PostReserveWaitTime=30

     

    Activate the Tiebreaker:

    # chrsrc -c IBM.PeerNode OpQuorumTieBreaker=IPTB

    # lsrsrc -c IBM.PeerNode | grep -i OpQuorumTieBreaker

    Resource Class Persistent Attributes for IBM.PeerNode
    resource 1:
            OpQuorumTieBreaker         = "IPTB"
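The active tiebreaker name can be extracted from the lsrsrc output for use in a verification script. A minimal sketch, assuming the `OpQuorumTieBreaker = "<name>"` line format shown above; the function name `active_tiebreaker` is hypothetical.

```shell
# Hypothetical helper: reads "lsrsrc -c IBM.PeerNode" output on stdin and
# prints the quoted value of the OpQuorumTieBreaker attribute.
active_tiebreaker() {
    sed -n 's/^.*OpQuorumTieBreaker[[:space:]]*=[[:space:]]*"\(.*\)".*$/\1/p'
}
```

Possible usage: `lsrsrc -c IBM.PeerNode | active_tiebreaker`, which should print IPTB after the activation above.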

  15. Create TSA HANA high availability policy

    On <node1> under root user

    # cd /usr/sbin/rsct/sapolicies/sap

    # sampolicy -w sap_HDB_SR_v41.tmpl.xml

    Go through the steps filling out the required fields

    HANA-Policy-Wizard-1

    Enter 1 to begin

    HANA-Policy-Wizard-2

    Enter 3 for next step

    etc.

    Then all parameters are filled out:

    HANA-Policy-Wizard-complete

    Enter 0 for Finish

    HANA-Policy-Wizard-activate

    Enter 1 or 2 for policy activation / update

    Important Note:

    After policy activation, TSA shuts down the HANA database and the replication.

     

    Bring up HANA DB and Replication

    # chrg -o online -s "Name like '%'"

    Make use of the TSA UI:

    # samcc [-test]

    samcc-initial

    • use + and - to expand and collapse the resource view
    • use v to see relationships between resources and the ServiceIP

     

  16. Monitor timeout adjustment

    Because the HANA status commands in HANA v1 respond slowly, the TSA monitoring timeout has to be increased to 90 seconds so that the TSA monitor commands can return meaningful values.

    On either node :

    # lsrsrc -s "Name like 'SAP_HDB_<SID>_HDB<instancenr>_sr_primary_hdb'" IBM.Application

    # chrsrc -s "Name like 'SAP_HDB_<SID>_HDB<instancenr>_sr_primary_hdb'" IBM.Application MonitorCommandTimeout=90

    Do the same for the secondary resource:

    e.g. lsrsrc -s "Name like 'SAP_HDB_RR1_HDB60_sr_secondary_hdb'" IBM.Application

    e.g. chrsrc -s "Name like 'SAP_HDB_RR1_HDB60_sr_secondary_hdb'" IBM.Application MonitorCommandTimeout=90
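Because the same change is needed for the primary and the secondary resource, both commands can be generated from the SID and instance number. The sketch below only prints the commands for review; the function name `timeout_cmds` is hypothetical, and the resource names follow the naming pattern used in this step.

```shell
# Hypothetical helper: prints the chrsrc command for the primary and the
# secondary HANA resource.
# usage: timeout_cmds <SID> <instance_nr> <seconds>
timeout_cmds() {
    for role in primary secondary; do
        printf 'chrsrc -s "Name like '\''SAP_HDB_%s_HDB%s_sr_%s_hdb'\''" IBM.Application MonitorCommandTimeout=%s\n' "$1" "$2" "$role" "$3"
    done
}
```

For the example landscape, `timeout_cmds RR1 60 90` prints the two commands shown above.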

  17. Save the policy for backup / porting purposes

    Create on node <node1>

    e.g.

    # mkdir /etc/opt/IBM/tsamp/sam/policyPool/savedPolicies

    # sampolicy -s HDB_SR_adjusted_timeout_final.xml

     

    Note:

    You can use the saved XML policy file to port the policy to another HANA cluster. You then only need to replace the domain name, node names, and IP addresses.

  18. Test scenarios. Introduction

    Refer to TSA Knowledge Center for more information:

    https://www.ibm.com/support/knowledgecenter/en/SSRM2X_4.1.0/com.ibm.samp.doc_4.1/toplan_verifySAP_HANA.html

     

    The test scenarios fall into the following categories:

    • Normal Operations

    • Maintenance

    • Unplanned outages

    The next steps describe some test examples.

     

     

  19. Shut down HANA

    Initial status

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 52> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 1
    site name: RRYPrimary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry

    done.

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 43> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 2
    site name: RRYSecondary
    active primary site: 1

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre1-rry
    done.

    # samcc

    samcc-initial

    Procedure

    Run command:

    # chrg -o offline -s "Name like '%'"

    All services will be stopped.

    The final status is like below:

    samcc-everything-offline

     

    To bring up everything:

    # chrg -o online -s "Name like '%'"

  20. Move HANA Primary Role to other node

    Initial status

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 52> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 1
    site name: RRYPrimary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry

    done.

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 43> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 2
    site name: RRYSecondary
    active primary site: 1

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre1-rry
    done.

    # samcc

    samcc-initial

    Procedure

    Run command:

    # rgreq -o move SAP_HDB_RR1_HDB60_sr_primary_rg

    1.   Primary role will be stopped on db-pre1-rry:

    2.   Primary will be started on db-pre2-rry:

    Move-request

    3.   After tables replication / recovery, the Secondary will be stopped on db-pre2-rry.

    4.   Secondary will be registered on db-pre1-rry

    5.   Secondary will be restarted on db-pre1-rry

    6.   Replication will be synchronized

     

    Final status:

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 44> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 2
    site name: RRYSecondary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry

    done.

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 53> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 1
    site name: RRYPrimary
    active primary site: 2

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre2-rry
    done.

    Move-request-complete

    The same procedure applies when moving the HANA Primary role in the opposite direction.

  21. Take a node to maintenance

    Move all resources away from db-pre2-rry to apply operating system or hardware maintenance.

    Initial status

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 44> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 2
    site name: RRYSecondary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry

    done.

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 53> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 1
    site name: RRYPrimary
    active primary site: 2

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre2-rry
    done.

    # samcc

    Move-request-complete

    Procedure

    Exclude the active node from the automation. Run command:

    # samctrl -u a db-pre2-rry

    1.   sr_primary_rg group stop on db-pre2-rry

    2.   sr_primary_rg group start on db-pre1-rry

    3.   sr_secondary_rg group gets sacrificed

    Final status

    Maintenance-final

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 54> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 1
    site name: RRYPrimary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry
    done.

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 45> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    nameserver db-pre2-rry:36001 not responding.
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: false
    mode: primary
    site id: 2
    site name: RRYSecondary
    done.
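In the final status above both nodes report `mode: primary`, but only one of them is online. A check for "exactly one active primary" therefore has to look at both lines. The following is a minimal sketch that works on saved `hdbnsutil -sr_state` outputs, one file per node; the function name `one_primary` is hypothetical.

```shell
# Hypothetical helper: given one saved "hdbnsutil -sr_state" output per node,
# succeeds only if exactly one of them reports both "online: true" and
# "mode: primary".
one_primary() {
    count=0
    for f in "$@"; do
        if grep -Eq '^[[:space:]]*online:[[:space:]]*true' "$f" &&
           grep -Eq '^[[:space:]]*mode:[[:space:]]*primary' "$f"; then
            count=$((count + 1))
        fi
    done
    [ "$count" -eq 1 ]
}
```

Possible usage: save the output of each node to a file, then run `one_primary node1.txt node2.txt || echo "unexpected primary count"`.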

    After maintenance on db-pre2-rry is finished, include the node in the automation again. Run command:

    # samctrl -u d db-pre2-rry

    1. Secondary role will be synchronized

    2. sr_secondary_rg group start on db-pre2-rry

    Final status

    samcc-initial

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 43> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 2
    site name: RRYSecondary
    active primary site: 1
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre1-rry
    done.

    HSR replication is reestablished

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 56> python /usr/sap/RR1/HDB60/exe/python_support/systemReplicationStatus.py

    Replication-reestablished

    Repeat the same scenario in the opposite direction, taking db-pre1-rry into maintenance.

  22. Crash node with HANA Primary Role

    Initial status

    mrsrl00242:HDB:rr1adm /usr/sap/RR1/HDB60 44> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 2
    site name: RRYSecondary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre2-rry -> [RRYPrimary] db-pre1-rry
    db-pre2-rry -> [RRYSecondary] db-pre2-rry

    done.

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 53> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: syncmem
    site id: 1
    site name: RRYPrimary
    active primary site: 2

    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry
    primary masters:db-pre2-rry
    done.

    # samcc

    Move-request-complete

    Procedure

    Power off the active node.

    The HANA Primary role fails over to the other node.

    Final status

    Crash-failover-complete

     

    mrsrl00241:HDB:rr1adm /usr/sap/RR1/HDB60 51> hdbnsutil -sr_state
    checking for active or inactive nameserver …
    System Replication State
    ~~~~~~~~~~~~~~~~~~~~~~~~
    online: true
    mode: primary
    site id: 1
    site name: RRYPrimary
    Host Mappings:
    ~~~~~~~~~~~~~~
    db-pre1-rry -> [RRYPrimary] db-pre1-rry
    db-pre1-rry -> [RRYSecondary] db-pre2-rry
    done.

    Now power on the crashed node.

    Wait for the replication to be reestablished.

    Waiting-for-reestablishing-Replication

    Final status

    samcc-initial

     
