Archived content

Archived date: 2019-08-12

This content is no longer being updated or maintained. The content is provided “as is.” Given the rapid evolution of technology, some content, steps, or illustrations may have changed.

Introduction

This article describes the Cluster Aware AIX (CAA) tunables and the recommended settings for each.

In most scenarios, IBM® PowerHA® automatically configures the recommended values, but depending on the cluster configuration, some of the parameters might need to be changed. These tunables should not be changed directly in CAA; modify them only through PowerHA.

Summary of tunables

communication_mode
  Description: Specifies the heartbeat type used in the cluster.
  Options: unicast => unicast cluster packets; multicast => multicast cluster packets
  PowerHA clmgr option: clmgr mod cl HEARTBEAT_TYPE=unicast

deadman_mode
  Description: A dead man's switch is an action that occurs when CAA detects a node becoming isolated in a multinode environment. The dead man's switch mode can be set to either force a system shutdown or generate an Autonomic Health Advisor File System (AHAFS) event.
  Options: a => forces a system shutdown; e => generates an AHAFS event
  PowerHA clmgr option: Cannot be set using clmgr, and PowerHA does not recommend changing this value after the PowerHA cluster is created. CAA: /usr/sbin/clctrl -tune -o deadman_mode=a

link_timeout
  Description: The time for which the health management layer waits before declaring that the inter-site link has failed.
  Options: CAA => milliseconds; PowerHA => seconds
  PowerHA clmgr option: clmgr modify cluster SITE_GRACE_PERIOD=5

local_merge_policy
  Description: The policy used to merge nodes within a site. This is applicable to a single-site merge.
  Options: n (none) => no action is taken; m (majority) => the partition with the most nodes wins and the nodes of the losing partition are rebooted (supported only until the AIX 7.2.1 release); h (heuristic) => configured when the merge policy is not None
  PowerHA clmgr option: This value is set automatically based on the quarantine policy and merge policy. PowerHA does not recommend changing it using clctrl.

network_fdt
  Description: The amount of time CAA waits before giving a network failure notification to PowerHA.
  Options: CAA => milliseconds; PowerHA => seconds
  PowerHA clmgr option: clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=25

node_down_delay
  Description: The amount of time, in milliseconds, that the node monitor must wait after it determines that another node is down before posting it as DOWN.
  Options: CAA => milliseconds; PowerHA => seconds
  PowerHA clmgr option: clmgr modify cluster GRACE_PERIOD=11

node_timeout
  Description: The number of milliseconds without an incoming heartbeat from node X after which other nodes may mark node X as DOWN. CAA uses this attribute to control the frequency of gossip packets.
  Options: CAA => milliseconds; PowerHA => seconds
  PowerHA clmgr option: clmgr modify cluster HEARTBEAT_FREQUENCY=30

remote_hb_factor
  Description: Controls the frequency at which gossip packets are sent to the remote site.
  Options: Factor
  PowerHA clmgr option: clmgr modify cluster SITE_HEARTBEAT_CYCLE=9

site_merge_policy
  Description: Determines the dominant site during a merge operation.
  Options: p => Priority; h => Heuristic; m => Manual; n => None
  PowerHA clmgr option: site_merge_policy is set automatically by PowerHA based on the split and merge management policy.
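As the summary shows, several of these tunables are expressed in seconds at the PowerHA (clmgr) level but in milliseconds at the CAA (clctrl) level. The following is an illustrative sketch of that conversion, useful when cross-checking clmgr output against clctrl -tune -L; the function names are hypothetical and not part of any IBM tooling:

```python
# PowerHA (clmgr) expresses these tunables in seconds,
# while CAA (clctrl) expresses the same values in milliseconds.

def powerha_to_caa(seconds: int) -> int:
    """Convert a clmgr value (seconds) to the clctrl value (milliseconds)."""
    return seconds * 1000

def caa_to_powerha(milliseconds: int) -> int:
    """Convert a clctrl value (milliseconds) to the clmgr value (seconds)."""
    return milliseconds // 1000

# NETWORK_FAILURE_DETECTION_TIME=25 in clmgr corresponds to network_fdt=25000 in CAA.
print(powerha_to_caa(25))     # 25000
print(caa_to_powerha(30000))  # 30
```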

CAA tunables in detail

This section provides details about the frequently used CAA tunables along with their usage.

communication_mode

Purpose: Communication mode used by CAA for heartbeat and other protocol message transfer.
Scope: Clusterwide
Values: Default: u
Range: m, u
Unit: NA
Tuning: When the value is set to m, CAA uses the multicast communication mode. When the value is set to u, CAA uses the unicast communication mode.
Usage: PowerHA => clmgr mod cl HEARTBEAT_TYPE=unicast or multicast CAA => /usr/sbin/clctrl -tune -o communication_mode=m or u
Command to check the current value:
PowerHA:
# cltopinfo
Cluster Name: callisto3_cluster
Cluster Type: Standard
Heartbeat Type: Unicast
CAA:
# /usr/sbin/clctrl -tune -L
Recommendation: Before selecting multicast, ensure that the network infrastructure actually supports multicast and that the use of multicast is consistent with the network policies. Select unicast if the environment does not support multicast or if you do not want to use the Ethernet interfaces directly for cluster communication.

config_timeout

Purpose: Controls the timeout for coordinated configuration changes per phase per node.
Scope: Clusterwide, per node
Values: Default: 240
Range: 0 – 2147483647
Unit: Seconds
Tuning: A positive value indicates the maximum number of seconds CAA waits for the execution of client-side callouts, including scripts and CAA configuration code. A value of zero disables the timeout. For more details, refer to: https://www.ibm.com/support/knowledgecenter/SSPHQG_EOS/eos/eos.htm?origURL=SSPHQG_7.1.0/com.ibm.powerha.trgd/ha_trgd_config_too_long.htm
Usage: config_timeout can’t be changed if PowerHA is configured.
Recommendation: NA

deadman_mode

Purpose: Controls the behavior of the dead man timer. The interval associated with this timer is the node_timeout tunable.
Scope: Clusterwide, per node
Values: Default: a
Range: a, e
Unit: NA
Tuning: When the value is set to a (assert), the node would crash when the dead man timer pops. When the value is set to e (event), an AHAFS event is generated.
Usage: CAA => /usr/sbin/clctrl -tune -o deadman_mode=e or a
Recommendation: The recommended value is a. If the dead man timeout is reached, the node crashes immediately to prevent a partitioned cluster and data corruption. To delay the dead man crash, you can increase the node_timeout tunable, for example: # clmgr modify cluster HEARTBEAT_FREQUENCY=120. You need to run verify and sync to propagate the new value to the configuration. PowerHA does not recommend changing this value.
link_timeout

Purpose: Time for which the health management layer waits before declaring that the inter-site link has failed.
Scope: Clusterwide, per node
Values: Default: 30000
Range: 0 – 1200000
Unit: Milliseconds
Tuning: This tunable is used only for a linked cluster. A link failure detection can cause the cluster to switch to another link and continue the communication. If all the links fail, this results in declaring a site failure. The default value is 30 seconds. This attribute is applied as a delta to the node timeout for nodes that are at the remote sites. The node monitor waits for the node_timeout plus link_timeout seconds before determining that a remote node is down. Then, it will wait for an additional node_down_delay seconds before posting that the remote node is DOWN.
Usage: PowerHA => clmgr modify cluster SITE_GRACE_PERIOD=5
Command to check the current value:
PowerHA:
# clmgr query cluster | grep SITE_GRACE_PERIOD
SITE_GRACE_PERIOD="5"
CAA:
# clctrl -tune -L | grep link_timeout
link_timeout 5000 0 1171K milliseconds c n
Recommendation: NA
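The tuning text above states that, for a remote-site node, the node monitor waits node_timeout plus link_timeout before determining that the node is down, and then an additional node_down_delay before posting it as DOWN. The following is an illustrative sketch of that arithmetic (the function name is hypothetical; values are in milliseconds, using the CAA defaults quoted in this article):

```python
def remote_node_down_time_ms(node_timeout_ms: int,
                             link_timeout_ms: int,
                             node_down_delay_ms: int) -> int:
    """Total time before a remote-site node is posted as DOWN:
    node_timeout + link_timeout to determine the node is down,
    plus node_down_delay before the DOWN event is posted."""
    return node_timeout_ms + link_timeout_ms + node_down_delay_ms

# CAA defaults used in this article: node_timeout=30000, link_timeout=30000,
# node_down_delay=10000 -> 70 seconds end to end for a remote node.
print(remote_node_down_time_ms(30000, 30000, 10000))  # 70000
```

For a standard (non-linked) cluster, link_timeout does not apply, so the same sketch with link_timeout_ms=0 gives the local detection time.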

local_merge_policy

Purpose: Policy used to merge nodes within a site.
Scope: Clusterwide
Values: Default: m
Range: n, h, m
Unit: NA
Tuning: n = None, h = Heuristic, m = Majority
Usage: PowerHA sets the value automatically, as follows:

  1. A quarantine policy exists: set value to n.
  2. No quarantine policy exists:
    • AIX level earlier than 7.2.1:
      • Merge policy is not Majority: set value to n
      • Merge policy is Majority: set value to m
    • AIX level 7.2.1 or later:
      • Merge policy is not None: set value to h
      • Merge policy is None: set value to n

local_merge_policy is applicable to clusters other than linked clusters. It can accept three values n (none), m (majority), and h (heuristic).

  • In case of m, RSCT will not receive a MERGE event, and CAA will take an action on nodes based on the configured merge policy and action plan.

  • In case of n, RSCT will receive a MERGE event, but neither CAA nor RSCT will take any action.

  • In case of h, RSCT will receive a MERGE event, and RSCT will take an action based on the configured merge policy and action plan.
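The selection rules above form a small decision tree. The following is an illustrative restatement (the function name and string arguments are hypothetical, not PowerHA identifiers):

```python
def select_local_merge_policy(quarantine_policy_exists: bool,
                              merge_policy: str,
                              aix_level: int) -> str:
    """Restate how PowerHA chooses local_merge_policy
    (n = none, m = majority, h = heuristic), per the rules in this article.
    aix_level encodes the release, e.g. 7210 for AIX 7.2.1."""
    if quarantine_policy_exists:
        return "n"
    if aix_level < 7210:
        # Earlier than AIX 7.2.1: only Majority maps to m, everything else to n.
        return "m" if merge_policy == "majority" else "n"
    # AIX 7.2.1 or later: any merge policy other than None maps to h.
    return "n" if merge_policy == "none" else "h"

print(select_local_merge_policy(False, "majority", 7200))  # m
print(select_local_merge_policy(False, "majority", 7210))  # h
```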

Recommendation: NA

network_fdt

Purpose: Determines the amount of time CAA would wait to give a network failure notification.
Scope: Clusterwide
Values: Default: 20
Range: 0 to 590000
  0 => Interface failure detection time is 5 seconds, and quick detection mode is enabled.
  20 – 590000 => Interface failure detection time is set to the value provided by the user. CAA waits for the entire timeout period before marking the interface as down (no quick failure declaration, even on hardware failures).
Unit: Milliseconds
Tuning: Before CAA level 7.1.4.0, network_fdt was configured as 0. With this value set, an Ethernet interface is marked down immediately on hard failures; on soft failures, CAA waits at least 5 seconds before marking an Ethernet interface down. If this tunable is set to a value other than 0, CAA waits for a minimum of the specified interval before marking an Ethernet interface as down for both hard and soft failures.

Usage:

PowerHA => clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=25

CAA => clctrl -tune -o network_fdt=25000

'clmgr modify cluster' requires verification and synchronization to reflect the latest changes: after you modify the network failure detection time with clmgr, PowerHA updates the value in CAA only when verification and synchronization runs, as the following example shows.

clmgr modify cluster NETWORK_FAILURE_DETECTION_TIME=25
odmget HACMPcluster
HACMPcluster:
...
handle = 0
cluster_version = 16
...
network_fdt = 20
# ssh ha2clB odmget HACMPcluster
HACMPcluster:
...
handle = 1
cluster_version = 16
...
network_fdt = 20

# clctrl -tune -L network_fdt

 NAME         DEF MIN MAX    UNIT          SCOPE ENTITY_NAME(UUID)                              CUR
 ----------------------------------------------------------------------------------------------------
 network_fdt  0   0   590000 milliseconds  c     ClusterB(50ac5846-9f73-11e6-808a-4ef945334002) 20000
 ----------------------------------------------------------------------------------------------------

# clmgr sync cluster
...
Checking for added nodes
1 tunable updated on cluster ClusterB.
1 tunable updated on cluster ClusterB.
1 tunable updated on cluster ClusterB.
...

# clctrl -tune -L network_fdt

NAME    DEF    MIN    MAX    UNIT    SCOPE   ENTITY_NAME(UUID)                              CUR
--------------------------------------------------------------------------------------------------
network_fdt 0 0 590000 milliseconds c        ClusterB(50ac5846-9f73-11e6-808a-4ef945334002) 25000
--------------------------------------------------------------------------------------------------

# odmget HACMPcluster
HACMPcluster:
...
handle = 2
...
network_fdt = 25
# ssh ha2clB odmget HACMPcluster
HACMPcluster:
...
handle = 1
...
network_fdt = 25

Command to check the current value:

PowerHA:

# clmgr query cluster | grep -i NETWORK_FAILURE_DETECTION_TIME
NETWORK_FAILURE_DETECTION_TIME="25"

CAA:

# clctrl -tune -L network_fdt

NAME        DEF MIN MAX    UNIT          SCOPE ENTITY_NAME(UUID)                                    CUR
-----------------------------------------------------------------------------------------------------------
network_fdt 0   0   590000 milliseconds  c     mem73_cluster(5bd40ebe-0303-11e7-802c-2e0c3e9dc102)  25000
-----------------------------------------------------------------------------------------------------------
Recommendation: The appropriate value for network_fdt depends on the behavior (stability) of your network and on how long you want to ignore temporary network errors before the cluster declares a network as down and then triggers movement of resource groups. Increasing the value makes the cluster less sensitive to network problems. This tunable is limited to (node_timeout – 10 seconds). For more details, refer to: https://www.ibm.com/support/knowledgecenter/SSPHQG_7.2/concept/ha_concepts_ex_cluster.htm?origURL=SSPHQG_7.2.0/com.ibm.powerha.concepts/ha_concepts_ex_cluster.htm
Note: In versions before AIX 7.1 TL4, network_fdt is set to 0 seconds; from AIX 7.1 TL4, it is set to 20 seconds. If a customer migrates to the latest level of AIX, you might see a mismatch between the PowerHA Object Data Manager (ODM) and AIX values, along with verification errors. To avoid this, apply the following authorized program analysis report (APAR): IV83760: CLDARE ERROR FOR NODE_TIMEOUT TUNABLE AFTER AIX 71TL4 UPGRADE
Note: In AIX versions after 7.1.4, the earlier local_network_fdt tunable is replaced with network_fdt.
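The recommendation above limits network_fdt to node_timeout minus 10 seconds. The following is a hypothetical validation sketch of that constraint (function names are illustrative, not IBM tooling; values in milliseconds):

```python
def max_network_fdt_ms(node_timeout_ms: int) -> int:
    """Upper bound for network_fdt: node_timeout minus 10 seconds."""
    return node_timeout_ms - 10_000

def is_valid_network_fdt(network_fdt_ms: int, node_timeout_ms: int) -> bool:
    """Check a candidate network_fdt against the documented limit."""
    return 0 <= network_fdt_ms <= max_network_fdt_ms(node_timeout_ms)

# With the default node_timeout of 30000 ms, network_fdt may be at most 20000 ms,
# matching the 20-second default noted in this article. Setting it to 25000 ms
# therefore also requires raising node_timeout (as the examples in this article do).
print(max_network_fdt_ms(30000))           # 20000
print(is_valid_network_fdt(25000, 30000))  # False
print(is_valid_network_fdt(25000, 40000))  # True
```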

node_down_delay

Purpose: Controls the behavior of the node monitor which periodically evaluates the set of activated heartbeating sources to determine whether a node is up or down.
Scope: Clusterwide, per node
Values: Default is 10000
Range: 5000 – 600000
Unit: milliseconds
Tuning: Specifies the number of milliseconds that the node monitor must wait after it determines that another node is down before posting it as DOWN. The default value is 10 seconds. The valid range differs depending on the AIX version, so the value is taken directly from AIX.

Usage:

clmgr modify cluster GRACE_PERIOD=11

# clmgr list cluster | grep -w GRACE_PERIOD
GRACE_PERIOD="11"
clmgr modify cluster GRACE_PERIOD=10
# odmget HACMPcluster
HACMPcluster:
node_down_delay = 10
handle = 0
....
# clctrl -tune -L node_down_delay

NAME            DEF   MIN  MAX    UNIT         SCOPE   ENTITY_NAME(UUID)                                       CUR
---------------------------------------------------------------------------------------------------------------------
node_down_delay 10000 5000 600000 milliseconds c n     callisto3_cluster(2289e80e-b0a6-11e6-8003-3a0eab33be05) 11000
---------------------------------------------------------------------------------------------------------------------
# clmgr sync cluster
Committing any changes, as required, to all available nodes...
Adding any necessary PowerHA SystemMirror entries to /etc/inittab and
/etc/rc.net for IPAT on node callisto3.
Checking for added nodes
1 tunable updated on cluster callisto3_cluster.
1 tunable updated on cluster callisto3_cluster.
....
...
# clctrl -tune -L node_down_delay

NAME            DEF   MIN  MAX    UNIT         SCOPE ENTITY_NAME(UUID)                                       CUR
-------------------------------------------------------------------------------------------------------------------
node_down_delay 10000 5000 600000 milliseconds c n   callisto3_cluster(2289e80e-b0a6-11e6-8003-3a0eab33be05) 10000
-------------------------------------------------------------------------------------------------------------------
# odmget HACMPcluster
handle = 1
....
node_down_delay = 7
...
Recommendation: The default is set to 10 seconds; it defines the additional wait time before a node that has been determined to be down is posted as DOWN.

node_timeout

Purpose: Controls interval for the node monitor and deadman timer.
Scope: Clusterwide, per node
Values: Default: 30000 milliseconds from CAA level 7.1.4.0
Range: 10000 – 600000
Unit: milliseconds
Tuning: This attribute controls the frequency with which the node performs heartbeats across the various enabled heartbeating sources, including the aggregated heartbeat messages sent by gateway servers in linked clusters and the interval-based heartbeat algorithms that run over the storage area network (SAN) and disk. The default value is 30 seconds from CAA level 7.1.4.0. The valid range differs depending on the AIX version, so the value is taken directly from AIX. For a standard cluster, link_timeout is 0.

Usage:

clmgr modify cluster HEARTBEAT_FREQUENCY=30
/usr/sbin/clctrl -tune -o node_timeout=30000

clmgr requires a sync to make the change effective:


 # /usr/sbin/clctrl -tune -L node_timeout

 NAME        DEF   MIN   MAX    UNIT         SCOPE  ENTITY_NAME(UUID)                              CUR
--------------------------------------------------------------------------------------------------------
node_timeout 20000 10000 600000 milliseconds c n    ClusterB(50ac5846-9f73-11e6-808a-4ef945334002) 30000
--------------------------------------------------------------------------------------------------------

 # clmgr modify cluster HEARTBEAT_FREQUENCY=40

 # /usr/sbin/clctrl -tune -L node_timeout

NAME         DEF   MIN   MAX    UNIT         SCOPE   ENTITY_NAME(UUID)                              CUR
----------------------------------------------------------------------------------------------------------
node_timeout 20000 10000 600000 milliseconds c n     ClusterB(50ac5846-9f73-11e6-808a-4ef945334002) 30000
----------------------------------------------------------------------------------------------------------

 # clmgr sync cluster
 ...
 Checking for added nodes
 1 tunable updated on cluster ClusterB.
 1 tunable updated on cluster ClusterB.
 1 tunable updated on cluster ClusterB.
 ...
 # /usr/sbin/clctrl -tune -L node_timeout

NAME         DEF   MIN   MAX    UNIT         SCOPE  ENTITY_NAME(UUID)                              CUR
---------------------------------------------------------------------------------------------------------
node_timeout 20000 10000 600000 milliseconds c n    ClusterB(50ac5846-9f73-11e6-808a-4ef945334002) 40000
---------------------------------------------------------------------------------------------------------
 # odmget HACMPcluster
 HACMPcluster:
 ...
 handle = 2
 ...
 node_timeout = 40
 ...
Recommendation: node_timeout should be at least 10 seconds more than network_fdt. For more details, refer to: https://www.ibm.com/support/knowledgecenter/SSPHQG_7.2/concept/ha_concepts_ex_cluster.htm?origURL=SSPHQG_7.2.0/com.ibm.powerha.concepts/ha_concepts_ex_cluster.htm If no heartbeats are received from the other node for more than the node failure detection time (node_timeout), the remote node is recognized as down; before posting the DOWN event, however, the node waits an additional node_down_delay seconds. If the repository disk (dpcom) is also down, the dead man switch timeout can halt the node. In such cases, increase the node failure detection timeout (node_timeout) to avoid triggering the dead man switch; for example, increase the CAA dead man's switch value (CAA node_timeout) to 120 seconds.
Note: In versions earlier than AIX 7.1 TL4, node_timeout is set to 20 seconds by default; from AIX 7.1 TL4, it is set to 30 seconds. It should be at least 10 seconds more than network_fdt. If a customer migrates to the latest level of AIX, you might see a mismatch between the PowerHA ODM and AIX values, along with verification errors. To avoid this, apply the following APAR: IV83760: CLDARE ERROR FOR NODE_TIMEOUT TUNABLE AFTER AIX 71TL4 UPGRADE

remote_hb_factor

Purpose: Controls how infrequently gossip messages are sent to the remote site.
Scope: Clusterwide
Values: Default: 1
Range: 1 – 100
Unit: NA
Tuning: A value of 10 means that out of every 10 gossip packets sent to local nodes, only one gossip packet is sent to the remote site nodes. A value of 1 means that for every gossip packet sent to local nodes, one gossip packet is sent to the remote site nodes.
Usage: clmgr modify cluster SITE_HEARTBEAT_CYCLE=9
Recommendation: PowerHA does not recommend changing this value.
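The factor described above can be read as "send one remote gossip for every N local gossips." The following is a small illustrative sketch of that relationship (the function name is hypothetical):

```python
def remote_gossip_ticks(total_local_gossips: int, remote_hb_factor: int) -> list[int]:
    """Return the local-gossip ticks on which a gossip packet is also sent
    to the remote site, given remote_hb_factor = N
    (one remote gossip for every N local gossips)."""
    return [t for t in range(1, total_local_gossips + 1)
            if t % remote_hb_factor == 0]

# remote_hb_factor=1 (the default): every local gossip also goes to the remote site.
print(remote_gossip_ticks(5, 1))    # [1, 2, 3, 4, 5]
# remote_hb_factor=10: only 1 out of every 10 local gossips goes remote.
print(remote_gossip_ticks(20, 10))  # [10, 20]
```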

repos_mode

Purpose: Controls node behavior when cluster repository access is lost.
Scope: Clusterwide, per node
Values: Default: e
Range: a, e
Unit: NA
Tuning: When the value is set to a (assert), the node would crash upon losing access to the cluster repository. When the value is set to e (event), an AHAFS event is generated. A node-specific setting trumps the clusterwide setting and can be used to override the behavior on a per-node basis.
Usage: CAA => /usr/sbin/clctrl -tune -o repos_mode=a or e
Recommendation: PowerHA does not recommend changing this value.

site_merge_policy

Purpose: Determines the dominant site during a merge operation.
Scope: Clusterwide
Values: Default: p
Range: p, h, m, n
Unit: NA
Tuning: p = Priority, h = Heuristic, m = Manual, n = None

The value follows from the configured split and merge policies:

Split policy   Merge policy                                               Resulting setting
------------   --------------------------------------------------------   -----------------------------------
None           Priority (not supported from SystemMirror release 721)     clctrl -tune -o site_merge_policy=p
None           None (supported from SystemMirror release 721)             clctrl -tune -o site_merge_policy=n
None           Majority                                                   clctrl -tune -o site_merge_policy=h
None           Manual (not supported from SystemMirror release 721)       clctrl -tune -o site_merge_policy=m
TieBreaker     Priority (not supported from SystemMirror release 721)     clctrl -tune -o site_merge_policy=p
TieBreaker     Manual (not supported from SystemMirror release 721)       clctrl -tune -o site_merge_policy=m
TieBreaker     TieBreaker                                                 clctrl -tune -o site_merge_policy=h
Manual         Manual (linked clusters only if AIX level older than 721)  clctrl -tune -o site_merge_policy=m

Usage: This tunable cannot be changed manually; it is set automatically based on the split and merge policy selection. By default, PowerHA sets the split policy to None and the merge policy to Majority.

Recommendation: NA

Note: Run verify and sync after modifying the PowerHA tunables to update the corresponding tunables in CAA. This applies to all the tunables.

Appendix

List of all CAA tunables

# clctrl -tune -L

NAME                      DEF    MIN    MAX    UNIT           SCOPE     ENTITY_NAME(UUID)                                           CUR
-------------------------------------------------------------------------------------------------------------------------------------------
communication_mode        m                                   c         telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       u
-------------------------------------------------------------------------------------------------------------------------------------------
config_timeout            240    0      2G-1   seconds        c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       240
-------------------------------------------------------------------------------------------------------------------------------------------
deadman_mode              a                                   c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       a
-------------------------------------------------------------------------------------------------------------------------------------------
link_timeout              30000  0      1171K  milliseconds   c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       30000
-------------------------------------------------------------------------------------------------------------------------------------------
local_merge_policy        m                                   c         telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       m
-------------------------------------------------------------------------------------------------------------------------------------------
no_if_traffic_monitor     0      0      1                     c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       0
-------------------------------------------------------------------------------------------------------------------------------------------
node_down_delay           10000  5000   600000 milliseconds   c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       10000
-------------------------------------------------------------------------------------------------------------------------------------------
node_timeout              20000  10000  600000 milliseconds   c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       30000
-------------------------------------------------------------------------------------------------------------------------------------------
packet_ttl                32     1      64                    c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       32
-------------------------------------------------------------------------------------------------------------------------------------------
remote_hb_factor          1      1      100                   c         telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       1
-------------------------------------------------------------------------------------------------------------------------------------------
repos_mode                e                                   c n       telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       e
-------------------------------------------------------------------------------------------------------------------------------------------
site_merge_policy         p                                   c         telesto5_cluster(71a60f2e-f1f3-11e6-8003-3a0ea0daf805)       p

n/a means parameter not supported by the current platform or kernel

Scope codes:
    c = clusterwide: applies to the entire cluster
    s = per site: may be applied to one or more sites
    n = per node: may be applied to one or more nodes
    i = per interface: may be applied to one or more communication interfaces

Value conventions:
    K = Kilo: 2^10       G = Giga: 2^30       P = Peta: 2^50
    M = Mega: 2^20       T = Tera: 2^40       E = Exa: 2^60