This document describes how to set up EMS high availability (HA) in an ESS environment. By default, an ESS system ships with one EMS, which manages one or more ESS building blocks.
The EMS is the only management node that manages the life cycle of the IO nodes (one or more building blocks), so an outage of the EMS server is a serious problem for ESS administrators.

ESS does not officially support an HA mode in which more than one EMS node exists in one ESS environment and another EMS takes over when one EMS node is down. However, by following this document and running some commands manually, you can set up ESS with EMS HA.
Doing so gives administrators an alternate EMS node that is always ready to manage the building blocks, so the environment can survive the failure of one EMS node.

To achieve this, follow one of the two kinds of setup below:
a) Two EMS servers manage a single ESS environment, and both are up and running.
b) Two EMS servers are configured to manage the same ESS environment, but only one EMS node is up and running at a given time.
This document was created and tested in an LE (little-endian) environment. A BE (big-endian) environment might need additional or different steps.

GUI Consideration:
Only one server can act as the GUI server at any given time.


Two EMS servers managing a single ESS environment, both up and running:

In this kind of setup, we assume that two EMS servers, ems1 and ems2, are up and running with different IPs configured (a different xCAT management IP, a different FSP IP and a different high-speed IP for GPFS), and that both EMS nodes participate in the same GPFS cluster at all times. See Annexure A and Annexure B for the important configuration file settings of ems1 and ems2.
# xCAT Mgmt Network

192.168.45.20 ems1.gpfs.net ems1 # EMS1
192.168.45.21 io1.gpfs.net io1 # GSSp Node 1
192.168.45.22 io2.gpfs.net io2 # GSSp Node 2
192.168.45.25 ems2.gpfs.net ems2 # EMS2

# 10 G Network
13.10.55.10 ems1-10g.gpfs.net ems1-10g # EMS1
13.10.55.11 io1-10g.gpfs.net io1-10g # GSSp Node 1
13.10.55.12 io2-10g.gpfs.net io2-10g # GSSp Node 2
13.10.55.24 ems2-10g.gpfs.net ems2-10g # EMS2

# FSP IP Address and Server Serial
10.0.0.3 ems1-fsp.gpfs.net ems1-fsp # 2128ECA
10.0.0.2 io1-fsp.gpfs.net io1-fsp # 212900A
10.0.0.4 io2-fsp.gpfs.net io2-fsp # 2128FEA
10.0.0.50 ems2-fsp.gpfs.net ems2-fsp # 212F76A

# FSP IP Assigned to Node
10.0.0.6 ems1-fsp-if.gpfs.net ems1-fsp-if
10.0.0.5 ems2-fsp-if.gpfs.net ems2-fsp-if
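
Because both EMS nodes share one address plan, a duplicated IP or host name in this list is easy to introduce when the ems2 entries are added. The following is a minimal sketch of a check, not part of the ESS tooling; the function name check_hosts is hypothetical, and it assumes the usual hosts-file layout of one IP followed by one or more names per line.

```shell
#!/usr/bin/env bash
# Report any IP address or host name that appears more than once among the
# non-comment lines of a hosts-format file (default: /etc/hosts).
# check_hosts is a hypothetical helper name, not an ESS or xCAT command.
check_hosts() {
    awk '
        { sub(/#.*/, "") }                      # drop trailing comments
        NF < 2 { next }                         # skip blank or incomplete lines
        $1 == "127.0.0.1" || $1 == "::1" { next }   # loopback lines share names by design
        {
            if (seen_ip[$1]++) printf "duplicate IP: %s\n", $1
            for (i = 2; i <= NF; i++)
                if (seen_name[$i]++) printf "duplicate name: %s\n", $i
        }
    ' "${1:-/etc/hosts}"
}
```

Running check_hosts on each EMS node and getting no output suggests that the entries do not collide; it does not verify that the addresses are actually configured on the interfaces.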

This setup allows the administrator to issue any IO node admin command and to manage the ESS cluster from either EMS node.
One thing to remember: the xCAT and DHCP services must run on only one EMS node at a time. If the ESS environment is currently managed by ems1 and the admin chooses to manage it from ems2, the xCAT and DHCP services must be stopped on ems1 and started on ems2 before any admin command is issued from ems2.
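
The hand-over of the xCAT and DHCP services can be sketched as a dry-run helper that only prints the commands and executes nothing. It assumes the services are the systemd units xcatd (the unit restarted later in this document) and dhcpd (an assumption; check the unit name on your system), and that root ssh access to both EMS nodes is available. The function name ems_switch_plan is hypothetical.

```shell
#!/usr/bin/env bash
# Print the commands that move the xCAT and DHCP services from the currently
# active EMS node to the other one. This is a dry run: nothing is executed.
# Assumption: the services are the systemd units xcatd and dhcpd.
ems_switch_plan() {
    local from="$1" to="$2" svc
    if [ -z "$from" ] || [ -z "$to" ] || [ "$from" = "$to" ]; then
        echo "usage: ems_switch_plan <active-ems> <new-ems>" >&2
        return 1
    fi
    # Stop the services on the node that currently manages the environment.
    for svc in xcatd dhcpd; do
        echo "ssh root@${from} systemctl stop ${svc}"
    done
    # Start them on the node that takes over.
    for svc in xcatd dhcpd; do
        echo "ssh root@${to} systemctl start ${svc}"
    done
}
```

For example, ems_switch_plan ems1 ems2 prints four commands that can be reviewed and then run by hand before any admin command is issued from ems2.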
Follow these steps to have two active EMS servers in one ESS environment:

a) First deploy the ESS environment using ems1 by following the ESS Quick Deployment Guide (QDG). This is the usual state of a system after it is first deployed by the LBS team. The cluster then looks like this:

[root@ems1 ~]# mmlscluster

GPFS cluster information
========================
GPFS cluster name: SUMIT.gpfs.net
GPFS cluster id: 930247251650744269
GPFS UID domain: SUMIT.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR

Node Daemon node name IP address Admin node name Designation
----------------------------------------------------------------------
1 io1-10g.gpfs.net 13.10.55.11 io1-10g.gpfs.net quorum-manager
2 io2-10g.gpfs.net 13.10.55.12 io2-10g.gpfs.net quorum-manager
3 ems1-10g.gpfs.net 13.10.55.10 ems1-10g.gpfs.net quorum

b) Take a backup of the xCAT DB on ems1 using “/var/tmp/gssdeploy -c -r /var/tmp/SumitDB”. The script will ask whether to clean up xCAT; make sure you only take the copy of the xCAT DB and then exit the script. DO NOT RUN CLEANUP. As shown below, I exited immediately after the xCAT DB backup was done.
[root@ems1 tmp]# ./gssdeploy -c -r /var/tmp/SumitDB
Saving xcat db to /var/tmp/SumitDB/xcatdb
Backup Complete.
Saving hostkeys to /var/tmp/SumitDB/hostkeys

You have selected to cleanup (remove) xCAT and ESS toolkit from this system

Cleanup will remove xCAT and ESS components and associated
repos. It may inadvertently make some of your application
inoperative if they are dependent on those components and repos.

Press c and hit Enter to continue with cleanup.
Press e and hit Enter to exit
Enter response: e
Exiting...

c) Once the first building block has been set up using ems1 and the backup has been taken, start deploying ems2 using the ESS QDG. Deploy ems2 with the same gssdeploy.cfg file as ems1, changing only the EMS hostname and the FSP interface.
a. On ems2, first take the same version of the ESS build (I am using ESS 5.3.1.1) and extract it.
b. After extracting the build, run “/var/tmp/gssinstall_ppc64le -u” to create the yum repos for the ESS binaries. Once the repos have been created, copy gssdeploy.cfg from ems1 to ems2 and edit it. Pay attention to the following variables:
i. Change EMS_HOSTNAME="ems1" to EMS_HOSTNAME="ems2".
ii. Set EMS_MGTNETINTERFACE="enP3p9s0f0" to the correct interface of ems2. In my case the xCAT management interface is the same on ems1 and ems2.
iii. SERVERS_SERIAL and SERVERS_NODES must be the same on ems2 as on ems1.
iv. Set FSP_MGTNETINTERFACE to the correct FSP interface. In my example, FSP_MGTNETINTERFACE="enP3p9s0f2" is for ems2.
c. Copy the xCAT DB backup from ems1 to ems2 and put it in the /var/tmp folder.
d. Now run “gssdeploy -r /var/tmp/SumitDB”. The restore sets up the base xCAT installation on ems2 and restores the xCAT DB from ems1 onto ems2. During execution you may see errors, but they can be safely ignored (we will fix them at a later stage). It is advisable not to run gssdeploy in silent mode; run it interactively so that you can catch errors.
[STEP]: xCAT Restore 5 of 9, Make Console server
[CMD]: => makeconservercf gss_ppc64
[LAST CMD]: => cat /opt/ibm/gss/xcat/stanza/*.stanza | chdef -z

Enter 'r' to run [CMD]:
Enter 'l' to rerun [LAST CMD].
Enter 's' skip this step, or 'e' to exit this script

Enter response: r
[CMD_RESP]: Error: Unable to dispatch hierarchical sub-command to 192.168.45.20:3001. Error: Connection failure: SSL connect attempt failed with unknown error error:0407006A:rsa routines:RSA_padding_check_PKCS1_type_1:block type is not 01 error:04067072:rsa routines:RSA_EAY_PUBLIC_DECRYPT:padding check failed error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed at /opt/xcat/lib/perl/xCAT/Client.pm line 282.
[CMD_RESP]: .
[CMD_RESP]: Error: Unable to dispatch hierarchical sub-command to 192.168.45.20:3001. Error: Connection failure: SSL connect attempt failed with unknown error error:0407006A:rsa routines:RSA_padding_check_PKCS1_type_1:block type is not 01 error:04067072:rsa routines:RSA_EAY_PUBLIC_DECRYPT:padding check failed error:0D0C5006:asn1 encoding routines:ASN1_item_verify:EVP lib error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed at /opt/xcat/lib/perl/xCAT/Client.pm line 282.
[CMD_RESP]: .
[CMD_RESP]: RC: 1
ERROR running [CMD]: makeconservercf gss_ppc64

The above error can be ignored.
e. Now edit the xCAT site table using “tabedit site”. If the tabedit command does not work, log off and log on to the ems2 server again. Change the values below:
"master","192.168.45.20",,
"nameservers","192.168.45.20",,
"dhcpinterfaces","ems1 | enP3p9s0f0 ",,
To
"master","192.168.45.25",,
"nameservers","192.168.45.25",,
"dhcpinterfaces","ems2 | enP3p9s0f0 ",,

Once the changes are done, restart the xCAT daemon on ems2 using “systemctl restart xcatd”.
f. Now create an ems2.object file with the content below and create the xCAT object from it. Be very careful when creating the ems2.object file: “bmc” must be the FSP IP of ems2, and “serial” must be the correct serial number of the ems2 server.
[root@ems2 ~]# cat ems2.object
ems2:
objtype=node
bmc=10.0.0.50
bmcpassword=PASSW0RD
cons=ipmi
groups=__mgmtnode,ems
mgt=ipmi
mtm=8247-21L
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=212F76A
setuptftp=yes

[root@ems2 ~]# cat ems2.object | chdef -z -p
Error: ems2: postscripts 'syslog' is already included in the 'xcatdefaults'.
Error: ems2: postscripts 'remoteshell' is already included in the 'xcatdefaults'.
Error: ems2: postscripts 'syncfiles' is already included in the 'xcatdefaults'.
Error: ems2: postbootscripts 'otherpkgs' is already included in the 'xcatdefaults'.
1 object definitions have been created or modified.
New object definitions 'ems2' have been created.

g. The ems2 object now exists in the xCAT DB on ems2. This step is needed because the xCAT DB restored from ems1 does not contain an ems2 object. See Annexure B for the xCAT objects on ems2.
h. Now create the console server configuration for gss_ppc64 from ems2 (this step failed during the restore). Make sure the xCAT service on ems1 is stopped, otherwise rcons will not work from ems2, because only one system can be the console server master at a time.
[root@ems2 ~]# makeconservercf gss_ppc64
[root@ems2 ~]# rcons io1
[Enter `^Ec?' for help]

Red Hat Enterprise Linux Server 7.4 (Maipo)
Kernel 3.10.0-693.35.1.el7.ppc64le on an ppc64le

io1 login:

i. Now bring ems2 to the same kernel, system and patch level using the updatenode commands. First create the kernel and patch repos, then run updatenode.
i. [root@ems2 5.3.1.1_Patch]# /var/tmp/gssdeploy -k kernel-RHBA-2018-2158-LE.tar.gz -p netmanager-2018-1755-LE.tar.gz,systemd-RHBA-2018-1151-LE.tar.gz
ii. [root@ems2 ~]# updatenode ems2 -P gss_updatenode (run this twice between each reboot)
iii. [root@ems2 ~]# updatenode ems2 -P gss_ofed
iv. [root@ems2 ~]# updatenode ems2 -P gss_ipraid

j. Once ems2 has been updated to the same kernel and patch level, create a network bond for the GPFS daemon network (if required) and add ems2 to the existing GPFS cluster using the gssaddnode command. After gssaddnode, the cluster looks like this:
[root@ems2 ~]# mmgetstate -a

Node number Node name GPFS state
-------------------------------------------
1 io1-10g active
2 io2-10g active
3 ems1-10g active
5 ems2-10g active

[root@ems2 ~]# mmlscluster

GPFS cluster information
========================
GPFS cluster name: SUMIT.gpfs.net
GPFS cluster id: 930247251650744269
GPFS UID domain: SUMIT.gpfs.net
Remote shell command: /usr/bin/ssh
Remote file copy command: /usr/bin/scp
Repository type: CCR

Node Daemon node name IP address Admin node name Designation
------------------------------------------------------------------------
1 io1-10g.gpfs.net 13.10.55.11 io1-10g.gpfs.net quorum-manager
2 io2-10g.gpfs.net 13.10.55.12 io2-10g.gpfs.net quorum-manager
3 ems1-10g.gpfs.net 13.10.55.10 ems1-10g.gpfs.net quorum
5 ems2-10g.gpfs.net 13.10.55.24 ems2-10g.gpfs.net quorum

[root@ems2 ~]# lsdef
ems1 (node)
ems2 (node)
io1 (node)
io2 (node)

k. ems2 is now part of the GPFS cluster and can also manage the xCAT cluster.
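
One way to confirm the final state is to scan the mmgetstate -a output for nodes that are not active. The following is a minimal sketch that assumes the three-column layout (node number, node name, GPFS state) shown in the transcript above; the function name gpfs_inactive_nodes is hypothetical.

```shell
#!/usr/bin/env bash
# Read `mmgetstate -a` output on stdin and print every node whose GPFS state
# is not "active". The exit status is non-zero if at least one such node exists.
# Assumption: data rows start with a numeric node number and have the node
# name in column 2 and the state in column 3.
gpfs_inactive_nodes() {
    awk '
        $1 ~ /^[0-9]+$/ && NF >= 3 {
            if ($3 != "active") { print $2 ": " $3; bad = 1 }
        }
        END { exit bad + 0 }
    '
}
```

A possible use on either EMS node is `mmgetstate -a | gpfs_inactive_nodes && echo "all nodes active"`.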

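The site table change in step e can also be prepared on a table dump instead of in the interactive editor. The following is a minimal sketch, assuming the dump contains the quoted CSV rows shown in step e; site_retarget is a hypothetical helper that only writes the modified table to stdout and does not touch the xCAT DB.

```shell
#!/usr/bin/env bash
# Rewrite the master, nameservers and dhcpinterfaces rows of a site-table
# dump so that they point at the new EMS node. Other rows pass through
# unchanged. site_retarget is a hypothetical helper, not an xCAT command.
site_retarget() {
    local file="$1" old_ip="$2" new_ip="$3" old_host="$4" new_host="$5"
    local old_ip_re="${old_ip//./\\.}"    # escape dots so sed matches them literally
    sed -e "/^\"master\",/s/\"${old_ip_re}\"/\"${new_ip}\"/" \
        -e "/^\"nameservers\",/s/\"${old_ip_re}\"/\"${new_ip}\"/" \
        -e "/^\"dhcpinterfaces\",/s/\"${old_host}\( *|\)/\"${new_host}\1/" \
        "$file"
}
```

One possible flow is to dump the table with `tabdump site > site.csv`, run `site_retarget site.csv 192.168.45.20 192.168.45.25 ems1 ems2` into a new file, review the result, load it with tabrestore, and then restart xcatd as described in step e. Check the tabdump and tabrestore man pages on your xCAT level before relying on this.
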
Annexure – A : EMS1 important file configuration
gssdeploy.cfg on ems1
[root@ems1 ~]# cat /var/tmp/gssdeploy.cfg
###########################################################################
#
# Customize/change following to your environment
#__version__="__VERSION__"
###########################################################################
#[DEPLOYMENT_TYPE] # Set the Deployment type
# "ESS": Deploy ESS EMS node and IO nodes
# "CES": Deploy new and very first CES node(s) for an existing
# ESS GPFS cluster not running protocol stack at all.
# "ADD_BB": Deploy and add new ESS building block of IO nodes into
# existing GPFS cluster.
# "ADD_CES": Add new CES nodes to an existing ESS cluster that
# already has CES nodes configured in xCAT
#
# See GSS_GROUP section for xCAT group name consideration for
# deployment types.
DEPLOYMENT_TYPE="ESS"

#[GSS_GROUP] # xCAT group name which will contain nodes listed in SERVERS_NODES
# Default GSS group name should be "gss_ppc64" for DEPLOYMENT_TYPE "ESS"
# Default GSS group name should be "ces_ppc64" for DEPLOYMENT_TYPE "CES"
#
# Uncomment GSS_GROUP and use something new other than "gss_ppc64" or "ces_ppc64"
# in case trying to add new IO node (ADD_BB) or new CES node (ADD_CES).
#
# When adding new ESS building blocks or CES nodes (first time or additional)
# use the gssdeploy -o flow. Carefully consult the Quick Deployment Guide (QDG)
# before attempting any install, upgrade, or add of nodes.
#
# In case of ADD_BB or ADD_CES a new GSS_GROUP will need to be created and
# the nodes will be placed in this group. When done with adding more IO or CES
# nodes, it is advisable to move the nodes from this new group to the
# "gss_ppc64" or "ces_ppc64" group (depending on deployment type) so all
# IO or CES nodes in the cluster are in the same group.
#
# Remember to carefully consult the QDG before attempting any operation.
#
# GSS_GROUP="ces_ppc64"

#[RHEL] # Set to Y if RHEL DVD is used otherwise iso is assumed.
RHEL_USE_DVD="N"

# Device location of RHEL DVD used instead of iso
RHEL_DVD="/dev/cdrom"

# Mount point to use for RHEL media.
RHEL_MNT="/opt/ibm/gss/mnt"

# Directory containing ISO.
RHEL_ISODIR="/opt/ibm/gss/iso"

#[EMS] # Hostname of EMS
EMS_HOSTNAME="ems1"

# Network interface for xCAT management network
EMS_MGTNETINTERFACE="enP3p9s0f0"

#[SERVERS] # Default userid of IO or CES Server.
SERVERS_UID="root"

# Default password of IO and CES Server.
#
# To generate an encrypted password at the highest level supported
# by the crypt module use:
#
# python -c 'import crypt,getpass; print crypt.crypt(getpass.getpass())'
#
# Note: Password must be contained in single quotes.
#
SERVERS_PASSWD='cluster'

# Array of IO and CES servers to provision and deploy.
# You can get serial numbers of the nodes using
# -f option of the gssdeploy. You can also use -i to
# identify the server once you know the IP address
# of FPSs using -f option.
SERVERS_SERIAL=(212900A 2128FEA)
SERVERS_NODES=(io1 io2)

#[DEPLOY] # Name ISO file
RHEL_ISO="rhel-server-7.4-ppc64le.iso"

# Architecture (e.g. ppc64le) stated in the
# DEPLOY_OSIMAGE is used to determine target architecture
# for deployment.
# Note: Possible value for the DEPLOY_OSIMAGE can be
# either "rhels7.4-ppc64le-install-gss" or "rhels7.4-ppc64le-install-ces".
# "rhels7.4-ppc64le-install-gss": For IO node deployment.
# "rhels7.4-ppc64le-install-ces": For CES node deployment.
DEPLOY_OSIMAGE="rhels7.4-ppc64le-install-gss"

#[FSP]
FSP_MGTNETINTERFACE="enP3p9s0f3"
FSP_PASSWD="PASSW0RD"
# End of customization

xCAT objects of nodes on EMS node ems1:
[root@ems1 ~]# lsdef ems1
Object name: ems1
bmc=10.0.0.3
bmcpassword=PASSW0RD
cons=ipmi
groups=__mgmtnode,ems
mgt=ipmi
mtm=8247-21L
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=2128ECA
setuptftp=yes
updatestatus=synced
updatestatustime=11-15-2018 00:09:36

[root@ems1 ~]# lsdef io1
Object name: io1
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
bmc=10.0.0.2
bmcpassword=PASSW0RD
chain=runcmd=fspipsetup.sh
cons=ipmi
cpucount=160
cputype=POWER8E (raw), altivec supported
currchain=boot
currstate=boot
disksize=sda:264GB,sdb:1118GB,sdc:1118GB,sdd:1118GB,sde:1118GB,sdf:1118GB,sdg:1118GB,sdh:1118GB,sdi:1118GB,sdj:1118GB,sdk:1118GB,sdl:1118GB,sdm:1118GB,sdn:1118GB,sdo:1118GB,sdp:1118GB,sdq:1118GB,sdr:1118GB,sds:1118GB,sdt:1118GB,sdu:1118GB,sdv:1118GB,sdw:1118GB,sdx:1118GB,sdy:1118GB,sdz:373GB,sdaa:1118GB,sdab:1118GB,sdac:1118GB,sdad:1118GB,sdae:1118GB,sdaf:1118GB,sdag:1118GB,sdah:1118GB,sdai:1118GB,sdaj:1118GB,sdak:1118GB,sdal:1118GB,sdam:1118GB,sdan:1118GB,sdao:1118GB,sdap:1118GB,sdaq:1118GB,sdar:1118GB,sdas:1118GB,sdat:1118GB,sdau:1118GB,sdav:1118GB,sdaw:373GB,sdax:1118GB,sday:1118GB,sdaz:1118GB,sdba:1118GB,sdbb:1118GB,sdbc:1118GB,sdbd:1118GB,sdbe:1118GB,sdbf:1118GB,sdbg:1118GB,sdbh:1118GB,sdbi:1118GB,sdbj:1118GB,sdbk:1118GB,sdbl:1118GB,sdbm:1118GB,sdbn:1118GB,sdbo:1118GB,sdbp:1118GB,sdbq:1118GB,sdbr:1118GB,sdbs:1118GB,sdbt:1118GB,sdbu:1118GB,sdbv:373GB,sdbw:1118GB,sdbx:1118GB,sdby:1118GB,sdbz:1118GB,sdca:1118GB,sdcb:1118GB,sdcc:1118GB,sdcd:1118GB,sdce:1118GB,sdcf:1118GB,sdcg:1118GB,sdch:1118GB,sdci:1118GB,sdcj:1118GB,sdck:1118GB,sdcl:1118GB,sdcm:1118GB,sdcn:1118GB,sdco:1118GB,sdcp:1118GB,sdcq:1118GB,sdcr:1118GB,sdcs:373GB
groups=all,gss_ppc64
hostnames=io1-enP5p9s0f3
installnic=mac
mac=98:be:94:00:30:b4
memory=130612MB
mgt=ipmi
mtm=8247-22L
netboot=petitboot
nodetype=mp
os=rhels7.4
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
power=ipmi
primarynic=mac
profile=gss
provmethod=rhels7.4-ppc64le-install-gss
serial=212900A
status=booted
statustime=11-15-2018 00:56:22
supportedarchs=ppc64

[root@ems1 ~]# lsdef io2
Object name: io2
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
bmc=10.0.0.4
bmcpassword=PASSW0RD
chain=runcmd=fspipsetup.sh
cons=ipmi
cpucount=160
cputype=POWER8E (raw), altivec supported
currchain=boot
currstate=boot
disksize=sda:264GB,sdb:1118GB,sdc:1118GB,sdd:1118GB,sde:1118GB,sdf:1118GB,sdg:1118GB,sdh:1118GB,sdi:1118GB,sdj:1118GB,sdk:1118GB,sdl:1118GB,sdm:1118GB,sdn:1118GB,sdo:1118GB,sdp:1118GB,sdq:1118GB,sdr:1118GB,sds:1118GB,sdt:1118GB,sdu:1118GB,sdv:1118GB,sdw:1118GB,sdx:1118GB,sdy:1118GB,sdz:373GB,sdaa:1118GB,sdab:1118GB,sdac:1118GB,sdad:1118GB,sdae:1118GB,sdaf:1118GB,sdag:1118GB,sdah:1118GB,sdai:1118GB,sdaj:1118GB,sdak:1118GB,sdal:1118GB,sdam:1118GB,sdan:1118GB,sdao:1118GB,sdap:1118GB,sdaq:1118GB,sdar:1118GB,sdas:1118GB,sdat:1118GB,sdau:1118GB,sdav:1118GB,sdaw:373GB,sdax:1118GB,sday:1118GB,sdaz:1118GB,sdba:1118GB,sdbb:1118GB,sdbc:1118GB,sdbd:1118GB,sdbe:1118GB,sdbf:1118GB,sdbg:1118GB,sdbh:1118GB,sdbi:1118GB,sdbj:1118GB,sdbk:1118GB,sdbl:1118GB,sdbm:1118GB,sdbn:1118GB,sdbo:1118GB,sdbp:1118GB,sdbq:1118GB,sdbr:1118GB,sdbs:1118GB,sdbt:1118GB,sdbu:1118GB,sdbv:373GB,sdbw:1118GB,sdbx:1118GB,sdby:1118GB,sdbz:1118GB,sdca:1118GB,sdcb:1118GB,sdcc:1118GB,sdcd:1118GB,sdce:1118GB,sdcf:1118GB,sdcg:1118GB,sdch:1118GB,sdci:1118GB,sdcj:1118GB,sdck:1118GB,sdcl:1118GB,sdcm:1118GB,sdcn:1118GB,sdco:1118GB,sdcp:1118GB,sdcq:1118GB,sdcr:1118GB,sdcs:373GB
groups=all,gss_ppc64
installnic=mac
mac=98:be:94:00:4f:14
memory=130612MB
mgt=ipmi
mtm=8247-22L
netboot=petitboot
nodetype=mp
os=rhels7.4
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
power=ipmi
primarynic=mac
profile=gss
provmethod=rhels7.4-ppc64le-install-gss
serial=2128FEA
status=booted
statustime=11-15-2018 00:56:18
supportedarchs=ppc64

[root@ems1 tmp]# lsdef -t group __mgmtnode,all,ems,gss_ppc64
Object name: __mgmtnode
members=ems1
setuptftp=yes
Object name: all
members=io1,io2
Object name: ems
members=ems1
Object name: gss_ppc64
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
grouptype=static
installnic=mac
members=io1,io2
netboot=petitboot
power=ipmi
primarynic=mac

/etc/hosts on ems1:
[root@ems1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

# xCAT Mgmt Network
192.168.45.20 ems1.gpfs.net ems1 # EMS1
192.168.45.21 io1 io1.gpfs.net # GSSp Node 1
192.168.45.22 io2.gpfs.net io2 # GSSp Node 2
192.168.45.25 ems2.gpfs.net ems2 # EMS2

# 10 G Network
13.10.55.10 ems1-10g.gpfs.net ems1-10g # EMS1
13.10.55.11 io1-10g.gpfs.net io1-10g # GSSp Node 1
13.10.55.12 io2-10g.gpfs.net io2-10g # GSSp Node 2
13.10.55.24 ems2-10g.gpfs.net ems2-10g # EMS2

# FSP IP Address
10.0.0.3 ems1-fsp.gpfs.net ems1-fsp # 2128ECA
10.0.0.2 io1-fsp.gpfs.net io1-fsp # 212900A
10.0.0.4 io2-fsp.gpfs.net io2-fsp # 2128FEA
10.0.0.50 ems2-fsp.gpfs.net ems2-fsp # 212F76A

# FSP IP Assigned to Node
10.0.0.6 ems1-fsp-if.gpfs.net ems1-fsp-if
10.0.0.5 ems2-fsp-if.gpfs.net ems2-fsp-if


Annexure – B : EMS2 important file configuration

gssdeploy.cfg on ems2
[root@ems2 tmp]# cat gssdeploy.cfg
###########################################################################
#
# Customize/change following to your environment
#__version__="__VERSION__"
###########################################################################
#[DEPLOYMENT_TYPE] # Set the Deployment type
# "ESS": Deploy ESS EMS node and IO nodes
# "CES": Deploy new and very first CES node(s) for an existing
# ESS GPFS cluster not running protocol stack at all.
# "ADD_BB": Deploy and add new ESS building block of IO nodes into
# existing GPFS cluster.
# "ADD_CES": Add new CES nodes to an existing ESS cluster that
# already has CES nodes configured in xCAT
#
# See GSS_GROUP section for xCAT group name consideration for
# deployment types.
DEPLOYMENT_TYPE="ESS"

#[GSS_GROUP] # xCAT group name which will contain nodes listed in SERVERS_NODES
# Default GSS group name should be "gss_ppc64" for DEPLOYMENT_TYPE "ESS"
# Default GSS group name should be "ces_ppc64" for DEPLOYMENT_TYPE "CES"
#
# Uncomment GSS_GROUP and use something new other than "gss_ppc64" or "ces_ppc64"
# in case trying to add new IO node (ADD_BB) or new CES node (ADD_CES).
#
# When adding new ESS building blocks or CES nodes (first time or additional)
# use the gssdeploy -o flow. Carefully consult the Quick Deployment Guide (QDG)
# before attempting any install, upgrade, or add of nodes.
#
# In case of ADD_BB or ADD_CES a new GSS_GROUP will need to be created and
# the nodes will be placed in this group. When done with adding more IO or CES
# nodes, it is advisable to move the nodes from this new group to the
# "gss_ppc64" or "ces_ppc64" group (depending on deployment type) so all
# IO or CES nodes in the cluster are in the same group.
#
# Remember to carefully consult the QDG before attempting any operation.
#
# GSS_GROUP="ces_ppc64"

#[RHEL] # Set to Y if RHEL DVD is used otherwise iso is assumed.
RHEL_USE_DVD="N"

# Device location of RHEL DVD used instead of iso
RHEL_DVD="/dev/cdrom"

# Mount point to use for RHEL media.
RHEL_MNT="/opt/ibm/gss/mnt"

# Directory containing ISO.
RHEL_ISODIR="/opt/ibm/gss/iso"

#[EMS] # Hostname of EMS
EMS_HOSTNAME="ems2"

# Network interface for xCAT management network
EMS_MGTNETINTERFACE="enP3p9s0f0"

#[SERVERS] # Default userid of IO or CES Server.
SERVERS_UID="root"

# Default password of IO and CES Server.
#
# To generate an encrypted password at the highest level supported
# by the crypt module use:
#
# python -c 'import crypt,getpass; print crypt.crypt(getpass.getpass())'
#
# Note: Password must be contained in single quotes.
#
SERVERS_PASSWD='cluster'

# Array of IO and CES servers to provision and deploy.
# You can get serial numbers of the nodes using
# -f option of the gssdeploy. You can also use -i to
# identify the server once you know the IP address
# of FPSs using -f option.
SERVERS_SERIAL=(212900A 2128FEA)
SERVERS_NODES=(io1 io2)

#[DEPLOY] # Name ISO file
RHEL_ISO="rhel-server-7.4-ppc64le.iso"

# Architecture (e.g. ppc64le) stated in the
# DEPLOY_OSIMAGE is used to determine target architecture
# for deployment.
# Note: Possible value for the DEPLOY_OSIMAGE can be
# either "rhels7.4-ppc64le-install-gss" or "rhels7.4-ppc64le-install-ces".
# "rhels7.4-ppc64le-install-gss": For IO node deployment.
# "rhels7.4-ppc64le-install-ces": For CES node deployment.
DEPLOY_OSIMAGE="rhels7.4-ppc64le-install-gss"

#[FSP]
FSP_MGTNETINTERFACE="enP3p9s0f2"
FSP_PASSWD="PASSW0RD"
# End of customization
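
The gssdeploy.cfg listings for ems1 and ems2 differ only in EMS_HOSTNAME and FSP_MGTNETINTERFACE (EMS_MGTNETINTERFACE happens to be the same on both nodes here). Deriving the ems2 file from the ems1 file can therefore be sketched as a small filter. This is a minimal sketch, not part of the ESS toolkit: the function name retarget_gssdeploy_cfg is hypothetical, and it assumes each variable sits at the start of its own line and uses plain double quotes.

```shell
#!/usr/bin/env bash
# Print a copy of an existing gssdeploy.cfg with the three EMS-specific
# variables replaced. All other lines, including SERVERS_SERIAL and
# SERVERS_NODES, are copied unchanged.
# retarget_gssdeploy_cfg is a hypothetical helper name.
retarget_gssdeploy_cfg() {
    local src="$1" ems_host="$2" mgmt_if="$3" fsp_if="$4"
    sed -e "s/^EMS_HOSTNAME=.*/EMS_HOSTNAME=\"${ems_host}\"/" \
        -e "s/^EMS_MGTNETINTERFACE=.*/EMS_MGTNETINTERFACE=\"${mgmt_if}\"/" \
        -e "s/^FSP_MGTNETINTERFACE=.*/FSP_MGTNETINTERFACE=\"${fsp_if}\"/" \
        "$src"
}
```

With the values from this document, the call would be `retarget_gssdeploy_cfg gssdeploy.cfg.ems1 ems2 enP3p9s0f0 enP3p9s0f2 > gssdeploy.cfg`, where gssdeploy.cfg.ems1 is an assumed name for the copy brought over from ems1. Comparing the two files with diff afterwards is a cheap way to confirm that nothing else changed.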

/etc/hosts on ems2:
[root@ems2 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

# xCAT Mgmt Network
192.168.45.20 ems1.gpfs.net ems1 # EMS1
192.168.45.21 io1 io1.gpfs.net # GSSp Node 1
192.168.45.22 io2.gpfs.net io2 # GSSp Node 2
192.168.45.25 ems2.gpfs.net ems2 # EMS2

# 10 G Network
13.10.55.10 ems1-10g.gpfs.net ems1-10g # EMS1
13.10.55.11 io1-10g.gpfs.net io1-10g # GSSp Node 1
13.10.55.12 io2-10g.gpfs.net io2-10g # GSSp Node 2
13.10.55.24 ems2-10g.gpfs.net ems2-10g # EMS2

# FSP IP Address
10.0.0.3 ems1-fsp.gpfs.net ems1-fsp # 2128ECA
10.0.0.2 io1-fsp.gpfs.net io1-fsp # 212900A
10.0.0.4 io2-fsp.gpfs.net io2-fsp # 2128FEA
10.0.0.50 ems2-fsp.gpfs.net ems2-fsp # 212F76A

# FSP IP Assigned to Node
10.0.0.6 ems1-fsp-if.gpfs.net ems1-fsp-if
10.0.0.5 ems2-fsp-if.gpfs.net ems2-fsp-if

xCAT objects of nodes on EMS node ems2, including the manually created ems2 object, which is not present on ems1:
[root@ems2 ~]# lsdef ems1
Object name: ems1
bmc=10.0.0.3
bmcpassword=PASSW0RD
cons=ipmi
groups=__mgmtnode,ems
mgt=ipmi
mtm=8247-21L
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=2128ECA
setuptftp=yes
updatestatus=synced
updatestatustime=11-15-2018 00:09:36

[root@ems2 ~]# lsdef io1
Object name: io1
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
bmc=10.0.0.2
bmcpassword=PASSW0RD
chain=runcmd=fspipsetup.sh
cons=ipmi
cpucount=160
cputype=POWER8E (raw), altivec supported
currchain=boot
currstate=boot
disksize=sda:264GB,sdb:1118GB,sdc:1118GB,sdd:1118GB,sde:1118GB,sdf:1118GB,sdg:1118GB,sdh:1118GB,sdi:1118GB,sdj:1118GB,sdk:1118GB,sdl:1118GB,sdm:1118GB,sdn:1118GB,sdo:1118GB,sdp:1118GB,sdq:1118GB,sdr:1118GB,sds:1118GB,sdt:1118GB,sdu:1118GB,sdv:1118GB,sdw:1118GB,sdx:1118GB,sdy:1118GB,sdz:373GB,sdaa:1118GB,sdab:1118GB,sdac:1118GB,sdad:1118GB,sdae:1118GB,sdaf:1118GB,sdag:1118GB,sdah:1118GB,sdai:1118GB,sdaj:1118GB,sdak:1118GB,sdal:1118GB,sdam:1118GB,sdan:1118GB,sdao:1118GB,sdap:1118GB,sdaq:1118GB,sdar:1118GB,sdas:1118GB,sdat:1118GB,sdau:1118GB,sdav:1118GB,sdaw:373GB,sdax:1118GB,sday:1118GB,sdaz:1118GB,sdba:1118GB,sdbb:1118GB,sdbc:1118GB,sdbd:1118GB,sdbe:1118GB,sdbf:1118GB,sdbg:1118GB,sdbh:1118GB,sdbi:1118GB,sdbj:1118GB,sdbk:1118GB,sdbl:1118GB,sdbm:1118GB,sdbn:1118GB,sdbo:1118GB,sdbp:1118GB,sdbq:1118GB,sdbr:1118GB,sdbs:1118GB,sdbt:1118GB,sdbu:1118GB,sdbv:373GB,sdbw:1118GB,sdbx:1118GB,sdby:1118GB,sdbz:1118GB,sdca:1118GB,sdcb:1118GB,sdcc:1118GB,sdcd:1118GB,sdce:1118GB,sdcf:1118GB,sdcg:1118GB,sdch:1118GB,sdci:1118GB,sdcj:1118GB,sdck:1118GB,sdcl:1118GB,sdcm:1118GB,sdcn:1118GB,sdco:1118GB,sdcp:1118GB,sdcq:1118GB,sdcr:1118GB,sdcs:373GB
groups=all,gss_ppc64
hostnames=io1-enP5p9s0f3
installnic=mac
mac=98:be:94:00:30:b4
memory=130612MB
mgt=ipmi
mtm=8247-22L
netboot=petitboot
nodetype=mp
os=rhels7.4
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
power=ipmi
primarynic=mac
profile=gss
provmethod=rhels7.4-ppc64le-install-gss
serial=212900A
status=booted
statustime=11-15-2018 00:56:22
supportedarchs=ppc64

[root@ems2 ~]# lsdef io2
Object name: io2
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
bmc=10.0.0.4
bmcpassword=PASSW0RD
chain=runcmd=fspipsetup.sh
cons=ipmi
cpucount=160
cputype=POWER8E (raw), altivec supported
currchain=boot
currstate=boot
disksize=sda:264GB,sdb:1118GB,sdc:1118GB,sdd:1118GB,sde:1118GB,sdf:1118GB,sdg:1118GB,sdh:1118GB,sdi:1118GB,sdj:1118GB,sdk:1118GB,sdl:1118GB,sdm:1118GB,sdn:1118GB,sdo:1118GB,sdp:1118GB,sdq:1118GB,sdr:1118GB,sds:1118GB,sdt:1118GB,sdu:1118GB,sdv:1118GB,sdw:1118GB,sdx:1118GB,sdy:1118GB,sdz:373GB,sdaa:1118GB,sdab:1118GB,sdac:1118GB,sdad:1118GB,sdae:1118GB,sdaf:1118GB,sdag:1118GB,sdah:1118GB,sdai:1118GB,sdaj:1118GB,sdak:1118GB,sdal:1118GB,sdam:1118GB,sdan:1118GB,sdao:1118GB,sdap:1118GB,sdaq:1118GB,sdar:1118GB,sdas:1118GB,sdat:1118GB,sdau:1118GB,sdav:1118GB,sdaw:373GB,sdax:1118GB,sday:1118GB,sdaz:1118GB,sdba:1118GB,sdbb:1118GB,sdbc:1118GB,sdbd:1118GB,sdbe:1118GB,sdbf:1118GB,sdbg:1118GB,sdbh:1118GB,sdbi:1118GB,sdbj:1118GB,sdbk:1118GB,sdbl:1118GB,sdbm:1118GB,sdbn:1118GB,sdbo:1118GB,sdbp:1118GB,sdbq:1118GB,sdbr:1118GB,sdbs:1118GB,sdbt:1118GB,sdbu:1118GB,sdbv:373GB,sdbw:1118GB,sdbx:1118GB,sdby:1118GB,sdbz:1118GB,sdca:1118GB,sdcb:1118GB,sdcc:1118GB,sdcd:1118GB,sdce:1118GB,sdcf:1118GB,sdcg:1118GB,sdch:1118GB,sdci:1118GB,sdcj:1118GB,sdck:1118GB,sdcl:1118GB,sdcm:1118GB,sdcn:1118GB,sdco:1118GB,sdcp:1118GB,sdcq:1118GB,sdcr:1118GB,sdcs:373GB
groups=all,gss_ppc64
installnic=mac
mac=98:be:94:00:4f:14
memory=130612MB
mgt=ipmi
mtm=8247-22L
netboot=petitboot
nodetype=mp
os=rhels7.4
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
power=ipmi
primarynic=mac
profile=gss
provmethod=rhels7.4-ppc64le-install-gss
serial=2128FEA
status=booted
statustime=11-15-2018 00:56:18
supportedarchs=ppc64

[root@ems2 ~]# lsdef ems2
Object name: ems2
bmc=10.0.0.50
bmcpassword=PASSW0RD
cons=ipmi
groups=__mgmtnode,ems
mgt=ipmi
mtm=8247-21L
postbootscripts=otherpkgs
postscripts=syslog,remoteshell,syncfiles
serial=212F76A
setuptftp=yes

[root@ems2 ~]# lsdef -t group __mgmtnode,all,ems,gss_ppc64
Object name: __mgmtnode
members=ems1,ems2
setuptftp=yes
Object name: all
members=io1,io2
Object name: ems
members=ems1,ems2
Object name: gss_ppc64
addkcmdline=modprobe.blacklist=mpt3sas R::log_buf_len=4M
arch=ppc64le
grouptype=static
installnic=mac
members=io1,io2
netboot=petitboot
power=ipmi
primarynic=mac
