In this blog entry I will show how to set up a sample remote cluster and walk through common configuration errors and their solutions. I will also cover the steps to enable a dedicated high-speed network for GPFS daemon communication, which can provide very efficient storage access to locally mounted remote filesystems.

Setup

There are two clusters prepared for the setup:

  • Local GPFS cluster ocp_gpfs.ocp4.scale.com running on OpenShift 4 worker nodes.
  • Remote GPFS cluster ess3000.bda.scale.com which provides filesystems from ESS 3000.

These clusters are also connected via a high-speed network, 10.10.1.0. This is optional, but it can provide better performance between them.

On local cluster ocp_gpfs.ocp4.scale.com:
If it is a new cluster and you don’t have a key under /var/mmfs/ssl/id_rsa.pub, then create it.

mmauth genkey new
mmauth update . -l AUTHONLY
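
The "create it only if it is missing" check can be scripted so that an existing key is never regenerated by accident. This is a minimal sketch: key_missing is a hypothetical helper name, not a GPFS command, and the only path it knows is the one mentioned above.

```shell
# key_missing is a hypothetical helper (not part of GPFS).
# It succeeds when the public key file is absent or empty.
key_missing() {
    local key_file="${1:-/var/mmfs/ssl/id_rsa.pub}"
    [ ! -s "$key_file" ]
}

# Usage on a cluster node (left as a comment so the sketch has no side effects):
#   key_missing && mmauth genkey new && mmauth update . -l AUTHONLY
```

The same guard applies unchanged on the remote cluster, since both clusters keep the key under the same path.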

Copy the key to the remote cluster (renamed there to local.id_rsa.pub).

scp /var/mmfs/ssl/id_rsa.pub 192.168.1.52:/root/local.id_rsa.pub

On remote cluster ess3000.bda.scale.com:
If it is a new cluster and you don’t have a key under /var/mmfs/ssl/id_rsa.pub, then create it.

mmauth genkey new
mmauth update . -l AUTHONLY

Copy the key to the local cluster (renamed there to ess3000.id_rsa.pub).

scp /var/mmfs/ssl/id_rsa.pub quorum01-gpfs:/root/ess3000.id_rsa.pub

Register the local cluster.

mmauth add ocp_gpfs.ocp4.scale.com -k /root/local.id_rsa.pub

Grant write access to filesystem fs0_1m for the local cluster.

mmauth grant ocp_gpfs.ocp4.scale.com -f fs0_1m

Enable high speed network 10.10.1.0 as local daemon network. Allow local cluster ocp_gpfs.ocp4.scale.com connections to the daemon network as well.

mmchconfig subnets="10.10.1.0 10.10.1.0/ocp_gpfs.ocp4.scale.com"

On local cluster ocp_gpfs.ocp4.scale.com:

Add the remote cluster, using daemon node names (or their respective IP addresses) from the remote cluster as contact nodes.

mmremotecluster add ess3000.bda.scale.com -n 9.155.106.66,9.155.106.122,9.155.106.123 -k /root/ess3000.id_rsa.pub

Enable high speed network 10.10.1.0 as local daemon network. Allow remote cluster ess3000.bda.scale.com connections to the daemon network as well.

mmchconfig subnets="10.10.1.0 10.10.1.0/ess3000.bda.scale.com"
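
Before relying on the subnets setting, it may help to confirm that each node really has an address on the high-speed network. The sketch below filters the output of `ip -o -4 addr show`. The helper name addrs_in_net is hypothetical, and matching on the prefix "10.10.1." assumes a /24 netmask, which the setup above does not state.

```shell
# addrs_in_net is a hypothetical helper (not part of GPFS).
# It reads `ip -o -4 addr show` output on stdin and prints the interface
# name and address of every entry whose address starts with the prefix.
addrs_in_net() {
    local prefix="$1"   # e.g. "10.10.1." for 10.10.1.0 with an assumed /24 mask
    awk -v p="$prefix" '$3 == "inet" && index($4, p) == 1 { print $2, $4 }'
}

# Usage on a node (left as a comment):
#   ip -o -4 addr show | addrs_in_net 10.10.1.
```

Empty output on a node suggests that its interface for the daemon network is down or not configured.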

Add remote filesystem fs0_1m by using a mount point /mnt/remote_fs0_1m. Name it locally remote_fs0_1m.

mmremotefs add remote_fs0_1m -f fs0_1m -C ess3000.bda.scale.com -T /mnt/remote_fs0_1m

Important! mmremotefs validates only the cluster name (as of GPFS 5.0.4), not the remote filesystem name. Make sure you pass the correct remote filesystem name to the command.

Mount the remote filesystem.

mmmount remote_fs0_1m -N all
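
A quick way to confirm the mount on a given node is to look for the mount point in the kernel's mount table. This is only a sketch: is_mounted is a hypothetical helper, it checks just the node it runs on (the mount above was issued for all nodes), and it assumes the mount point /mnt/remote_fs0_1m chosen earlier.

```shell
# is_mounted is a hypothetical helper (not part of GPFS).
# It succeeds when the mount point appears in the second column of a
# mounts table; the table defaults to /proc/mounts.
is_mounted() {
    local mount_point="$1" table="${2:-/proc/mounts}"
    awk -v m="$mount_point" '$2 == m { found = 1 } END { exit !found }' "$table"
}

# Usage on a node (left as a comment):
#   is_mounted /mnt/remote_fs0_1m && echo "remote_fs0_1m is mounted"
```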

Check if the remote cluster has been defined properly.

mmremotecluster show

Validate on both clusters that the connection works and that the proper interfaces/IPs are used.

mmdiag --network

It can take the GPFS daemon a couple of minutes to acquire the proper entries. If they are missing at first, rerun the command and check whether the setup works as expected.
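
Rerunning the command by hand can be replaced with a small polling loop. In the sketch below, retry_until and daemon_net_in_use are hypothetical helpers. The grep pattern assumes that an address starting with 10.10.1. in the mmdiag --network output means the high-speed network is in use. The attempt count and delay are arbitrary.

```shell
# retry_until is a hypothetical helper (not part of GPFS).
# Usage: retry_until MAX DELAY COMMAND [ARGS...]
# Runs COMMAND up to MAX times, sleeping DELAY seconds between attempts,
# and returns 0 as soon as COMMAND succeeds.
retry_until() {
    local max="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$max" ]; do
        "$@" && return 0
        [ "$i" -lt "$max" ] && sleep "$delay"
        i=$((i + 1))
    done
    return 1
}

# Assumption: an address on the 10.10.1.0 network in the output means
# the daemon is using the high-speed network.
daemon_net_in_use() {
    mmdiag --network | grep -q '10\.10\.1\.'
}

# Usage on a node (left as a comment): 10 attempts, 30 seconds apart
#   retry_until 10 30 daemon_net_in_use || echo "daemon network not in use yet"
```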

Limitations for locally mounted remote filesystems (as of GPFS 5.0.4)

1. It is not possible to perform fileset operations such as creation, unlinking and deletion

mmcrfileset remote_fs0_1m testfset
mmcrfileset: File system remote_fs0_1m belongs to cluster ess3000.bda.scale.com.
    Command is not allowed for remote file systems.
mmcrfileset: Command failed. Examine previous error messages to determine cause.

mmunlinkfileset remote_fs0_1m testfset
mmunlinkfileset: File system remote_fs0_1m belongs to cluster ess3000.bda.scale.com.
    Command is not allowed for remote file systems.
mmunlinkfileset: Command failed. Examine previous error messages to determine cause.

mmdelfileset remote_fs0_1m testfset
mmdelfileset: File system remote_fs0_1m belongs to cluster ess3000.bda.scale.com.
    Command is not allowed for remote file systems.
mmdelfileset: Command failed. Examine previous error messages to determine cause.

2. It is not possible to enable quota

mmchfs remote_fs0_1m -Q yes
mmchfs: File system remote_fs0_1m belongs to cluster ess3000.bda.scale.com.
    Command is not allowed for remote file systems.
mmchfs: Command failed. Examine previous error messages to determine cause.

3. You cannot set their automount flag to yes (thus, automatic mount is not supported)

mmchfs remote_fs0_1m -A yes
mmchfs: File system remote_fs0_1m belongs to cluster ess3000.bda.scale.com.
    Command is not allowed for remote file systems.
mmchfs: Command failed. Examine previous error messages to determine cause.

Troubleshooting

Sometimes the clusters need more time to obtain the proper network interfaces and IPs (as seen in mmdiag --network). You can force a refresh by restarting the GPFS daemon.

mmshutdown -a; mmstartup -a

Error code 233 indicates connection problems between nodes. Please check that your system fulfills these requirements:

  • you need to use daemon node names (or their respective IP addresses) from the remote cluster as contact nodes for the local cluster
  • you need to have correct and unique entries in /etc/hosts on all cluster nodes and on your DNS (in /etc/hosts for dnsmasq)
  • you need to check that the interface used for the subnet is up on every cluster node
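
The uniqueness requirement for /etc/hosts can be checked mechanically. The sketch below uses a hypothetical helper, duplicate_hosts, to list every host name that a hosts-format file maps to more than one IP address. It is a rough check under that assumption, not a full validation of name resolution.

```shell
# duplicate_hosts is a hypothetical helper (not part of GPFS).
# It prints, sorted, every host name that appears with more than one
# distinct IP address in a hosts-format file (default /etc/hosts).
duplicate_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    awk '
        { sub(/#.*/, "") }      # strip comments; awk re-splits the fields
        NF < 2 { next }         # skip blank lines and lines without a name
        {
            for (i = 2; i <= NF; i++) {
                if (($i in ip) && ip[$i] != $1) dup[$i] = 1
                else ip[$i] = $1
            }
        }
        END { for (h in dup) print h }
    ' "$hosts_file" | sort
}

# Usage on a node (left as a comment):
#   duplicate_hosts /etc/hosts
```

Note that localhost is commonly listed for both 127.0.0.1 and ::1, so it may show up in the output and can be ignored.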
