Acknowledgement: Thanks to Asmahan Ali, Kumaran Rajaram, and Carl Zetie for their input.
Different workloads in an organization can require different security postures. For example, one workload might be a high performance computing workload that handles non-sensitive data and does not need that data secured over the wire or on disk. At the same time, another workload, oriented toward artificial intelligence (AI), handles sensitive data that must be secured both over the wire and on disk.
IBM Spectrum Scale is a high performance clustered file storage solution that allows workloads to run over a single uniform data namespace. This keeps deployments from ending up with isolated islands of data, which reduces administrative overhead and consolidates the organization’s data more cohesively.
So how do you selectively secure data at rest and in motion, at a granular level, for workloads running over a clustered file system such as IBM Spectrum Scale?
IBM Spectrum Scale includes file system encryption, which allows granular, policy-based encryption of data. With this feature, data is encrypted on disk (satisfying the secure data at rest requirement) and also remains encrypted over the wire (satisfying the secure data in motion requirement) as it moves across the cluster to the compute nodes where the workload runs. This is because, in IBM Spectrum Scale file system encryption, file data is decrypted by the GPFS clients, so it travels over the network in encrypted form. Since file encryption is policy-based, you can select which files are encrypted and which are not.
Note: IBM Spectrum Scale file system encryption encrypts file data only, not metadata.
As an illustration, consider two workloads: “Traditional Workload” and “AI workload”. In this example, “Traditional Workload” has no requirement for secure data at rest or in motion, while “AI workload” requires both. The two workloads use separate data sets with nothing in common.
The high-level solution diagram below shows one deployment model of IBM Spectrum Scale that meets this requirement. It consists of two Spectrum Scale clusters:
1. Common Infrastructure IBM Spectrum Scale cluster
2. Common Compute IBM Spectrum Scale cluster
High-level steps (as shown in the figure below):
1. On the infrastructure cluster create two file systems, one for each workload.
File system A for Traditional Workload
File system B for AI Workload
2. Ensure that both file systems are remote mounted on all nodes of the compute cluster.
3. On the infrastructure cluster, configure Spectrum Scale file system encryption. Using GPFS placement policies, encrypt only the data that belongs to the AI workload (on File system B), and leave the traditional workload’s data on File system A unencrypted.
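Sample policy that can be applied to all files on File system B (the <KeyID> and <RkmID> values are placeholders for the identifiers from your own key server configuration):
RULE 'myEncRule1' ENCRYPTION 'E1' IS
  ALGO 'DEFAULTNISTSP800131A'
  KEYS('<KeyID>:<RkmID>')
RULE 'Encrypt all files with rule E1'
  SET ENCRYPTION 'E1'
  WHERE NAME LIKE '%'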
Note: Use the mmchpolicy command to configure encryption policies, which are then applied at file creation time. Files that existed before encryption was configured and enabled are not encrypted. To encrypt them, copy them into newly created encrypted counterparts, possibly using a migration policy.
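As a rough sketch of step 3, the policy can be written to a file and installed with mmchpolicy. The device name fsB, the policy file path, and the key and RKM identifiers are placeholders assumed for illustration, and the default placement rule assumes that data should land in the system pool. The key server setup that encryption depends on is not shown.

```shell
#!/bin/sh
# Sketch only: fsB, the policy path, KEY_ID and RKM_ID are assumed placeholders.
FS_DEVICE="fsB"
POLICY_FILE="${TMPDIR:-/tmp}/fsB-encryption.pol"
KEY_ID="KEY-placeholder"     # hypothetical master encryption key ID
RKM_ID="RKM_placeholder"     # hypothetical remote key manager ID

# Write the encryption policy for File system B.
cat > "$POLICY_FILE" <<EOF
/* Assumption: keep a default placement rule that targets the system pool. */
RULE 'default' SET POOL 'system'
RULE 'myEncRule1' ENCRYPTION 'E1' IS
  ALGO 'DEFAULTNISTSP800131A'
  KEYS('${KEY_ID}:${RKM_ID}')
RULE 'Encrypt all files with rule E1'
  SET ENCRYPTION 'E1'
  WHERE NAME LIKE '%'
EOF

# Run the Spectrum Scale commands only where the CLI is installed.
if command -v mmchpolicy >/dev/null 2>&1; then
  mmchpolicy "$FS_DEVICE" "$POLICY_FILE" -I test   # validate without installing
  mmchpolicy "$FS_DEVICE" "$POLICY_FILE"           # install the policy
  mmlspolicy "$FS_DEVICE" -L                       # display the installed rules
fi
```

Because mmchpolicy replaces the policy currently installed on the file system, any existing rules need to be carried over into the new policy file. After installation, running mmlsattr -L on a newly created file is one way to check whether it was encrypted.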
This approach:
a) Meets the selective and granular security requirements (secure data at rest and in motion) for the two independent workloads hosted on a common infrastructure cluster.
b) Has little or no performance impact on the traditional workload.
c) Allows the high-end server systems in the compute cluster to be used for both workloads (assuming they are scheduled appropriately).
d) Allows the data from the independent workloads to be managed and administered separately, because each workload has its own file system.
Note: The solution based on file encryption does not strictly require separate file systems. The same result can be achieved within a single cluster, and even within a single file system, with the separation done at the fileset level. If filesets are used for separation, the SET ENCRYPTION policy rule above must be changed to specify a fileset name.
Some administrators will prefer that approach because they find it more efficient to manage and administer. The right choice depends on your use case requirements.
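A minimal sketch of the fileset-level variant follows, assuming the FOR FILESET clause of the SET ENCRYPTION rule is used to scope encryption. The fileset name ai_workload, the device name fs1, and the key and RKM identifiers are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch only: fs1, ai_workload, and the key/RKM identifiers are assumed placeholders.
FS_DEVICE="fs1"
AI_FILESET="ai_workload"
POLICY_FILE="${TMPDIR:-/tmp}/fileset-encryption.pol"

# Encrypt only files created in the AI fileset; other filesets stay unencrypted.
cat > "$POLICY_FILE" <<EOF
RULE 'myEncRule1' ENCRYPTION 'E1' IS
  ALGO 'DEFAULTNISTSP800131A'
  KEYS('KEY-placeholder:RKM_placeholder')
RULE 'Encrypt AI fileset with rule E1'
  SET ENCRYPTION 'E1'
  FOR FILESET ('${AI_FILESET}')
  WHERE NAME LIKE '%'
EOF

# Validate the policy only where the Spectrum Scale CLI is installed.
if command -v mmchpolicy >/dev/null 2>&1; then
  mmchpolicy "$FS_DEVICE" "$POLICY_FILE" -I test
fi
```

Files created outside the named fileset match no SET ENCRYPTION rule in this sketch, so they are left unencrypted.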
Link to Knowledge Center
Quick reference for file encryption