File compression is an important technology for storage products. It improves storage efficiency, typically yielding a compression ratio of more than 2X for compressible data. It also reduces I/O bandwidth consumption, which lowers the load on the storage back end. Further, caching compressed data on the client increases the apparent cache size.
File compression support was introduced in Spectrum Scale 4.2.0 with the addition of the zlib compression algorithm for use on cold data. zlib offers a fairly good compression ratio. However, because data is decompressed on the fly, access is slower than reading the raw (uncompressed) file. zlib is therefore best suited to compressing cold data.
Subsequent Spectrum Scale releases improved compression support, with the 5.0.0 release adding the LZ4 compression algorithm. LZ4 is much faster, with decompression speed up to 5 times that of zlib. It offers a lower compression ratio than zlib, but still achieves better than 2X for most compressible files. Sequential read with LZ4 is, in most cases, faster than reading raw data and is 5X faster than with zlib. Thus, LZ4 not only saves storage space but, in most cases, also delivers better performance.
The recommendation, therefore, is to use zlib for cold data and LZ4 for other data accessed with a sequential read workload. For random reads, compression is not yet recommended for Spectrum Scale users.
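The recommendation above can be written down as a small decision helper. This is only a sketch: the function name, the `is_cold` flag, and the `access_pattern` strings are hypothetical, and it assumes that cold data takes precedence over the access pattern, which the text does not state explicitly.

```python
from typing import Optional


def recommend_library(is_cold: bool, access_pattern: str) -> Optional[str]:
    """Map the guidance above to a compression library name.

    Returns 'z' (zlib), 'lz4', or None when compression is not recommended.
    Assumption: cold data is checked first, regardless of access pattern.
    """
    if is_cold:
        return "z"      # cold data: favor the better compression ratio
    if access_pattern == "sequential":
        return "lz4"    # sequential reads: favor decompression speed
    return None         # random reads: compression not yet recommended
```

For example, `recommend_library(False, "sequential")` selects `lz4`, while a frequently used file with a random read pattern is left uncompressed.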
Files can be compressed using the Spectrum Scale CLI or the policy engine, as follows:
mmchattr --compress [yes | no | z | lz4] filename
yes – Compressed files remain compressed (with z or lz4), uncompressed files are compressed with zlib
no – Files are uncompressed
z – Files are compressed or recompressed (from lz4) with zlib
lz4 – Files are compressed or recompressed (from z) with lz4
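When the command is driven from a script, it can help to validate the option before calling it. The following is a minimal sketch that uses only the `mmchattr --compress` form shown above; the helper names `build_mmchattr_cmd` and `set_compression` are hypothetical, and `set_compression` assumes `mmchattr` is on the PATH of a Spectrum Scale node.

```python
import subprocess
from typing import List

# Values accepted by --compress, as listed above.
VALID_MODES = {"yes", "no", "z", "lz4"}


def build_mmchattr_cmd(path: str, mode: str) -> List[str]:
    """Build the argument list for: mmchattr --compress <mode> <path>."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported --compress value: {mode!r}")
    return ["mmchattr", "--compress", mode, path]


def set_compression(path: str, mode: str) -> None:
    """Run mmchattr; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_mmchattr_cmd(path, mode), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.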
RULE 'COMPRESS' MIGRATE COMPRESS('libname')
RULE 'COMPRESS' MIGRATE COMPRESS('z')
RULE 'COMPRESS' MIGRATE COMPRESS('lz4')
The policy rules allow users to determine which files to compress and which compression algorithm to use. The choice can be changed later as a file ages or its access pattern changes.
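To illustrate how such rules might be produced per file class, here is a small sketch that renders the rule text shown above. The function name `compress_rule` is hypothetical, and the optional WHERE condition is passed through verbatim as an assumption; check the exact selection syntax against the policy documentation for your release.

```python
def compress_rule(name: str, library: str, where: str = "") -> str:
    """Render a MIGRATE COMPRESS rule in the form shown above.

    `where` is an optional, caller-supplied selection condition that is
    appended unchanged (assumption: the caller provides valid policy syntax).
    """
    if library not in {"z", "lz4"}:
        raise ValueError(f"unsupported library: {library!r}")
    rule = f"RULE '{name}' MIGRATE COMPRESS('{library}')"
    if where:
        rule += f" WHERE {where}"
    return rule
```

Called as `compress_rule("COMPRESS", "z")`, it reproduces the zlib rule listed above; a condition such as `NAME LIKE '%.log'` could be supplied to limit the rule to a subset of files.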
Compression status and library for a file can be queried as follows:
[root@fin26v01p ~]# mmlsattr -L /gpfs/fs0/tf0
file name: /gpfs/fs0/tf0
metadata replication: 1 max 2
data replication: 1 max 2
storage pool name: datapool
fileset name: root
creation time: Fri Jun 9 15:58:07 2017
Misc attributes: ARCHIVE COMPRESSION (library lz4)
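For scripted checks, the library name can be pulled out of this output. The sketch below assumes the "Misc attributes" line has the format shown in the sample above; the function name `compression_library` is hypothetical, and the output format may differ between releases.

```python
import re
from typing import Optional

# Matches e.g. "COMPRESSION (library lz4)" as in the sample output above.
_LIBRARY_RE = re.compile(r"COMPRESSION\s*\(library\s+(\w+)\)")


def compression_library(mmlsattr_output: str) -> Optional[str]:
    """Return the library named on the 'Misc attributes' line, or None."""
    for line in mmlsattr_output.splitlines():
        if line.strip().lower().startswith("misc attributes:"):
            match = _LIBRARY_RE.search(line)
            return match.group(1) if match else None
    return None
```

Given the sample output above, the function returns `lz4`; for a file whose attributes line carries no COMPRESSION entry, it returns None.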
For more details on compression, refer to the File Compression chapter in the Spectrum Scale Administration Guide.