The dilemma of block size
The Spectrum Scale file system block size determines its unit of I/O and cannot be changed once the file system has been created. It is therefore important to choose the right block size for the intended data access pattern of the file system. For example, sequential reads of large files benefit from a large block size. This is a difficult decision to make, however, if the file system workload will be a mix of small and large I/O. To address this dilemma, Spectrum Scale 5.0 introduces the variant sub-blocks feature.
Every file in Spectrum Scale is made up of zero or more (non-contiguous) disk blocks. The last block of a file can be underutilized, especially for small files, causing internal fragmentation. To minimize this fragmentation, a disk block is divided into smaller units called sub-blocks. The tail of a file can be placed in a contiguous set of sub-blocks that is smaller than a full disk block.
In file systems created before Spectrum Scale version 5.0, every disk block is divided into a fixed number of 32 sub-blocks. This means that a larger block size results in a larger sub-block size, which is a problem for file systems with a mix of small and large files. A large block size improves sequential I/O on large files but also leads to more internal fragmentation for smaller files on the same file system. This not only degrades disk space utilization but also increases the I/O overhead for small files.
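The effect of the fixed 32-way split can be sketched with a little arithmetic. The snippet below is only an illustration under a simplified model: it assumes a file's space is rounded up to whole sub-blocks and ignores metadata and any other allocation details. The function names are hypothetical and not part of Spectrum Scale.

```python
KIB = 1024
MIB = 1024 * KIB

# Fixed number of sub-blocks per block in pre-5.0 file systems (from the text).
SUBBLOCKS_PER_BLOCK_PRE_5 = 32


def pre5_subblock_size(block_size: int) -> int:
    """Sub-block size in bytes when every block is split into 32 sub-blocks."""
    return block_size // SUBBLOCKS_PER_BLOCK_PRE_5


def allocated_bytes(file_size: int, subblock_size: int) -> int:
    """Simplified model: round the file size up to whole sub-blocks."""
    if file_size == 0:
        return 0
    whole_subblocks = -(-file_size // subblock_size)  # ceiling division
    return whole_subblocks * subblock_size


if __name__ == "__main__":
    small_file = 4 * KIB
    for block_size in (256 * KIB, 4 * MIB, 16 * MIB):
        sub = pre5_subblock_size(block_size)
        used = allocated_bytes(small_file, sub)
        print(f"block {block_size // KIB} KiB: sub-block {sub // KIB} KiB, "
              f"a 4 KiB file occupies {used // KIB} KiB")
```

Under this model, a 4 KiB file occupies 8 KiB with a 256 KiB block size but 512 KiB with a 16 MiB block size, which is the fragmentation problem described above.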
The root of the above problem is that the smallest unit of I/O, aka a sub-block, is proportional to the largest unit of I/O, aka a block. The variant sub-blocks feature addresses this limitation.
For newer file systems created with Spectrum Scale 5.0 (file system format version >= 5.0), a disk block is no longer divided into a fixed number of sub-blocks. Instead, for a given block size, an optimal sub-block size is chosen by the system. The following table shows the sub-block size chosen for each block size:
|Block size|Sub-block size|
|---|---|
|64 KiB|2 KiB|
|128 KiB|4 KiB|
|256 KiB, 512 KiB, 1 MiB, 2 MiB|8 KiB|
|4 MiB (default)|8 KiB|
|8 MiB|16 KiB|
|16 MiB (Spectrum Scale RAID only)|16 KiB|
Note that the default block size for new file systems is 4 MiB. This provides the best balance between I/O performance and disk space utilization. For more information please refer to Suggested file system block size.
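For quick reference, the table above can be written as a simple lookup. This is a sketch transcribed from the table, not an interface provided by Spectrum Scale. The dictionary and function names are hypothetical, and the table applies to file systems where data and metadata use the same block size.

```python
KIB = 1024
MIB = 1024 * KIB

# Block size -> sub-block size (bytes), transcribed from the table above.
SUBBLOCK_SIZE = {
    64 * KIB: 2 * KIB,
    128 * KIB: 4 * KIB,
    256 * KIB: 8 * KIB,
    512 * KIB: 8 * KIB,
    1 * MIB: 8 * KIB,
    2 * MIB: 8 * KIB,
    4 * MIB: 8 * KIB,   # default block size
    8 * MIB: 16 * KIB,
    16 * MIB: 16 * KIB,  # Spectrum Scale RAID only
}


def subblocks_per_block(block_size: int) -> int:
    """Number of sub-blocks in one block for a block size listed in the table."""
    if block_size not in SUBBLOCK_SIZE:
        raise ValueError(f"block size {block_size} is not listed in the table")
    return block_size // SUBBLOCK_SIZE[block_size]


if __name__ == "__main__":
    for size in sorted(SUBBLOCK_SIZE):
        print(f"{size // KIB} KiB block: {subblocks_per_block(size)} sub-blocks")
```

With the default 4 MiB block size this gives 512 sub-blocks per block, compared with the fixed 32 of pre-5.0 file systems.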
A Spectrum Scale file system can be created with different data and metadata block sizes. In such cases, data and metadata can also have different sub-block sizes, but the number of sub-blocks per block is the same for both block types.
Consider, for example, a file system with a data block size of 8 MiB and a metadata block size of 1 MiB. The sub-block size is first looked up in the above table for the smaller of the two block sizes. Here the metadata block size is smaller, so the metadata sub-block size from the table is 8 KiB. This gives the number of sub-blocks per block as (1 MiB / 8 KiB =) 128. Since this value must be the same for both data and metadata blocks, it is then used to calculate the sub-block size for the data blocks, which is (8 MiB / 128 =) 64 KiB. This is an important point to consider, since it is easy to make the mistake of reading the data sub-block size directly from the above table, which would give the wrong value of 16 KiB.
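The calculation above can be sketched in a few lines. This follows the steps as described in this article and is not taken from the product's code. The function name `mixed_subblock_sizes` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

# Block size -> sub-block size (bytes), transcribed from the table above.
SUBBLOCK_SIZE = {
    64 * KIB: 2 * KIB, 128 * KIB: 4 * KIB, 256 * KIB: 8 * KIB,
    512 * KIB: 8 * KIB, 1 * MIB: 8 * KIB, 2 * MIB: 8 * KIB,
    4 * MIB: 8 * KIB, 8 * MIB: 16 * KIB, 16 * MIB: 16 * KIB,
}


def mixed_subblock_sizes(data_block: int, metadata_block: int):
    """Return (sub-blocks per block, data sub-block size, metadata sub-block size).

    Step 1: look up the sub-block size for the smaller block size.
    Step 2: derive the sub-blocks-per-block count from it.
    Step 3: apply that same count to both block sizes.
    """
    smaller = min(data_block, metadata_block)
    per_block = smaller // SUBBLOCK_SIZE[smaller]
    return per_block, data_block // per_block, metadata_block // per_block


if __name__ == "__main__":
    per_block, data_sub, meta_sub = mixed_subblock_sizes(8 * MIB, 1 * MIB)
    print(per_block, data_sub // KIB, meta_sub // KIB)  # prints: 128 64 8
```

For the 8 MiB data and 1 MiB metadata example, the sketch yields 128 sub-blocks per block, a 64 KiB data sub-block, and an 8 KiB metadata sub-block, matching the figures in the text.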
There is a small space overhead to the variant sub-blocks feature: the block allocation map file, which tracks the allocation state of all disk blocks in the file system, is slightly larger when the file system uses more than 32 sub-blocks per disk block. However, this does not impact any Spectrum Scale operations. Use ‘mmlsfs <device> --subblocks-per-full-block’ to check the number of sub-blocks per disk block.
For pre-5.0 file systems, the variant sub-blocks feature will not be available even if the file system is upgraded to the latest format version using ‘mmchfs -V full’. The way to use this feature for existing data is to create a new file system under Spectrum Scale 5.0 and copy data from the old to the new file system.
In summary, the variant sub-blocks feature solves an important dilemma for file system administrators, that of choosing the ideal file system block size. Administrators can choose a block size that is large enough to get good sequential performance, while the system takes care of optimizing I/O and space usage for smaller files.
Thanks to my colleague Zheng Cai Yuan for his valued input on this topic.