The IBM Spectrum Scale Sharing Nothing Cluster performance tuning guide has been posted; please refer to that guide before making the changes below.

Here are the tuning steps.
Step 1: Configure spark.shuffle.file.buffer
This option is typically configured in $SPARK_HOME/conf/spark-defaults.conf.
To optimize Spark workloads on an IBM Spectrum Scale file system, the key tuning value is the ‘spark.shuffle.file.buffer’ Spark configuration option (defined in a Spark config file), which should be set to match the block size of the IBM Spectrum Scale file system being used.

The block size of an IBM Spectrum Scale file system can be queried by running ‘mmlsfs -B’.
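For example, assuming a file system device named gpfs0 (a hypothetical name) with a 2M block size, the query and its output would look roughly like:

/usr/lpp/mmfs/bin/mmlsfs gpfs0 -B
flag                value                    description
------------------- ------------------------ -----------------------------------
 -B                 2097152                  Block size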

The following is an example of deriving the spark.shuffle.file.buffer value for a given file system:
spark_shuffle_file_buffer=$(/usr/lpp/mmfs/bin/mmlsfs -B | tail -1 | awk '{ print $2 }')

# Set the Spark configuration option spark.shuffle.file.buffer to the value assigned to
# $spark_shuffle_file_buffer
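Putting these pieces together, a minimal sketch of automating this step might look as follows. The device name gpfs0 is an assumption, and the script relies on spark.shuffle.file.buffer accepting a size with a unit suffix such as ‘k’ (the option is expressed in KiB by default):

#!/bin/bash
# Sketch: read the Spectrum Scale block size and write it into spark-defaults.conf.
# Assumes the file system device is "gpfs0" and SPARK_HOME points to the Spark install.
fs_device="gpfs0"
conf_file="$SPARK_HOME/conf/spark-defaults.conf"

# mmlsfs prints the block size in bytes in the second column of the -B row.
block_size_bytes=$(/usr/lpp/mmfs/bin/mmlsfs "$fs_device" -B | tail -1 | awk '{ print $2 }')

# spark.shuffle.file.buffer is interpreted in KiB, so convert bytes to KiB.
buffer_kb=$(( block_size_bytes / 1024 ))

# Update the option if it is already present, otherwise append it.
if grep -q '^spark.shuffle.file.buffer' "$conf_file" 2>/dev/null; then
    sed -i "s|^spark.shuffle.file.buffer.*|spark.shuffle.file.buffer ${buffer_kb}k|" "$conf_file"
else
    echo "spark.shuffle.file.buffer ${buffer_kb}k" >> "$conf_file"
fi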

Defining a large block size for IBM Spectrum Scale file systems used for Spark shuffle operations can improve system performance. However, it has not been proven that a block size larger than 2M offers useful improvements on the typical hardware used in FPO configurations.

Step 2: Configure spark.local.dir with a local path
Do not put Spark's shuffle data into the IBM Spectrum Scale file system, because this slows down the shuffle process; point spark.local.dir at local disk paths instead, as in the sketch below.
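A minimal sketch of such an entry in $SPARK_HOME/conf/spark-defaults.conf, assuming local disks mounted at /disk1 and /disk2 (hypothetical paths):

# Keep shuffle and spill scratch space on local disks, not on the Spectrum Scale mount.
# Multiple comma-separated directories spread the shuffle I/O across devices.
spark.local.dir /disk1/spark-local,/disk2/spark-local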
