Performance Impact of Big SQL with YARN Enabled

An overview of YARN integration with Big SQL was presented in a previous blog. Note that YARN integration is not a performance feature: if you are solely interested in improving performance, choose logical Big SQL workers instead of enabling YARN for Big SQL.

When YARN is enabled for Big SQL, the resources available to Big SQL can be increased without restarting the Big SQL service. Read more about Big SQL YARN configuration options and recommendations and how to enable YARN for Big SQL. The trade-off is that scheduling control is taken away from Big SQL and given to YARN, and this can affect performance.

Performance can improve as more Big SQL YARN containers are activated. However, avoid trying to squeeze in more containers by shrinking the Big SQL YARN container below the recommended size. Because YARN, not Big SQL, manages the resources on the cluster, there can be instances where YARN is not able to activate Big SQL YARN containers evenly across the Big SQL worker nodes.

The diagram below shows an example of a YARN scheduling placement where one Big SQL YARN container is activated on the 1st compute node, no Big SQL YARN containers are activated on the 2nd node, and 2 Big SQL YARN containers are activated on the 3rd node. A better-performing placement would have been 1 Big SQL YARN container activated per compute node. The default placement policy for YARN tries to distribute the YARN containers across the compute nodes as much as possible.
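The difference between the placement described above and the even one can be expressed as a simple number. The following is a minimal sketch, not part of Big SQL or YARN: the function names `placement_skew` and `even_placement` are hypothetical, and the "max minus min" metric is an assumption chosen only for illustration.

```python
def placement_skew(containers_per_node):
    """Spread between the busiest and the emptiest node.

    Illustrative metric (an assumption, not a Big SQL or YARN metric):
    0 means every node hosts the same number of containers.
    """
    return max(containers_per_node) - min(containers_per_node)


def even_placement(total_containers, num_nodes):
    """Distribute containers as evenly as possible across nodes."""
    base, extra = divmod(total_containers, num_nodes)
    # The first `extra` nodes take one additional container.
    return [base + 1 if i < extra else base for i in range(num_nodes)]


# Placement from the example: 1 container, 0 containers, 2 containers.
observed = [1, 0, 2]
ideal = even_placement(sum(observed), len(observed))

print("observed:", observed, "skew:", placement_skew(observed))
print("ideal:   ", ideal, "skew:", placement_skew(ideal))
```

For the example placement the sketch reports a skew of 2, while the one-container-per-node placement reports 0.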

Skew can still occur when Big SQL and YARN applications are running side by side, and where there is skew the performance of Big SQL queries can be affected. Query execution times can also vary more, depending on where and how many Big SQL YARN containers are activated.

3.1 Performance Study

Performance studies were conducted on a cluster with 1 management node and 4 compute nodes, running the MapReduce Teragen 1 TB and TPC-DS 1 TB workloads.

Cluster specifications	YARN and Big SQL configurations
Memory: 128 GB	Total YARN memory: 116 GB
CPU: 20 physical cores (40 vCPU), Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz	Total YARN vCores: 32
OS: RHEL 7.2	Big SQL container size: 28 GB memory + 4 or 8 vCores
Network bandwidth: 10 Gbps; Storage: 2 TB SATA x 9	DFT_DEGREE: 4
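The configuration above bounds how many Big SQL YARN containers can fit on a node. The sketch below works through that arithmetic under two stated assumptions: that the YARN totals in the table (116 GB, 32 vCores) are per compute node, and that YARN enforces both memory and vCores when placing containers. The function name `max_containers_per_node` is hypothetical.

```python
def max_containers_per_node(yarn_mem_gb, yarn_vcores,
                            container_mem_gb, container_vcores):
    """Upper bound on containers per node, limited by the scarcer resource.

    Assumes YARN accounts for both memory and vCores; if only memory is
    enforced on a given cluster, the memory bound alone applies.
    """
    by_memory = yarn_mem_gb // container_mem_gb
    by_vcores = yarn_vcores // container_vcores
    return min(by_memory, by_vcores)


# Values from the table, assumed to be per compute node.
YARN_MEM_GB = 116
YARN_VCORES = 32
CONTAINER_MEM_GB = 28

for vcores in (4, 8):
    limit = max_containers_per_node(YARN_MEM_GB, YARN_VCORES,
                                    CONTAINER_MEM_GB, vcores)
    print(f"{CONTAINER_MEM_GB} GB + {vcores} vCores -> at most {limit} per node")
```

Under these assumptions both container sizes allow at most 4 containers per node: memory is the limit with 4 vCores per container (4 x 28 GB = 112 GB of 116 GB), and memory and vCores are both exhausted at 4 containers with 8 vCores per container.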

When Big SQL is allocated the same amount of resources with and without YARN enabled, there is no loss of performance for the TPC-DS 1 TB workload.
Activating more Big SQL YARN containers affects performance differently for single-stream and 4-stream runs of the TPC-DS 1 TB workload. When the number of Big SQL YARN containers increases from 1->2->3->4 per host, the percentage performance boost for each step is:

Note that we did not observe any significant boost in performance when going from 2->3->4 Big SQL YARN containers because our workload maxed out our available CPU resources.

Any skew in Big SQL container placement across nodes hurts performance heavily. In tests with 8 containers across 4 worker nodes, the placement (2,2,2,2) is the baseline. Skewed placements result in varying levels of performance degradation.
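To see how many ways 8 containers can land on 4 worker nodes, the possible placements can be enumerated. This is a sketch for illustration only: it assumes a cap of 4 containers per node (the upper end of the 1 to 4 per host range used in this study), treats nodes as interchangeable, and does not indicate which placements were actually measured. The function name `distinct_placements` is hypothetical.

```python
from itertools import product


def distinct_placements(total, nodes, cap):
    """List the distinct ways to place `total` containers on `nodes` nodes.

    Nodes are treated as interchangeable, so each placement is stored in
    descending order. `cap` is the assumed maximum containers per node.
    """
    seen = set()
    for counts in product(range(cap + 1), repeat=nodes):
        if sum(counts) == total:
            seen.add(tuple(sorted(counts, reverse=True)))
    # Order by the busiest node first, so the most even placement comes first.
    return sorted(seen, key=lambda p: (max(p), p))


for placement in distinct_placements(total=8, nodes=4, cap=4):
    label = "baseline" if len(set(placement)) == 1 else "skewed"
    print(placement, label)
```

Only one of the placements listed is perfectly even, (2, 2, 2, 2); every other outcome leaves at least one node with more containers than another.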

Deactivating Big SQL YARN containers completes within 20s. The elapsed time for activating Big SQL YARN containers is proportional to the number of containers activated. Activating 3 Big SQL YARN containers per compute node required 200s in our tests.

Summary

Enabling YARN for Big SQL does not necessarily mean that performance will improve. The scheduling and activation of Big SQL YARN containers is under the control of YARN. In cases where the Big SQL YARN containers are activated evenly across compute nodes there can be some performance benefit. If performance is very important in your environment, consider adding logical Big SQL workers as an alternative.

Thanks to the following major contributors to this work: Hebert Pereyra, Metin Kalayci, Diego Santesteban, Armando Paniagua, Xiao Wei Zhang, Abhayan Sundararajan

Performance Impact of Big SQL with YARN Enabled - Hadoop Dev