Authors: Xabriel J Collazo Mojica, John Poelman, and Robin Noble
Big SQL Topology Recommendations
There are 2 different topologies that can be used when installing Big SQL. Depending on the cluster’s resources, network speed, and enabled Big SQL features will determine the best choice of topology. These recommendations are based on typical commodity hardware found in Hadoop clusters.
1. Big SQL Worker nodes and DataNodes co-located
In this topology the Big SQL Worker nodes are installed on the same nodes as the HDFS DataNodes. The Big SQL workers can be installed on all of the DataNodes.
- Reduce network traffic
- Enable YARN integration
When possible the data will be read from the local DataNode rather than a remote DataNode. This can result in better query performance by avoiding additional network traffic on a slow network.
Big SQL YARN/Slider integration can be enabled to utilize the YARN resource scheduling on each node. This allows better utilization of shared resources.
More information about YARN integration is available here:
The number of Big SQL worker nodes installed effects the amount of parallelism possible when each query or insert executes. Installing Big SQL Workers on every DataNode as well as using multiple Logical Worker nodes will increase the possible parallelism.
- Shared memory and cpu
- IO contention
Big SQL will share memory and cpu resources on the node with HDFS and possibly YARN. This reduces the amount of resources which can be assigned to Big SQL. The default is 25% and can be tuned to more if needed.
When both Big SQL and HDFS DataNodes are co-located there can be heavy disk drive IO usage resulting in lower performance. Depending on the performance of the disk drives, and the amount of data being read or written, this might be a concern.
2. Big SQL Worker Nodes on dedicate edge nodes
In this topology Big SQL Workers are installed on dedicated nodes. The client packages for dependent services also need to be installed. The DataNodes are accessible but there is no co-location.
- Reduce resource and IO contention
- Simplified resource management
Big SQL can use most of the memory and cpu resources when the nodes are not running other services. Separating the DataNodes from the Big SQL Worker will reduce IO contention. In testing we have seen slightly faster (< 10% improvement) queries with this topology and a minimum network speed of 10 GbE between nodes.
YARN integration is not used in an edge topology since resources do not need to be shared with other services. The resources can be managed within Big SQL service.
- Parallelism loss
- Increased Network traffic
- Dedicated hardware
With a fewer number of nodes there will be less parallelism. Adding Logical Worker nodes will help increase the parallelism, but may also require increasing the memory and cpu requirements for each Big SQL Worker node.
Separating the Big SQL Workers from the HDFS DataNodes means all the data is read from remote nodes. This will increase the amount of network traffic and may require higher performing network speeds to achieve equivalent performance.
More hardware may be needed in the cluster to have dedicated Big SQL Worker nodes.
Summary of Topologies
|Big SQL Worker and DataNodes co-located||Dedicated edge nodes for Big SQL workers|
|Memory, cpu, and IO||Shared resources with other services||Full utilization of all resources|
|YARN integration||Allow YARN to manage resources||Big SQL manages resources|
|Processing parallelism||More nodes equates to more possible parallelism||Fewer nodes but with more resources Logical Workers can be leveraged|
|Network traffic||Utilize co-located data with less network traffic||May requires a faster network between nodes to offset the increased traffic|
Note that there are no functional restrictions when running the Big SQL Service on dedicated edge nodes. All functional capabilities are available across both topology options.
Sizing of Big SQL Edge Nodes in the cluster
Capacity planning for Big SQL is by and large the same regardless of topology chosen, but there are special considerations for certain topologies.
In general, you need to give Big SQL more resources as you increase the expected load. You should increase the number of Big SQL workers and/or give Big SQL workers more resources (memory and CPU) as the system becomes busier. Some examples of how Big SQL can become busier:
- Rate of incoming queries increases, such as when new users or new workloads come online
- Underlying Big SQL tables increase in size
You should size your cluster such that it can handle peaks in activity. Keep in mind your needs for the immediate future and beyond.
As discussed earlier, when Big SQL runs on edge nodes separate from HDFS DataNodes, there will be more network traffic. Recommendations:
- Use at least a 10 GbE network and enable jumbo frames.
- Number of Big SQL edge nodes should be less than or equal to the number of HDFS DataNodes, to avoid HDFS becoming a bottleneck
Performance comparison of topologies
IBM tests Big SQL using different topologies such as those just described. One of goals of testing different topologies is to understand the implications and tradeoffs of the different topology choices.
We compared the performance when Big SQL workers and HDFS DataNodes are co-located, versus when Big SQL workers run on nodes separate from the HDFS DataNodes. In other words, we wanted to understand the impact of separating Big SQL compute from table data storage.
To test this, we installed Big SQL using two different topologies. The first topology was a cluster with a total of 6 nodes (one management node and 5 compute nodes). The following were install on each compute node: HDFS DataNode, Yarn NodeManager, and Big SQL worker.
First topology: 1 management node 5 compute nodes (DataNode + NodeManager + Big SQL worker + ...)
The second topology had a total of 31 nodes (one management node and 30 compute nodes). The compute nodes were further divided (no overlap) to have 5 Big SQL workers and 25 HDFS DataNodes. (Yarn NodeManagers were also running on the HDFS data nodes.)
Second topology: 1 management node 25 compute nodes (DataNode + NodeManager + ...) 5 Big SQL worker
Note that with both topologies, the amount of Big SQL processing power was the same. The number of Big SQL workers was the same (5 workers), and with both topologies Big SQL was configured that way.
As a workload, we chose Hadoop-DS, a workload based on the TPC-DS benchmark at scale factor 10TB. TPC-DS is a popular benchmark used with SQL engines including SQL-on-Hadoop engines. This workload consists of 99 SQL queries run sequentially in one or more query streams. We tested with one stream and 6 streams. A single stream run is sometimes called a â€śpowerâ€ť run. A run with more than one stream is called a â€śthroughputâ€ť run. In a throughput run, the 99 queries are run in a different order in each stream.
We configured Big SQL the same in both topologies, giving it 85% of memory and CPU. We deployed a single Big SQL worker on each physical machine.
Hardware use for both topologies: Lenovo x3650 M4 BD CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz (20 cores, 40 hyper-threads) Memory: 128 GB JBOD storage: 8x2TB Network: 10 GbE Software stack: Linux RHEL 7.3 HDP 2.6.3 Big SQL 5.0.2
Note: The performance results discussed are based on measurements by IBM in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors. No assurance can be given that others will achieve results similar to those described here.
With both topologies, throughput runs kept the cluster busier than power runs. A 6-stream throughput run does 6X more work than a single-stream run, but during the tests describe here, throughput runs only took about twice as long to complete.
As expected, there was more network traffic using the second topology, since Big SQL had to read all tables over the network from separate HDFS DataNodes. With the second topology, the Big SQL workers had a significant amount of network READS, while the HDFS DataNodes were doing network WRITES. However, the throughput runs did not saturate the network in a sustained fashion. On average, the throughput runs consumed about 40% of the available network bandwidth. Advantage (network): first topology (less network traffic is better).
As expected, there was heavier JBOD disk I/O traffic using the first topology than with the second topology. In the first topology, all HDFS reads occur on same 5 nodes where the Big SQL workers were running. In the second topology, the HDFS reads were spread across 25 separate HDFS data nodes. Advantage (storage I/O): second topology.
In our test environment, both â€śpowerâ€ť and â€śthroughputâ€ť runs completed in about 10% less time using the second topology. Conclusion: For this test environment, the reduction in JBOD storage I/O contention with the second topology generously offset the increase in network traffic.
While we saw a slight speed up in the total time to finish the Hadoop-DS workload with the second topology, some queries actually took longer. Results with other workloads could vary.
As shown in the S-charts, with the first topology as the baseline, most of the second topology queries were the same or faster for 1 stream and 6 streams (near zero or negative difference on the right side of the graphs). Only a few were slower (positive difference on the left side of the graphs).