Big SQL LOAD HADOOP runs as a MapReduce job. Like any other MapReduce job, configuring the YARN and MapReduce memory properties correctly is key to running LOAD optimally. MapReduce jobs tend to fail with Java OutOfMemoryError exceptions if the YARN and MapReduce memory settings are too small; if the properties are too large, the number of concurrent map and reduce tasks decreases, which also hurts performance and wastes memory.
Proper YARN and MapReduce memory sizes from lab tests
First, let's look at which properties impact LOAD. We tested LOAD HADOOP in our labs using the following table definitions, derived from TPCH, at a scale factor of 1TB. Both are partitioned tables.
Partitioned table definitions:
CREATE HADOOP TABLE LINEITEM (
    L_ORDERKEY BIGINT NOT NULL,
    L_PARTKEY INTEGER NOT NULL,
    L_SUPPKEY INTEGER NOT NULL,
    L_LINENUMBER INTEGER NOT NULL,
    L_QUANTITY FLOAT NOT NULL,
    L_EXTENDEDPRICE FLOAT NOT NULL,
    L_DISCOUNT FLOAT NOT NULL,
    L_TAX FLOAT NOT NULL,
    L_RETURNFLAG VARCHAR(1) NOT NULL,
    L_LINESTATUS VARCHAR(1) NOT NULL,
    L_COMMITDATE VARCHAR(10) NOT NULL,
    L_RECEIPTDATE VARCHAR(10) NOT NULL,
    L_SHIPINSTRUCT VARCHAR(25) NOT NULL,
    L_SHIPMODE VARCHAR(10) NOT NULL,
    L_COMMENT VARCHAR(44) NOT NULL)
PARTITIONED BY (L_SHIPDATE VARCHAR(10))
STORED AS PARQUETFILE;

CREATE HADOOP TABLE ORDERS (
    O_ORDERKEY BIGINT NOT NULL,
    O_CUSTKEY INTEGER NOT NULL,
    O_ORDERSTATUS VARCHAR(1) NOT NULL,
    O_TOTALPRICE FLOAT NOT NULL,
    O_ORDERPRIORITY VARCHAR(15) NOT NULL,
    O_CLERK VARCHAR(15) NOT NULL,
    O_SHIPPRIORITY INTEGER NOT NULL,
    O_COMMENT VARCHAR(79) NOT NULL)
PARTITIONED BY (O_ORDERDATE VARCHAR(10))
STORED AS PARQUETFILE;
When Big SQL starts, it reserves some memory exclusively for itself; this memory is not under YARN's control. Its size is defined by bigsql_resource_percent (default 25% of physical memory), which means YARN and MapReduce can use the remaining 75% of total memory on each node by default. Here is a summary of the best values to use:
Table 1: Recommended YARN and MapReduce memory configuration
| TPCH 1TB partitioned table | Default Value | Recommended Value | Explanation |
|---|---|---|---|
| 1. Big SQL | | | |
| bigsql_resource_percent | 25% | 25% to 80% | Amount of cluster resources to dedicate to Big SQL |
| 2. Map Memory | | | |
| mapreduce.map.memory.mb | 1024 | 3400 | Memory (in MB) allocated for each map task |
| mapreduce.map.java.opts | -Xmx756m | -Xmx2720m | Maximum Java heap size of the map process, set with -Xmx. Suggest 80% of mapreduce.map.memory.mb |
| 3. Reduce Memory | | | |
| mapreduce.reduce.memory.mb | 1024 | 9800 | Memory (in MB) allocated for each reduce task |
| mapreduce.reduce.java.opts | -Xmx756m | -Xmx7840m | Maximum Java heap size of the reduce process, set with -Xmx. Suggest 80% of mapreduce.reduce.memory.mb |
| 4. Application Master Memory | | | |
| yarn.app.mapreduce.am.resource.mb | 512 | 2000 | Memory (in MB) allocated for the Application Master |
| yarn.app.mapreduce.am.command-opts | -Xmx312m | -Xmx1600m | Maximum Java heap size of the AM process, set with -Xmx. Suggest 80% of yarn.app.mapreduce.am.resource.mb |
| 5. YARN Node Manager Memory | | | |
| yarn.nodemanager.resource.memory-mb | 163840 | 163840 (calculate from physical memory) | Total memory (in MB) that YARN can allocate for containers on one node. Here, 75% of the node's total memory; the remaining 25% (bigsql_resource_percent) is allocated to Big SQL |
| yarn.scheduler.minimum-allocation-mb | 512 | 512 | The smallest amount of memory (MB) that can be requested for a container |
| yarn.scheduler.maximum-allocation-mb | 163840 | 163840 (calculate from physical memory) | The largest amount of memory (MB) that can be requested for a container. Suggest the same as yarn.nodemanager.resource.memory-mb |
| LOAD HADOOP table result | OOM error | Successful | |
Methods used to determine the best values for your workload
The properties listed in the table above fall into two types:
Type 1. Calculated. These can be set once. E.g.
- yarn.nodemanager.resource.memory-mb = 163840. This is total physical memory (216 GB in this case) x (1 - 25%).
- yarn.scheduler.maximum-allocation-mb = yarn.nodemanager.resource.memory-mb
- yarn.scheduler.minimum-allocation-mb = 512 (fixed value)
For Type 1 properties, it is straightforward to set them according to the node's physical memory. Note: if the Big SQL memory percentage is changed (higher or lower than 25%), then yarn.nodemanager.resource.memory-mb and yarn.scheduler.maximum-allocation-mb must be recalculated accordingly.
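The Type 1 arithmetic can be sketched as a small helper. This is Python purely for illustration; the function name is ours, and the 216 GB node size comes from this lab setup, not from any Big SQL tooling:

```python
def yarn_memory_settings(physical_mb, bigsql_resource_percent=0.25):
    """Derive the three 'calculated' YARN properties for one data node."""
    # YARN gets whatever Big SQL does not reserve for itself.
    nodemanager_mb = int(physical_mb * (1 - bigsql_resource_percent))
    return {
        "yarn.nodemanager.resource.memory-mb": nodemanager_mb,
        # Largest single container request: same as the NodeManager total.
        "yarn.scheduler.maximum-allocation-mb": nodemanager_mb,
        # Smallest container request: fixed.
        "yarn.scheduler.minimum-allocation-mb": 512,
    }

settings = yarn_memory_settings(216 * 1024)  # 216 GB node
```

Note that 216 GB x 75% works out to 165888 MB, slightly above the 163840 MB in Table 1; the table's figure corresponds to an even 160 GB, so the lab value was presumably rounded down.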
Type 2. Tuned based on workload and data size. They are:
- mapreduce.map.memory.mb, mapreduce.reduce.memory.mb and yarn.app.mapreduce.am.resource.mb
- mapreduce.map.java.opts = 80% x mapreduce.map.memory.mb
- mapreduce.reduce.java.opts = 80% x mapreduce.reduce.memory.mb
- yarn.app.mapreduce.am.command-opts = 80% x yarn.app.mapreduce.am.resource.mb
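The 80% heap rule for the Type 2 Java options can be expressed as a one-liner (a sketch; heap_opts is an illustrative name, not a Hadoop API):

```python
def heap_opts(container_mb, fraction=0.8):
    """Derive a container's -Xmx Java option from its memory.mb setting."""
    return f"-Xmx{int(container_mb * fraction)}m"

# With the recommended container sizes from Table 1:
print(heap_opts(3400))  # mapreduce.map.java.opts            -> -Xmx2720m
print(heap_opts(9800))  # mapreduce.reduce.java.opts         -> -Xmx7840m
print(heap_opts(2000))  # yarn.app.mapreduce.am.command-opts -> -Xmx1600m
```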
Type 2 properties need several iterations to converge on the most efficient values. Here is the process I used to tune these properties for the TPCH 1TB workload.
1. Run LOAD HADOOP with the default configuration (see Table 1). We encountered OOM (out of memory) errors for map and reduce tasks. You will see an error message like the one below in the MapReduce task logs; it means the configured map memory size (mapreduce.map.memory.mb=1024) is too small.
Error Message: Container [pid=26783,containerID=container_1389136889967_0009_01_000002] is running beyond physical memory limits. Current usage: 1.2 GB of 1 GB physical memory used; 1.2 GB of 5 GB virtual memory used. Killing container.
2. Increase all Type 2 memory-related properties to resolve the OOM errors and make LOAD HADOOP run successfully. I recommend doubling each value per iteration until the OOM error no longer occurs.
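The doubling strategy in step 2 amounts to the following loop. This is only a sketch: run_load is a stand-in for actually submitting the LOAD HADOOP job and checking its logs for container OOM kills, and the names and cap are ours, not part of any Hadoop API:

```python
def tune_by_doubling(start_mb, run_load, cap_mb=65536):
    """Double a memory property until run_load succeeds or a cap is hit."""
    mb = start_mb
    while mb <= cap_mb:
        if run_load(mb):   # True = LOAD finished with no OOM errors
            return mb
        mb *= 2            # still failing: double and retry
    raise RuntimeError("reached cap without a successful LOAD")

# Illustration: if the workload actually needs ~3 GB per map task,
# starting from the 1024 MB default converges in three trial runs:
found = tune_by_doubling(1024, lambda mb: mb >= 3 * 1024)  # 1024 -> 2048 -> 4096
```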
3. Although the current configuration lets LOAD HADOOP complete for the TPCH 1TB tables, have we over-configured it? Over-configuring hurts performance because it decreases the number of containers that can run concurrently. To answer this question, we need to check how much memory each kind of container (Map, Reduce, and Application Master) actually allocates. We can do this by checking the YARN Node Manager log: the Node Manager monitors all container resource allocations and writes the information to its log. You can find the log files at /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-<hostname>.log on the data nodes. A log entry looks like:
Log Entry: 2015-12-23 07:40:14,262 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 471707 for container-id container_1450840321024_0002_01_003626: 7.0 GB of 11 GB physical memory used; 11.2 GB of 55 GB virtual memory used
This log entry is for one reduce container and shows that it used 7 GB of a maximum of 11 GB. We can grep and sort all the reduce-container memory allocation figures from the Node Manager log to find the maximum size the reduce tasks needed while the job was running. Here is a command to perform this check:
grep "of 11 GB" yarn-yarn-nodemanager-<hostname>.log | awk '{print $15 $16}' | grep GB | sort
The output looks like:
… … 8.7GB 8.7GB 8.7GB 8.7GB 8.7GB 8.8GB 8.9GB 9.0GB
The output above shows that the maximum is 9 GB on this particular data node. The same check must be performed on all data nodes; across all of them, the maximum turned out to be 9.4 GB. So setting mapreduce.reduce.memory.mb=9800 (slightly larger than 9.4 GB) should be sufficient. Do the same to check the maximum memory allocation for the Map and AM containers and tune the related configuration values. Finally, re-run the LOAD HADOOP commands to verify that the new values are adequate.
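The grep/awk/sort step can also be automated across log files. Here is a hedged Python sketch (max_physical_gb is an illustrative helper of ours, and the sample lines are abridged from the log entry shown above):

```python
import re

# Matches the physical-memory clause of a ContainersMonitorImpl log line,
# e.g. "7.0 GB of 11 GB physical memory used". The "virtual memory" clause
# on the same line does not match because the literal text differs.
USAGE = re.compile(r"([\d.]+) GB of ([\d.]+) GB physical memory used")

def max_physical_gb(log_lines, limit_gb):
    """Peak physical memory used by containers with the given limit (GB)."""
    peak = 0.0
    for line in log_lines:
        m = USAGE.search(line)
        if m and float(m.group(2)) == limit_gb:
            peak = max(peak, float(m.group(1)))
    return peak

sample = [
    "... container_1450840321024_0002_01_003626: 7.0 GB of 11 GB "
    "physical memory used; 11.2 GB of 55 GB virtual memory used",
    "... container_1450840321024_0002_01_003627: 9.0 GB of 11 GB "
    "physical memory used; 12.1 GB of 55 GB virtual memory used",
]
print(max_physical_gb(sample, 11))  # -> 9.0
```

Running this against every node's log and taking the overall maximum reproduces the 9.4 GB figure found above.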
Note:
1. LOAD memory requirements differ between partitioned and non-partitioned tables. In general, partitioned tables need more memory for the Reduce and Application Master containers.
2. A new version of ANALYZE (V2.0) is available as of Big SQL V4.2. It uses an entirely new framework that does not rely on MapReduce, and therefore requires no MapReduce tuning.
Summary
Proper YARN and MapReduce memory settings are very important for Big SQL LOAD HADOOP performance. This article has presented a technique for tuning these properties, along with a set of recommendations based on loading 1TB of TPCH data on the data nodes described.