Every application needs a fast, scalable, and memory-efficient allocator. However, memory request patterns differ from one application to another.
It is difficult to provide a single common allocator or tunable that meets the requirements of all applications.
IBM® AIX® provides several memory allocators, and each one uses different memory management algorithms and data structures to meet the requirements of various applications.
Scope of this article
This article details the major advantages of the AIX scalable memory allocator, along with a few use cases.
Default AIX memory allocator and drawbacks
The default memory allocator, also called Yorktown, is the most common malloc policy found on AIX systems and is the one that is active after you install AIX.
The default malloc policy has the following major drawbacks:
- Does not self-tune to workload variation and requires user intervention to obtain the best performance.
- Requires clients to possess advanced skills for hand tuning their workloads.
- Does not scale well under heavy malloc contention.
- Causes memory space inefficiency when the scalability options are used, despite better performance in multithreaded environments.
- Requires multiple hand-tune options to be set manually based on single or multi-threaded applications to get optimal performance.
In general, with the default memory allocator, multithreaded applications whose threads call malloc and free very frequently spend a large share of their time in the allocator. As a result, you may see malloc scalability issues on AIX across different workloads and application domains, such as IBM Tivoli® Directory Server, IBM WebSphere® Application Server, IBM InfoSphere Warehouse, and IBM Content Collector indexer.
New AIX memory allocator policy
As a long-term solution, AIX introduced a more robust allocation policy called Watson2. It is also called scalable malloc because scaling is one of its major features.
Watson2, or the scalable malloc allocator, is an off-the-shelf allocator that yields better performance and memory efficiency than the default allocator, and it has been part of AIX 7.1 since 2013.
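The Watson2 allocator is selected by setting the MALLOCTYPE environment variable to Watson2 before the application starts (for example, export MALLOCTYPE=Watson2 in the shell that launches it).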
The Watson2 allocator has several advantages over the default allocator. The major ones are:
- Watson2 is an off-the-shelf memory allocator. It does not need any user intervention or tuning to obtain optimal performance. That is, the Watson2 malloc subsystem adapts to the behavior of the application as it changes from a single thread to multiple threads and back from multiple threads to a single thread.
- It scales better than the default allocator in multithreaded applications.
- It includes Reliability, Availability, and Serviceability (RAS) features that understand the component hierarchy of AIX and third-party components, which enables error logging, component tracing, and dump facilities.
Case studies with SPEC CPU2006 benchmarks
Standard Performance Evaluation Corporation (SPEC) CPU benchmarks are widely used in both industry and academia. They measure processor and memory performance.
CPU2006 consists of the following two benchmark groups:
- CINT2006, which measures system performance for integer operations.
- CFP2006, which measures system performance for floating-point operations.
We considered the following CINT2006 and CFP2006 applications, written in C, C++, and Fortran, to evaluate scalable malloc performance against the default allocator.
- 471.omnetpp is a discrete event simulator that models a large Ethernet campus network.
- 482.sphinx3 is a widely known speech recognition system from the Carnegie Mellon University.
- 483.xalancbmk is for XML processing. It is a modified version of Xalan-C++, which transforms XML documents to other document types.
- 434.zeusmp is a computational fluid dynamics program developed at the National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign, for the simulation of astrophysical phenomena.
- 450.soplex is a linear programming solver that uses the simplex algorithm and sparse linear algebra.
Time and heap consumption are the two major metrics considered while evaluating scalable malloc/Watson2.
Figure 1 compares the time and heap consumption of the Watson2 memory allocator with those of the default memory allocator.
Figure 1. Advantages of Watson2 memory allocator over a default memory allocator
This article helps users understand the benefits of AIX scalable malloc (Watson2) compared with the default memory allocator. It highlights a major limitation of the default allocator: hand-tuned options must be set manually for single-threaded and multithreaded applications to achieve optimal performance. The article also outlines case studies with an industry-standard benchmark (SPEC CPU2006) that covers test types such as single-threaded, compute-intensive, and malloc-intensive workloads. For more information, refer to the System memory allocation article in the IBM Knowledge Center.