IBM Developer Blog

Reduced boot time and DLPAR time with LMT improvements

Key factors for system administrators during system maintenance include how long it takes to apply system patches or updates that require a reboot, and how quickly system resources can be reconfigured without disrupting existing workloads.

Boot time is the time a device takes to become ready to operate after the power is turned on. It is an important component of system performance because users must wait for the boot operation to complete before they can use the device. Slow boot times can make system owners reluctant to apply patches or updates that require a reboot.

Dynamic logical partitioning (DLPAR) is the capability of a logical partition (LPAR) to be reconfigured dynamically, without shutting down the operating system that runs in the LPAR. DLPAR enables memory, CPU capacity, and I/O interfaces to be moved non-disruptively between LPARs within the same server. IBM AIX has supported DLPAR since AIX 5L. System owners expect DLPAR operations to have minimal impact on currently running workloads.

This blog post describes the system boot and DLPAR optimizations in AIX 7.3.

AIX 7.3 comes with an optimized boot phase that takes much less time than a similar configuration running an earlier AIX release. AIX 7.3 also significantly optimizes CPU and memory DLPAR operations. Both improvements were achieved by redesigning the Lightweight Memory Trace (LMT) infrastructure.

LMT is a critical reliability, availability, and serviceability (RAS) function on AIX, and it is ON by default. To speed up the boot phase, the LMT buffer allocation, which occurs early in boot, was redesigned and optimized. In AIX 7.3, LMT allocates only enough buffer space during boot to capture the boot-time traces. After the boot, the LMT buffers are resized in the background without holding up the boot process, thereby resulting in significant improvements in boot times.
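The deferred-growth pattern described above can be illustrated with a toy sketch. This is not AIX code: the `TraceBuffer` class, the `boot` function, the buffer sizes, and the use of a Python thread are all assumptions chosen for illustration. The sketch only shows the general idea of making a small allocation on the critical path and growing it afterward in the background.

```python
import threading


class TraceBuffer:
    """Toy trace buffer that starts small and can be grown later (hypothetical)."""

    def __init__(self, boot_size):
        self._lock = threading.Lock()
        self._buf = bytearray(boot_size)  # small allocation on the boot path
        self._used = 0

    @property
    def capacity(self):
        with self._lock:
            return len(self._buf)

    def record(self, data):
        """Append a trace entry; return False if it does not fit."""
        with self._lock:
            end = self._used + len(data)
            if end > len(self._buf):
                return False
            self._buf[self._used:end] = data
            self._used = end
            return True

    def resize(self, new_size):
        """Grow the buffer in place, preserving the entries recorded so far."""
        with self._lock:
            if new_size > len(self._buf):
                self._buf.extend(bytes(new_size - len(self._buf)))

    def contents(self):
        with self._lock:
            return bytes(self._buf[:self._used])


def boot(boot_size=64, full_size=4096):
    """Allocate a small buffer, then start the resize off the boot path."""
    buf = TraceBuffer(boot_size)
    buf.record(b"boot: early trace entry")
    resizer = threading.Thread(target=buf.resize, args=(full_size,))
    resizer.start()  # the caller continues without waiting for the resize
    return buf, resizer
```

In this sketch, `boot` returns as soon as the resize thread has been started, so the cost of the large allocation is paid off the critical path while the early trace entries remain intact.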

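| Memory size (TB) | Boot time till login prompt (sec): Power9 with AIX 7.2 TL5 | Boot time till login prompt (sec): Power10 with AIX 7.2 TL5 | Boot time till login prompt (sec): Power10 with AIX 7.3 | Boot time reduction: AIX optimization effect | Boot time reduction: AIX + Power10 effect |
|---|---|---|---|---|---|
| 1 | 137.81 | 107.12 | 72.37 | 32.44% | 47.5% |
| 2 | 215.27 | 193.08 | 97.28 | 49.62% | 54.8% |
| 3 | 286.77 | 252.59 | 126.41 | 49.95% | 55.9% |
| 4 | 340.43 | 299.35 | 159.72 | 46.64% | 53.1% |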

The above table captures the percentage reduction in AIX boot time on a large memory system with 48 cores in simultaneous multithreading (SMT) mode 8. The "AIX optimization effect" column compares AIX 7.3 with AIX 7.2 TL5 on the same Power10 hardware, and the "AIX + Power10 effect" column compares AIX 7.3 on Power10 with AIX 7.2 TL5 on Power9. AIX 7.3 is supported on IBM Power8 and later processors. IBM Power10 is the latest Power processor at the time of writing, so the data was captured against it. On average, we observed a reduction of more than 50% in boot time with AIX 7.3 on IBM Power10 compared to AIX 7.2 TL5 on IBM Power9.
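As a quick check, the reduction columns can be recomputed from the boot times in the table. The sketch below copies the measured values from the table. The names `BOOT_TIMES`, `reduction_pct`, and `summarize` are introduced here for illustration only.

```python
# Boot times in seconds, copied from the table above.
# Key: memory size in TB.
# Value: (Power9 + AIX 7.2 TL5, Power10 + AIX 7.2 TL5, Power10 + AIX 7.3)
BOOT_TIMES = {
    1: (137.81, 107.12, 72.37),
    2: (215.27, 193.08, 97.28),
    3: (286.77, 252.59, 126.41),
    4: (340.43, 299.35, 159.72),
}


def reduction_pct(before, after):
    """Percentage reduction when going from `before` to `after`."""
    return (before - after) / before * 100.0


def summarize(times=BOOT_TIMES):
    """Map memory size to (AIX effect, AIX + Power10 effect), in percent."""
    result = {}
    for mem_tb, (p9_aix72, p10_aix72, p10_aix73) in times.items():
        aix_effect = reduction_pct(p10_aix72, p10_aix73)  # same hardware
        combined = reduction_pct(p9_aix72, p10_aix73)     # new OS and hardware
        result[mem_tb] = (aix_effect, combined)
    return result


if __name__ == "__main__":
    rows = summarize()
    for mem_tb, (aix_effect, combined) in rows.items():
        print(f"{mem_tb} TB: AIX {aix_effect:.2f}%, AIX + Power10 {combined:.1f}%")
    average = sum(combined for _, combined in rows.values()) / len(rows)
    print(f"Average AIX + Power10 reduction: {average:.1f}%")
```

The AIX effect isolates the operating system change because both measurements use Power10, while the combined effect also includes the hardware generation change.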

LMT buffer management was also optimized for DLPAR operations. The LMT buffers are allocated per CPU, and they may need to be resized during CPU or memory DLPAR operations to keep the total LMT buffer size under predefined system limits. These resize operations were optimized, which resulted in a significant reduction in the time spent on DLPAR operations.
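To see why a CPU DLPAR operation can trigger a resize, consider a toy model in which every logical CPU gets its own buffer and the total is capped. The function names, the 1 MiB preferred size, and the 256 MiB cap below are hypothetical values for illustration. The actual AIX limits and sizing policy are not described in this post.

```python
MIB = 1024 * 1024


def per_cpu_buffer_size(num_cpus, preferred_size, total_limit):
    """Per-CPU buffer size that keeps num_cpus * size within total_limit."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return min(preferred_size, total_limit // num_cpus)


def needs_resize(old_cpus, new_cpus, preferred_size, total_limit):
    """Return (resize_needed, new_per_cpu_size) for a change in CPU count."""
    old_size = per_cpu_buffer_size(old_cpus, preferred_size, total_limit)
    new_size = per_cpu_buffer_size(new_cpus, preferred_size, total_limit)
    return old_size != new_size, new_size


if __name__ == "__main__":
    # 24 cores in SMT 8 mode are 192 logical CPUs; 48 cores are 384.
    # The preferred size and the cap are assumed values, not AIX defaults.
    resize, size = needs_resize(192, 384, preferred_size=MIB, total_limit=256 * MIB)
    print(f"ADD 24 cores: resize needed = {resize}, new per-CPU size = {size} bytes")
```

In this model, adding CPUs pushes the total past the cap, so every existing per-CPU buffer has to shrink. That is the kind of work that makes the cost of a DLPAR operation depend on how efficiently buffers can be resized.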

CPU DLPAR completion time, Power9 versus Power10

| Memory size | DLPAR operation | Completion time (sec): Power9 (AIX 7.2 TL5) | Completion time (sec): Power10 (AIX 7.3) | Performance improvement on Power10 |
|---|---|---|---|---|
| 512 GB | ADD 24 cores | 191 | 17 | 91% |
| | REM 24 cores | 33 | 14 | 57% |
| 1 TB | ADD 24 cores | 360 | 25 | 93% |
| | REM 24 cores | 70 | 21 | 70% |
| 1.5 TB | ADD 24 cores | 420 | 35 | 91% |
| | REM 24 cores | 81 | 24 | 70% |
| 1.5 TB | ADD 24 cores | 262 | 35 | 86% |
| | REM 24 cores | 44 | 19 | 56% |
| 2.5 TB | ADD 24 cores | 53 | 42 | 20% |
| | REM 24 cores | 30 | 16 | 46% |

This table shows the time spent on the DLPAR process for adding and removing 24 cores with different memory sizes. The LPAR originally had 48 cores running in the default SMT 8 mode. The REM operation removes 24 cores and the ADD operation adds back those removed cores.

As the above table shows, both the ADD and REM paths improve significantly. On this setup, the scaling issue appears only up to 2 TB of memory, and the new design significantly reduces it.

These optimizations are part of the continuous and committed efforts of IBM AIX to better serve its customers. Reducing the time spent on boot and reconfiguration gives AIX system administrators a better administrative experience.