After 4 years in the making, IBM i 7.2 and POWER8 have been released at last. How will this article be different and why would you want to read it? Well, maybe for the same reasons that I decided to write it. Let me explain.
I’ve been with IBM for almost 20 years, most of it behind the scenes as a developer of IBM i LIC, the foundational code of IBM i that implements the technology-independent machine interface (TIMI) for the OS and applications. Yet, this is my first foray into writing for IBM Developer. “So why now?” I’m glad you asked. IBM i 7.2 and POWER8 are the result of the inspirations and innovations of thousands of talented and dedicated IBMers around the globe. With the general availability of IBM i 7.2, the excitement and enthusiasm around the IBM development lab are palpable. Like many of my colleagues, I derive satisfaction in knowing that these new Power Systems servers with POWER8 technology and IBM i 7.2 can quietly serve millions of people throughout the world, once again raising the bar for performance, reliability, and value. Sometimes, silence is golden, but right now I would like to tell you about some capabilities and features that have been part of my world for the last 4 years of development. I invite you to join me: Welcome to the world of IBM i 7.2 and POWER8!
Operating system requirements
Both IBM i 7.2 and IBM i 7.1 Technology Refresh 8 are supported on the new Power Systems servers with POWER8 technology. The operating system and most applications for IBM i are built on a TIMI that isolates programs from differences in processor architectures, and allows the system to automatically capitalize on many new IBM Power Architecture® features without changes to existing applications. The new IBM i 7.2 release continues the tradition, providing a high degree of integration, security, and ease-of-use across multiple generations of IBM Power Systems servers and processors, including the new POWER8 processor.
Multi-core and multi-thread support
Similar to previous generations of Power Systems servers, POWER8 processor-based systems are designed to scale up to support workload growth requirements, and also to serve as workload consolidation platforms. This is made possible through logical partitioning and virtualization with hardware, hypervisor, and operating systems that are optimized for these dual roles. As this article unfolds, terms such as virtual processor and processor compatibility mode will be used. So, if you are unfamiliar with the IBM PowerVM® processor virtualization concepts, you may want to take a detour and review Processor Virtualization 101.
One dimension of system scalability is transaction processing capacity. In recent years, gains in transaction processing capacity have, in large measure, come from growth in multithreading and multiprocessing as opposed to single-thread performance. POWER8 breaks new ground by providing significant increases in single-thread, core, and system performance. Servers that are based on POWER8 processors offer up to 50% more commercial processing workload (CPW) rating per core than similarly configured IBM POWER7® processor-based servers (CPW compared for the IBM Power 740 server, 16-core POWER7 running at 4.2 GHz, and the IBM Power® System S824, 16-core POWER8 running at 4.15 GHz). The CPW rating provides a measure of online transaction processing (OLTP) workload performance for systems that run IBM i.
Operating system limits
The default supported processor threading contexts and logical partition maximum processor limits by processor compatibility mode and IBM i release are shown in Table 1. The published limits are the defaults; depending on the IBM i release and configured processor compatibility mode, support for additional processors may be available by contacting IBM Lab Services.
Table 1. IBM i maximum processor limits
|Processor compatibility mode|Supported threading contexts|Maximum processors (IBM i 7.1 TR8)|Maximum processors (IBM i 7.2)|
|---|---|---|---|
|POWER7|ST, SMT2, SMT4|32|32|
|POWER8|ST, SMT2, SMT4, SMT8|32|48|
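As a rough illustration of what these limits mean in hardware threads, the following Python sketch multiplies each default processor limit from Table 1 by the widest threading context listed for that compatibility mode. The dictionary and function names are illustrative only, and the result is an arithmetic upper bound that assumes every processor runs in the widest context; it is not a published IBM figure.

```python
# Hardware threads per processor (core) for each threading context.
THREADS_PER_CORE = {"ST": 1, "SMT2": 2, "SMT4": 4, "SMT8": 8}

# Default maximum processors from Table 1, keyed by (compatibility mode, release).
MAX_PROCESSORS = {
    ("POWER7", "7.1 TR8"): 32,
    ("POWER7", "7.2"): 32,
    ("POWER8", "7.1 TR8"): 32,
    ("POWER8", "7.2"): 48,
}

# Widest threading context listed in Table 1 for each compatibility mode.
WIDEST_CONTEXT = {"POWER7": "SMT4", "POWER8": "SMT8"}


def max_hardware_threads(mode, release):
    """Upper bound on hardware threads in one partition (illustrative arithmetic)."""
    return MAX_PROCESSORS[(mode, release)] * THREADS_PER_CORE[WIDEST_CONTEXT[mode]]


for key in MAX_PROCESSORS:
    print(key, max_hardware_threads(*key))
```

Under these assumptions, a POWER8-mode partition at the IBM i 7.2 default limit works out to 48 × 8 = 384 hardware threads.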
Scalable simultaneous multithreading (SMT) with intelligent threads
POWER8, similar to POWER7 before it, uses Intelligent Threads technology to maximize workload performance regardless of the processor’s threading context. For POWER8, the technology has been enhanced to adapt more quickly and with greater efficiency to changes in the workload. If the POWER8 processor is under-committed, meaning fewer hardware threads are dispatched than are available, the core performance is roughly the same, independent of threading context. So, for example, if one thread is dispatched, performance will be similar in single thread (ST), SMT2, SMT4, and SMT8 contexts; if two threads are dispatched, performance will be similar in SMT2, SMT4, and SMT8 contexts; if three or four threads are dispatched, performance will be similar in SMT4 and SMT8 contexts. This is illustrated in Figure 1.
Figure 1. POWER8 SMT scaling
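The under-committed rule described above can be restated in a few lines of Python. This is only a sketch of the rule as worded in this article (a core performs about the same in every context that has at least as many hardware threads as are dispatched); the function name is made up for illustration and nothing here measures real performance.

```python
# Hardware threads available per core in each threading context.
THREADS_PER_CORE = {"ST": 1, "SMT2": 2, "SMT4": 4, "SMT8": 8}


def contexts_with_similar_performance(dispatched_threads):
    """List the contexts in which a core running this many threads is not
    over-committed, so core performance is expected to be roughly equal."""
    if not 1 <= dispatched_threads <= 8:
        raise ValueError("a POWER8 core runs between 1 and 8 hardware threads")
    return [context for context, capacity in THREADS_PER_CORE.items()
            if capacity >= dispatched_threads]


for threads in (1, 2, 3, 4):
    print(threads, contexts_with_similar_performance(threads))
```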
From a usability standpoint, intelligent threads means that manual system-level processor threading context adjustments typically aren’t necessary in order to maximize workload performance. Highly multithreaded workloads can benefit from the additional throughput offered by SMT8 technology, but moderately threaded, and even single-threaded workloads are still able to achieve maximum performance automatically. It just works!
Flexible SMT controls
The processor threading context determines the number of usable threads per processor, and impacts the processor utilization and accounting information reported by IBM i work management and performance management tools. While the default processor threading context is suitable for most commercial environments, IBM i offers manual controls that allow the system to be fine-tuned to the specific characteristics of the workload. The processor multitasking mode and processor maximum SMT level can be used to establish any processor threading context supported for the initial program load (IPL). In general, as the processor threading context is reduced, single-thread performance and determinism increase, but at the expense of the greater aggregate throughput that is potentially attainable in a higher threading context. Flexible SMT technology allows the system to be tailored to the specific needs of the business.
For POWER8, IBM i supports flexible SMT with fully dynamic system-level processor threading controls. IBM i 7.2 and 7.1 TR8 offer on-the-fly switching among the single-thread and simultaneous multithreading contexts supported by the POWER8 processor. The processor threading contexts available for the partition IPL are determined by the processor compatibility mode (PCM) partition attribute, as shown in Table 2. Note that the PCM is established during partition activation.
Table 2. IBM i supported and default processor threading contexts
|Processor compatibility mode|Supported threading contexts|Default thread context (IBM i 7.1 TR8)|Default thread context (IBM i 7.2)|
|---|---|---|---|
|POWER7|ST, SMT2, SMT4|SMT4|SMT4|
|POWER8|ST, SMT2, SMT4, SMT8|SMT4|SMT8|
The default thread context is selected by the operating system, but it can be easily changed by the system administrator. Given the ease of use and high performance delivered by intelligent threads, IBM i has historically used the maximum supported thread context as the default for a release optimized for a new generation of system. That said, IBM i 7.1 TR8 continues to use SMT4 as the default thread context for both the POWER7 and POWER8 processor compatibility modes. For IBM i 7.2 in the POWER8 processor compatibility mode, the default thread context is SMT8.
The choice of SMT4 for POWER8 in IBM i 7.1 TR8 was made for the benefit of clients migrating from an earlier generation Power Systems server. Many users upgrading to POWER8 will be moving their workloads from a POWER7 server using SMT4, while continuing to use IBM i 7.1 for a period of time. The choice of SMT4 as the default thread context for POWER8 in IBM i 7.1 TR8 offers most of the benefits and performance advantages of POWER8, with the familiarity and continuity of SMT4.
Processor multitasking mode
Switching to and from the single-thread context can be accomplished using the IBM i processor multitasking mode system value, QPRCMLTTSK. For POWER8, QPRCMLTTSK changes are effective immediately and persist across partition IPL.
Supported QPRCMLTTSK values are as follows:
- 0 – Processor multitasking is disabled. This value corresponds to the single-thread context.
- 1 – Processor multitasking is enabled. This value corresponds to SMT2 context if the partition’s processor compatibility mode is IBM POWER6® or IBM POWER6+™. Otherwise, the thread context is determined by the maximum SMT level control.
- 2 – Processor multitasking is system controlled. This is the default value, and the setting recommended by IBM. For POWER8, the implementation is identical to ‘1’, with processor multitasking enabled.
DSPSYSVAL SYSVAL(QPRCMLTTSK)
CHGSYSVAL SYSVAL(QPRCMLTTSK) VALUE('0') /* ST context */
CHGSYSVAL SYSVAL(QPRCMLTTSK) VALUE('1') /* SMTn context */
CHGSYSVAL SYSVAL(QPRCMLTTSK) VALUE('2') /* SMTn context */
Processor maximum SMT level
When processor multitasking is enabled, switching among the thread contexts available for the partition IPL can be accomplished using the change processor multitasking information API, QWCCHGPR. The QWCCHGPR API changes are effective immediately and persist across a partition IPL.
The QWCCHGPR API takes a single parameter, the maximum number of secondary threads per processor:
- 0 – No maximum is selected. The system uses the default number of secondary threads as determined by the operating system.
- 1-255 – The system might use up to the number of secondary threads specified.
The QWCCHGPR API can be called from a command line.
Note that setting the maximum number of secondary threads does not establish the processor threading context directly. The maximum value is accepted regardless of the processor threading contexts supported by the underlying hardware, and the operating system applies the configured maximum to the system. On a POWER8 processor-based system, if a maximum value is specified by the QWCCHGPR API, the operating system tries to establish the maximum thread context supported (as shown in Table 2), subject to the maximum specified by the QWCCHGPR API. In other words, if the QWCCHGPR API sets the maximum number of secondary threads to a value that does not correspond exactly to a supported thread context, the operating system selects the widest supported thread context whose number of secondary threads does not exceed the specified value. For example, a maximum of 2 results in the SMT2 context, and a maximum of 255 results in the SMT8 context.
CALL PGM(QWCCHGPR) PARM(X'00000000') /* No maximum */
CALL PGM(QWCCHGPR) PARM(X'00000001') /* SMT2 context */
CALL PGM(QWCCHGPR) PARM(X'00000002') /* SMT2 context */
CALL PGM(QWCCHGPR) PARM(X'00000003') /* SMT4 context */
CALL PGM(QWCCHGPR) PARM(X'00000004') /* SMT4 context */
CALL PGM(QWCCHGPR) PARM(X'00000007') /* SMT8 context */
CALL PGM(QWCCHGPR) PARM(X'000000FF') /* SMT8 context */
The maximum number of secondary threads can be obtained from the Retrieve Processor Multitasking Information (QWCRTVPR) API. Note that the value returned is the maximum number of secondary threads configured.
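To see how the two controls combine, here is a small Python model of the behavior described in this section. It is a sketch built only from Table 2 and the QPRCMLTTSK and QWCCHGPR descriptions above, limited to the POWER7 and POWER8 compatibility modes; the function and dictionary names are hypothetical, and it does not call any IBM i interface.

```python
# Hardware threads per core in each threading context.
THREADS_PER_CORE = {"ST": 1, "SMT2": 2, "SMT4": 4, "SMT8": 8}

# Supported and default contexts, transcribed from Table 2.
SUPPORTED = {
    "POWER7": ("ST", "SMT2", "SMT4"),
    "POWER8": ("ST", "SMT2", "SMT4", "SMT8"),
}
DEFAULT = {
    ("POWER7", "7.1 TR8"): "SMT4",
    ("POWER7", "7.2"): "SMT4",
    ("POWER8", "7.1 TR8"): "SMT4",
    ("POWER8", "7.2"): "SMT8",
}


def effective_context(mode, release, qprcmlttsk, max_secondary_threads=0):
    """Model the thread context that results from the two controls.

    qprcmlttsk is the system value ('0', '1', or '2'); max_secondary_threads
    is the QWCCHGPR parameter (0 means no maximum).
    """
    if qprcmlttsk == "0":
        return "ST"  # processor multitasking disabled
    if qprcmlttsk not in ("1", "2"):
        raise ValueError("QPRCMLTTSK must be '0', '1', or '2'")
    if not 0 <= max_secondary_threads <= 255:
        raise ValueError("maximum secondary threads must be 0-255")
    if max_secondary_threads == 0:
        return DEFAULT[(mode, release)]  # operating system default
    # Widest supported context whose secondary threads fit under the maximum.
    fitting = [context for context in SUPPORTED[mode]
               if THREADS_PER_CORE[context] - 1 <= max_secondary_threads]
    return max(fitting, key=THREADS_PER_CORE.get)


for maximum in (0, 1, 2, 3, 4, 7, 255):
    print(maximum, effective_context("POWER8", "7.2", "2", maximum))
```

For a POWER8-mode partition, the loop reproduces the contexts noted in the QWCCHGPR call comments above, with a maximum of 0 falling back to the release default.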
Additional POWER8 highlights
IBM i uses the TIMI to isolate programs from differences in processor architectures, and allows the system to automatically capitalize on many new Power Architecture features without changes to existing applications. In some cases, the IBM i operating system enables new features based on the processor compatibility mode of the partition. We’ll take a look at several POWER8 examples in the following sections.
Live Partition Mobility
Live Partition Mobility (LPM) is a PowerVM feature that provides the ability to migrate an active or inactive IBM i partition between Power Systems servers. IBM i support for LPM was introduced for POWER7 processor-based servers in IBM i 7.1 TR4.
LPM is supported in IBM i 7.2 and 7.1 TR8 between POWER8 processor-based servers, and also between POWER8 and POWER7 processor-based servers, but with a caveat. For migration between POWER7 and POWER8 processor-based servers, the partition must be configured for a processor compatibility mode that is supported by both servers, that is, POWER7, POWER6, or POWER6+ mode. Note that while PowerVM supports LPM for POWER7 and POWER8 processor-based servers in POWER6 and POWER6+ processor compatibility modes, IBM i does not support LPM for POWER6 processor-based servers.
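The compatibility-mode caveat amounts to a set intersection. The Python sketch below assumes the mode lists implied by the paragraph above (a POWER7 server offers POWER6, POWER6+, and POWER7 modes; a POWER8 server adds POWER8 mode). The names are illustrative, and the check covers only the compatibility mode, not the other LPM prerequisites.

```python
# Processor compatibility modes each server generation can offer a partition,
# as implied by the text (an assumption, not an exhaustive PowerVM list).
SERVER_MODES = {
    "POWER7": {"POWER6", "POWER6+", "POWER7"},
    "POWER8": {"POWER6", "POWER6+", "POWER7", "POWER8"},
}


def common_modes(source_server, target_server):
    """Compatibility modes supported by both the source and target servers."""
    return SERVER_MODES[source_server] & SERVER_MODES[target_server]


def mode_allows_migration(partition_mode, source_server, target_server):
    """True if the partition's mode is usable on both servers."""
    return partition_mode in common_modes(source_server, target_server)


print(sorted(common_modes("POWER8", "POWER7")))
print(mode_allows_migration("POWER8", "POWER8", "POWER7"))
```

A partition activated in POWER8 mode therefore cannot move to a POWER7 server until it is reactivated in a mode both servers share.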
Virtual time base and instruction count
POWER8 provides new hardware facilities for thread-level instruction count and virtual processor timekeeping. Because these facilities are not available on POWER7, and the partition could find itself running on a POWER7 processor-based server using Live Partition Mobility, the operating system provides new Instruction Count (IC) and Virtual Time Base (VTB) data only for partitions running in the POWER8 processor compatibility mode.
The IC and VTB facilities are relatively straightforward. The IC is the count of POWER instructions run by a hardware thread, and the VTB is the elapsed time of virtual processor (core) dispatch for a hardware thread. Both are privileged, that is, they are not directly accessible to application programs, but accumulated forms of them are provided by the operating system. The accumulation of IC and VTB for a software thread follows directly from the POWER8 thread IC and VTB registers. The accumulation of processor non-idle IC and VTB is somewhat less direct, occurring if and only if any of the processor’s threads is not idle, that is, running a program. IBM i 7.2 accumulates IC and VTB for each software thread and process, and non-idle IC and non-idle VTB for each processor, and partition-wide. Processor IC accumulation is also performed according to some other categories, such as interrupt IC.
For the programmer, the IC and VTB accumulations are available from a variety of IBM i 7.2 machine interface instructions including:
- MATRMD Hex 26, 28 – Materialize resource management data
- MATPRATR Hex 21, 23 – Materialize process attributes
- MATMATR Hex 220 – Processor attributes
IBM i 7.2 performance management tools have been updated to incorporate IC and VTB accumulations. For example, Collection Services includes the process and thread IC and VTB data in the QAPMJOBMI file, and the sum-of-processor IC and VTB data in the QAPMSYSTEM file, making longer-term historical analysis possible without adding significantly to data collection costs. For more detailed analysis, Performance Explorer (PEX) includes thread IC and VTB in the base event data, which can be traced down to a very short timescale.
On POWER8, IC and VTB accumulations provide valuable diagnostic insights into the individual process/thread and overall system performance. They can be used alone or in combinations. For example, the non-idle processor VTB and IC were designed to provide a 24×7 proxy of processor cycle and instruction metrics that are frequently used for monitoring overall system health. On earlier generations of IBM Power servers, this data was available only when a PEX data collection was active.
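One way to use the two accumulations together is to difference a pair of samples and divide. The Python sketch below shows that arithmetic only; the `Sample` fields are hypothetical stand-ins for the non-idle IC and non-idle VTB values, the VTB unit (time base ticks) is an assumption, and the numbers are invented for the example rather than taken from a real system.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of the accumulated partition-wide counters (hypothetical)."""
    non_idle_ic: int   # accumulated non-idle instruction count
    non_idle_vtb: int  # accumulated non-idle virtual time base (assumed ticks)


def instructions_per_tick(earlier, later):
    """Instructions run per unit of non-idle virtual time between two samples."""
    delta_ic = later.non_idle_ic - earlier.non_idle_ic
    delta_vtb = later.non_idle_vtb - earlier.non_idle_vtb
    if delta_ic < 0 or delta_vtb < 0:
        raise ValueError("samples are out of order or the counters were reset")
    if delta_vtb == 0:
        return 0.0  # no non-idle time elapsed in the interval
    return delta_ic / delta_vtb


first = Sample(non_idle_ic=1_000, non_idle_vtb=500)
second = Sample(non_idle_ic=4_000, non_idle_vtb=2_000)
print(instructions_per_tick(first, second))
```

Tracking such a ratio over successive collection intervals is one possible way to watch for changes in workload efficiency without starting a PEX collection.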
Vector Scalar eXtension and crypto acceleration
POWER8 features enhanced Vector Scalar eXtension (VSX) capabilities, including new instructions to accelerate some frequently used cryptographic operations. VSX in Power Systems provides support for vector and scalar binary floating point operations conforming to the Institute of Electrical and Electronics Engineers Standard for Floating Point Arithmetic (IEEE-754). VSX can be used to increase parallelism by providing single-instruction, multiple-data (SIMD) execution functionality for floating point double-precision operations, greatly improving the performance of some applications. IBM i Portable Application Solutions Environment (PASE) applications running on IBM i 7.2 with POWER8 processors can now take advantage of VSX. For more information about VSX usage by IBM i PASE, refer to the IBM Redbooks® Tuning Techniques for IBM Processors, including IBM POWER8.
IBM i 7.2 leverages the enhanced POWER8 vector processing capabilities to accelerate AES cryptographic operations when operating in the POWER8 processor compatibility mode. Cryptographic services APIs, SSL, VPN, Backup Recovery and Media Services (BRMS) tape encryption, and SQL encryption functions automatically use POWER8 enhanced vector processing capabilities to deliver significant increases in performance. Figure 2 illustrates the gains resulting from POWER8 cryptographic acceleration. The charts show relative encrypt and decrypt throughput for Cipher Block Chaining (CBC) and Electronic Code Book (ECB) modes using an internal version of the IBM CryptoLite for C/C++ (CLiC) toolkit primitives. In each chart, the series labeled “Vector” uses the POWER8 vector-accelerated implementation, whereas the others do not. As shown, performance gains depend on the modes and block sizes, and results do vary, but POWER8 vector acceleration can deliver breakthrough levels of cryptographic performance for some applications.
Figure 2. POWER8 AES crypto acceleration
In this article, we’ve taken a closer look at some of the capabilities and features that have been part of my world for the last 4 years.
- POWER8 servers are up to 50% faster than comparable POWER7 models for commercial workloads.
- Cryptographic functions on IBM i 7.2 on POWER8 are performed up to 15 times faster than ever before.
- IBM i 7.2 on POWER8 offers enhanced 24×7 workrate metrics for system health monitoring and performance analysis.
- IBM i 7.2 is highly scalable and configurable, with a flexible range of SMT options, but is designed to deliver superior system and single-thread performance without the need for customized tuning.
Welcome to the world of POWER8 and IBM i 7.2, the most powerful, most flexible, and most scalable generation of IBM Power Systems servers ever.