The IBM® XL compiler family consists of advanced, high-performance compilers for developing complex and computationally intensive programs. It covers three core languages: C, C++, and Fortran, with compilers that target Linux on Power® platforms. These compilers help you develop high-performance applications efficiently through compilation, optimization, linking, and debugging.

Announcement: XL C/C++ for Linux, V16.1.1 and XL Fortran for Linux, V16.1.1 were released in December 2018.

Compilation for specific hardware

By default, the compiler generates code that runs on all supported systems, although that code is not necessarily optimal for any particular system. You can instruct the compiler to generate code that runs optimally on a given processor or architecture family.

Selecting the appropriate architecture for compilation

You can use -qarch to specify the processor architecture for which the code (instructions) should be generated. Using the -qarch option to target a specific architecture for the compilation results in code that provides the best performance for the selected architecture; however, it might not run on other architectures. If you want to generate code that can run on more than one architecture, choose a suboption that supports a group of architectures.
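
For example, any ordinary C source file can be compiled for a specific architecture. The invocations in the header comment below are a sketch: the xlc driver name and the pwr8/pwr9 suboptions are assumptions to verify against your installation.

    /*
     * arch_demo.c
     * Illustrative compile commands (driver name and -qarch suboptions assumed):
     *   xlc -O2 -qarch=pwr9 arch_demo.c -o arch_demo   # code restricted to POWER9 processors
     *   xlc -O2 -qarch=pwr8 arch_demo.c -o arch_demo   # uses only POWER8 instructions, so it also runs on later processors
     */
    #include <stdio.h>

    int main(void) {
        printf("built for a specific Power architecture\n");
        return 0;
    }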

Tuning for your target architecture

Use -qtune to further tune instruction selection, scheduling, and other architecture-dependent performance enhancements to run best on a specific hardware architecture. For POWER7® or higher processors, you can also specify a target SMT mode to direct optimizations for best performance in that mode. If you specify a particular architecture with -qarch, -qtune automatically selects the suboption that generates instruction sequences with the best performance for that architecture. When -qtune is used with options that enable optimization, the compiler schedules the generated machine instructions to take maximum advantage of hardware features, such as cache size and pipelining, to improve performance.
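
The sketch below combines -qarch and -qtune on a compute-bound loop. The option spellings, including the SMT suboption form, are assumptions based on the description above and should be checked against your compiler level.

    /*
     * tune_demo.c - a loop whose instruction scheduling benefits from -qtune.
     * Illustrative invocations (spellings assumed, including the SMT suboption):
     *   xlc -O3 -qarch=pwr8 -qtune=pwr9      -c tune_demo.c   # POWER8 instructions, scheduled to run best on POWER9
     *   xlc -O3 -qarch=pwr9 -qtune=pwr9:smt4 -c tune_demo.c   # additionally tuned for an SMT4 mode
     */
    double dot(const double *a, const double *b, int n) {
        double s = 0.0;
        for (int i = 0; i < n; ++i)
            s += a[i] * b[i];
        return s;
    }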

Starting from IBM XL C/C++ for Linux, V13.1.1, the compiler leverages the Clang infrastructure from the open source community for a portion of its compiler front end. You can use -mcpu and -mtune, which are equivalent to -qarch and -qtune respectively, for GCC compatibility.
IBM XL C/C++ for Linux, V16.1.1 for little endian distributions supports POWER9™ technology.

Offloading computations to the NVIDIA GPUs

The combination of the IBM POWER® processors and the NVIDIA GPUs provides a platform for heterogeneous high-performance computing that can run several technical computing workloads efficiently. The computational capability is built on top of massively parallel and multithreaded cores within the NVIDIA GPUs and the IBM POWER processors. You can offload parallel operations within applications, such as data analysis or high-performance computing workloads, to GPUs.

Programming with OpenMP 4.5 device constructs

With XL C/C++ for Linux, V16.1.1 for little endian distributions and XL Fortran for Linux, V16.1.1 for little endian distributions, you can offload compute-intensive parts of an application and associated data to the NVIDIA GPUs by using the OpenMP 4.5 device constructs.
You must specify the -qoffload and -qsmp options to enable the support for offloading OpenMP target regions to NVIDIA GPUs.
You can use the XLSMPOPTS=target={mandatory | default | disabled} environment variable to control whether target regions are executed on the device. You can also use the supported runtime functions to manage device memory (C/C++ only) or to query the target environment.
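
The following C sketch offloads a loop with a target region. The source is standard OpenMP 4.5; the exact command line is an assumption, built from the options named above, to verify against your installation.

    /*
     * offload_demo.c - offloads a loop to the GPU through OpenMP 4.5 device constructs.
     * Illustrative build and run (option names taken from the text above):
     *   xlc -O2 -qsmp=omp -qoffload offload_demo.c -o offload_demo
     *   XLSMPOPTS=target=mandatory ./offload_demo
     */
    #include <stdio.h>

    #define N 1000000

    static float x[N], y[N];

    int main(void) {
        for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        /* Map the arrays to the device and distribute the loop across GPU teams. */
        #pragma omp target teams distribute parallel for map(to: x[0:N]) map(tofrom: y[0:N])
        for (int i = 0; i < N; ++i)
            y[i] += 2.0f * x[i];

        printf("y[0] = %f\n", y[0]);   /* expected: 4.000000 */
        return 0;
    }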

Using XL C/C++ for Linux with NVCC

The NVIDIA CUDA C++ compiler (NVCC) from the NVIDIA CUDA Toolkit partitions C/C++ source code into host and device portions. You can use XL C/C++ for Linux, V16.1.1 for little endian distributions as the host compiler for the POWER processor with NVCC 9.2.
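
A minimal CUDA C++ sketch follows. NVCC's -ccbin option selects the host compiler; the xlC driver name used here is an assumption to confirm for your installation and toolkit level.

    /*
     * saxpy.cu - NVCC splits this file into host and device portions.
     * Illustrative invocation with XL C/C++ as the host compiler:
     *   nvcc -ccbin xlC -O2 saxpy.cu -o saxpy
     */
    #include <cstdio>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];   /* device portion */
    }

    int main() {
        const int n = 1 << 20;
        float *x, *y;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&y, n * sizeof(float));
        for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
        cudaDeviceSynchronize();

        std::printf("y[0] = %f\n", y[0]);   /* host portion; expected: 4.000000 */
        cudaFree(x);
        cudaFree(y);
        return 0;
    }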

Programming with supported CUDA Fortran features

Starting from V15.1.4, XL Fortran for Linux supports the CUDA Fortran programming model to exploit the NVIDIA GPUs. You can use the commonly used subset of CUDA Fortran that is provided by XL Fortran to offload computations to the NVIDIA GPUs. You must specify the -qcuda option to enable the compiler support for CUDA Fortran.

For more information about offloading computations to the NVIDIA GPUs, see the XL C/C++ Optimization and Programming Guide and XL Fortran Optimization and Programming Guide.

Optimization capabilities

You can use several XL compiler options to control the optimization and performance of your programs. Optimizing transformations can give your application better overall execution performance. XL compilers provide a portfolio of optimizing transformations that are tailored to Power platforms. These transformations offer the following benefits:

  • Reducing the number of instructions that are executed for critical operations
  • Restructuring generated object code to make optimal use of the Power Architecture®
  • Improving the usage of the memory subsystem
  • Exploiting the ability of the architecture to handle large amounts of shared memory parallelization

Here is an overview of some options that can be used to control the optimization and tuning process (XL C/C++, V16.1.1 and XL Fortran, V16.1.1):

Table 1. Optimization options

Category | Option | Description
Optimization levels | -O0 | Performs only quick local optimizations.
Optimization levels | -O2 | Optimizes for the best combination of compile speed and runtime performance.
Optimization levels | -O3 (XL C/C++) | Focuses on runtime performance at the expense of compilation time; performs loop transformations and data flow analysis.
Optimization levels | -Ofast, or -O3 with -qhot (XL C/C++); -Ofast or -O3 (XL Fortran) | Performs aggressive loop transformations and data flow analysis at the expense of compilation time.
Optimization levels | -O4 | Performs whole-program optimization, aggressive data flow analysis, and loop transformations.
Optimization levels | -O5 | Performs more aggressive whole-program optimization, with more precise data flow analysis and loop transformations.
High-order transformations | -qhot | Performs high-order loop analysis and transformations (HOT) during optimization.
Interprocedural analysis | -qipa | Enables or customizes a class of optimizations known as interprocedural analysis (IPA), including cross-file inlining, outlining, cloning, interprocedural value propagation, variable aliasing refinement, and so on.
Aliasing | -qalias | Indicates whether a program contains certain categories of aliasing or does not conform to C/C++ standard aliasing rules. The compiler limits the scope of some optimizations when different names might be aliases for the same storage location.
Profile-directed feedback optimizations | -qpdf1, -qpdf2 | Tunes optimizations through profile-directed feedback (PDF), where results from sample program executions are used to improve optimization near conditional branches and in frequently executed code sections (see the sketch after this table).
Feedback-directed program restructuring | -qfpdr | Provides object files with the information that the IBM Feedback Directed Program Restructuring (FDPR®) performance-tuning utility needs to optimize the resulting executable file.
Floating-point options | -qstrict, -qnostrict | Specifying -qstrict prevents optimizations that might change the semantics of the program and produce different output than when not optimized; specifying -qnostrict allows those optimizations. -qstrict has suboptions for more detailed control.
Floating-point options | -qfloat | Selects different strategies for speeding up or improving the accuracy of floating-point calculations.
Vector instructions | -qsimd | Controls whether the compiler can automatically take advantage of vector instructions for processors that support them.
Parallelization | -qsmp | Enables parallelization of program code.
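
The profile-directed feedback options in the table are applied in two passes around a training run. The workflow below is a sketch; the driver name and the training input are assumptions.

    /*
     * pdf_demo.c - a small branchy workload for the PDF training run.
     * Sketch of the two-pass workflow (driver name and training input assumed):
     *   xlc -O3 -qpdf1 pdf_demo.c -o pdf_demo   # step 1: instrumented build
     *   ./pdf_demo 5000000                       # step 2: training run records profile data
     *   xlc -O3 -qpdf2 pdf_demo.c -o pdf_demo   # step 3: recompile using the recorded profile
     */
    #include <stdlib.h>

    int main(int argc, char **argv) {
        long n = (argc > 1) ? atol(argv[1]) : 1000000;
        long sum = 0;
        for (long i = 0; i < n; ++i)
            sum += (i % 3 == 0) ? i : -i;   /* conditional branches the profile can measure */
        return (int)(sum & 0x7f);
    }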

You can also specify the following options to generate listing files to better understand the optimization process.

Table 2. Listing options

Option | Description
-qlist | Produces a compiler listing file that includes an object listing.
-qlistfmt | Creates an XML or HTML report to assist with finding optimization opportunities.
-qreport | Produces listing files that show how sections of code have been optimized.
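
As an illustration, the listing options can be added to an ordinary optimized build. The exact suboption spellings and output file names below are assumptions based on typical XL behavior; check the documentation for your compiler level.

    /*
     * listing_demo.c - compile with the listing options to inspect the optimizer's work.
     * Illustrative invocations (suboption spellings and output file names assumed):
     *   xlc -O3 -qhot -qlist          -c listing_demo.c   # object listing in listing_demo.lst
     *   xlc -O3 -qhot -qlistfmt=html=all -c listing_demo.c   # HTML report on optimization opportunities
     *   xlc -O3 -qhot -qreport        -c listing_demo.c   # shows how the loop below was optimized
     */
    void scale(double *a, int n, double s) {
        for (int i = 0; i < n; ++i)
            a[i] *= s;   /* a simple loop the HOT transformations can report on */
    }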


Linking choices

You can apply interprocedural analysis (IPA) during the compile step, the link step, or both. If you apply IPA during compilation by specifying the -qipa, -O4, or -O5 option, also specify one of these options in the link step so that your application benefits from link-time optimization.
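
A sketch of applying IPA in both steps follows; the file names and driver are illustrative, and main.c is a hypothetical caller of the function shown.

    /*
     * util.c - paired with a hypothetical main.c that calls scale_sum().
     * Illustrative IPA build (file names and driver assumed):
     *   xlc -O2 -qipa -c main.c util.c        # compile step: IPA information is recorded
     *   xlc -O2 -qipa main.o util.o -o app    # link step: cross-file optimization is performed
     */
    double scale_sum(const double *a, int n, double s) {
        double t = 0.0;
        for (int i = 0; i < n; ++i)
            t += s * a[i];   /* a candidate for cross-file inlining into its caller */
        return t;
    }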

Debugging assistance

  • The -g option generates debugging information for use by a symbolic debugger, and it makes the program state available to the debugging session at selected source locations. You can use different -g levels to balance between debug capability and compiler optimization. When the -O2 optimization level is in effect, the debug capability is completely supported.
  • The -pg option prepares the object files produced by the compiler for profiling. When you compile with -pg, the compiler produces monitoring code that counts the number of times each routine is called. When you execute the compiled program and it ends normally, the recorded information is written to a gmon.out file. You can then use the gprof command to generate a runtime profile (see the sketch after this list).
  • The -qcheck option generates code that performs certain types of runtime checking. If a violation is encountered, a runtime error is raised by sending a SIGTRAP signal to the process. Note that the runtime checks might result in slower application execution.
  • The -qinfo option produces or suppresses groups of informational messages.
  • The -qinitauto option initializes uninitialized automatic variables to a specific value. Setting uninitialized automatic variables to zero ensures that all automatic variables that are not explicitly initialized when declared are cleared before they are used. You can also use this option to initialize variables of real or complex type to a signaling or quiet NaN, which helps locate uninitialized variables in your program. Note that -qinitauto can increase execution time.
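
The sketch below combines the -g and -pg options described above with a gprof run. The command lines are illustrative and assume the xlc driver and the GNU gprof tool are available.

    /*
     * profile_demo.c - debug and profiling build.
     * Illustrative workflow (driver and tool names assumed):
     *   xlc -O2 -g -pg profile_demo.c -o profile_demo   # debug information plus profiling hooks
     *   ./profile_demo                                   # a normal exit writes gmon.out
     *   gprof profile_demo gmon.out > profile.txt       # per-routine call counts and times
     */
    #include <stdio.h>

    static long work(long n) {
        long s = 0;
        for (long i = 0; i < n; ++i)
            s += i % 7;
        return s;
    }

    int main(void) {
        printf("%ld\n", work(5000000));
        return 0;
    }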

