Introduction

In modern computing, performance plays a vital role. However, identifying and resolving performance-related issues are quite challenging. The IBM® AIX® operating system provides a handful of tools to gather performance data from the running processes. tprof is one such tool that provides instruction-level performance data of the processes. This tutorial helps users to identify the line numbers in the source files of the process which has performance issues.

Line numbers can be identified using tprof and annotation listing files. The -L option in tprof is used to generate the annotated listing file (.alst). This does not require access to source files or the binary compiled with the -g option.

Learning objectives

After completing this tutorial, you will be able to identify performance issues for processes that consume a large number of CPU ticks at a granular level by using annotation listing files in the tprof command.

Prerequisites

The following prerequisites are required to identify the line numbers in the source files that are causing performance issues:

  • Knowledge of using the tprof command
  • Knowledge of C source code compilation
  • Knowledge of annotation listing files

Estimated time

It would approximately take about 30 minutes to identify the most CPU consuming instruction of the process and the line number of the instruction in the source file using annotation listing in tprof.

Listing files

A listing file is a compiler output file (with a .lst suffix) that contains information about a particular compilation. As a debugging aid, a compiler listing is useful for determining what has gone wrong in a compilation. For example, any diagnostic messages reported during compilation are written to the listing.

Listing information is organized in different sections. A listing contains a header section and a combination of other sections, depending on other options in effect.

When the -qlist option is used, the .lst file contains the object code generated by the compiler. This section is useful for diagnosing execution-time problems.

Steps

Generating a listing file

To generate a listing file, the source should be compiled with the -qlist option.

The following sample program (testant.c) is used for illustration purposes in this tutorial.

int main() {

int x = 0;

x = 100 + 200 + 300;

while(1) {

for(int i = 0; i < x ; i++) {

x = x * 2;

}

float xr = xr * xr;

}

}

Use the following xlc command to compile the testant.c file and to generate the list files.

/usr/vac/bin/cc tmp.c -o testant -qlist

After successful compilation, testant.lst file along with the binary file is generated as shown in the snippet below:

listing file generated
IBM XL C for AIX, Version 13.1.0.2 tmp.c 07/12/18 05:31:23 (C)
>>>>> OPTIONS SECTION <<<<<
IBM XL C for AIX, Version 13.1.0.2
***   Command Line Invocation ***
***   Options In Effect   ***
LIST                NORO                NOROCONST          
LANGLVL=EXTENDED:NOUCS:NOTEXTAFTERENDIF
>>>>> FILE TABLE SECTION <<<<<
                                       FILE CREATION        FROM
FILE NO   FILENAME                    DATE       TIME       FILE    LINE
     0    tmp.c                       07/12/18   05:31:05
>>>>> COMPILATION EPILOGUE SECTION <<<<<
IBM XL C/C++ Compilers Summary of Diagnosed Conditions
TOTAL   UNRECOVERABLE  SEVERE       ERROR     WARNING    INFORMATIONAL
               (U)       (S)         (E)        (W)          (I)
    0           0         0           0          0            0
Source records read.........................................      13
1501-008  Compilation successful for file tmp.c. Object file created.
>>>>> OBJECT SECTION, NO OPTIMIZATION <<<<<
 GPR's set/used:   -s-s s--- ---- ----  ---- ---- ---- ----
 FPR's set/used:   -s-- ---- ---- ----  ---- ---- ---- ----
 CCR's set/used:   s--- ----
     | 000000                           PDEF     main
    1|                                  PROC      
    0| 000000 stwu     9421FFA0   1     ST4U      gr1,#stack(gr1,-96)=gr1
    2| 000004 addi     38600000   1     LI        gr3=0
    2| 000008 stw      90610040   1     ST4A      x(gr1,64)=gr3ion is unloaded and reloaded again 
    3| 00000C addi     38600258   1     LI        gr3=600
    3| 000010 stw      90610040   1     ST4A      x(gr1,64)=gr3
    3|                              CL.1:
    6|                              CL.4:
    7| 000014 addi     38600000   1     LI        gr3=0
    7| 000018 stw      90610050   1     ST4A      i(gr1,80)=gr3
    7| 00001C lwz      80810040   1     L4A       gr4=x(gr1,64)
    7| 000020 cmpw     7C032000   2     C4        cr0=gr3,gr4
    7| 000024 bc       40800028   1     BF        CL.8,cr0,0x1/lt,
    7|                              CL.7:
    8| 000028 lwz      80610040   1     L4A       gr3=x(gr1,64)
    8| 00002C rlwinm   5463083C   2     SLL4      gr3=gr3,1
    8| 000030 stw      90610040   1     ST4A      x(gr1,64)=gr3
    7| 000034 lwz      80610050   1     L4A       gr3=i(gr1,80)
    7| 000038 addi     38630001   2     AI        gr3=gr3,1
    7| 00003C stw      90610050   1     ST4A      i(gr1,80)=gr3
    7| 000040 lwz      80810040   1     L4A       gr4=x(gr1,64)
    7| 000044 cmpw     7C032000   2     C4        cr0=gr3,gr4
    7| 000048 bc       4180FFE0   1     BT        CL.7,cr0,0x1/lt,
    7|                              CL.8:
   10| 00004C lfs      C021004C   1     LFS       fp1=xr(gr1,76)
   10| 000050 fmuls    EC210072   1     MFS       fp1=fp1,fp1,fcr
   10| 000054 stfs     D021004C   1     STFS      xr(gr1,76)=fp1
    6| 000058 b        4BFFFFBC   0     B         CL.4
    6|                              CL.5:
   13| 00005C addi     38600000   1     LI        gr3=0
   13| 000060 b        48000004   0     B         CL.9,-1
   13|                              CL.2:
   13|                              CL.9:
   13| 000064 addi     38210060   1     AI        gr1=gr1,96
   13| 000068 bclr     4E800020   0     BA        lr
     |               Tag Table
     | 00006C        00000000 00002240 80000001 0000006C 00046D61 696E
     |               Instruction count           27
     |               Straight-line exec time     28

Generating an annotated listing file using tprof

To generate the annotated listings, use the -L <obj1,obj2> option of the tprof command.

tprof -ukesz -L a.lst -r testant

This command generates testant.a.lst.alst and testant.tprof files as shown in the snippet below:

        Total % for main = 99.45
      -      | 000000                           PDEF     main
      -     1|                                  PROC      
      -     1|                                  PROC      
      -     0| 000000 stwu     9421FFA0   1     ST4U      gr1,#stack(gr1,-96)=gr1
      -     2| 000004 addi     38600000   1     LI        gr3=0
      -     2| 000008 stw      90610040   1     ST4A      x(gr1,64)=gr3
      -     3| 00000C addi     38600258   1     LI        gr3=600
      -     3| 000010 stw      90610040   1     ST4A      x(gr1,64)=gr3
      -     3|                              CL.1:
      -     6|                              CL.4:
      -     7| 000014 addi     38600000   1     LI        gr3=0
      -     7| 000018 stw      90610050   1     ST4A      i(gr1,80)=gr3
      -     7| 00001C lwz      80810040   1     L4A       gr4=x(gr1,64)
      -     7| 000020 cmpw     7C032000   2     C4        cr0=gr3,gr4
  95.16  7| 000024 bc       40800028   1     BF        CL.8,cr0,0x1/lt,
(95.16) 7|                              CL.7:
      -     8| 000028 lwz      80610040   1     L4A       gr3=x(gr1,64)
      -     8| 00002C rlwinm   5463083C   2     SLL4      gr3=gr3,1
      -     8| 000030 stw      90610040   1     ST4A      x(gr1,64)=gr3
      -     7| 000034 lwz      80610050   1     L4A       gr3=i(gr1,80)
      -     7| 000038 addi     38630001   2     AI        gr3=gr3,1
      -     7| 00003C stw      90610050   1     ST4A      i(gr1,80)=gr3
      -     7| 000040 lwz      80810040   1     L4A       gr4=x(gr1,64)
      -     7| 000044 cmpw     7C032000   2     C4        cr0=gr3,gr4
      -     7| 000048 bc       4180FFE0   1     BT        CL.7,cr0,0x1/lt,
      -     7|                              CL.8:
   0.20   10| 00004C lfs      C021004C   1     LFS       fp1=xr(gr1,76)
      -    10| 000050 fmuls    EC210072   1     MFS       fp1=fp1,fp1,fcr
      -    10| 000054 stfs     D021004C   1     STFS      xr(gr1,76)=fp1
   4.09    6| 000058 b        4BFFFFBC   0     B         CL.4
( 4.29)   6|                              CL.5:
      -    13| 00005C addi     38600000   1     LI        gr3=0
      -    13| 000060 b        48000004   0     B         CL.9,-1
      -    13|                              CL.2:
      -    13|                              CL.9:
      -    13| 000064 addi     38210060   1     AI        gr1=gr1,96
-    13| 000068 bclr     4E800020   0     BA        lr

Listing annotation option (-L) in tpcrof produces .alst files. Each annotated listing file contains complete compiler listings for each procedure and total CPU tick count or percentage (depending on the -z option) for each procedure. Each line of the listing gives the tick count, line number, relative instruction offset, assembly mnemonic, and opcode. A snippet of the profile file generated by tprof is shown below:

Configuration information
=========================
System: AIX 7.2 Node: alp096-lp31 Machine: 00F967964C00
Tprof command was: 
    tprof -ukesz -L a.lst -r testant
Tprof command used to produce input files was: 
    tprof -E -ukes -ANI -r testant -x sleep 20
Trace command was: 
    /usr/bin/trace -ad -M -L 1073741312 -T 500000 -j 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,2FF,5D8, -o testant.trc    
Total Samples = 2006
Traced Time = 20.02s (out of a total execution time of 20.02s)
Performance Monitor based reports:
Processor name: POWER8
Monitored event: Processor cycles
Sampling interval: 10ms
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Process                                FREQ  Total Kernel   User Shared  Other
=======                         ====  ===== ======   ==== ======  =====
./testant                                 1   1998   3   1995   0      0
/usr/sbin/cron                         1      2      2      0      0      0
wait                                        2      2      2      0      0      0
/usr/bin/sh                              1      1      1      0      0      0
/usr/bin/tprof                          1      1      1      0      0      0
/usr/sbin/dumpctrl                  1      1      1      0      0      0
xmgc                                      1      1      1      0      0      0
=======                                ====  ===== ======   ==== ======  =====
Total                                     8   2006     11   1995      0      0
Process                           PID      TID  Total Kernel   User Shared  Other
=======                   ===      ===  ===== ======   ==== ======  =====
./testant                      8847652  9961911   1998      3   1995      0      0
/usr/sbin/cron              8061200 18284881      2      2      0      0      0
/usr/bin/tprof               5374428 16253427      1      1      0      0      0
/usr/bin/sh                   9961786 18809279      1      1      0      0      0
/usr/sbin/dumpctrl       8061200 18284881      1      1      0      0      0
xmgc                           852254  1376559         1      1      0      0      0
wait                             917792  1442097         1      1      0      0      0
wait                             260          261               1      1      0      0      0
=======                   ===      ===  ===== ======   ==== ======  =====
Total                                    2006     11   1995      0      0

  Total Ticks For All Processes (USER) = 1995
User Process                                   Ticks    %    Address  Bytes
=============                                  ===== ======  =======  =====
./testant                                       1995  99.45 10000128    380
  Profile: ./testant

  Total Ticks For All Processes (./testant) = 1995

Subroutine                 Ticks    %   Source                Address  Bytes
==========                 ===== ====== ======                =======  =====
.main                       1995  99.45 tmp.c                     258     6c

  Total Ticks For All Processes (KERNEL) = 8

Subroutine                               Ticks    %   Source                Address  Bytes
==========                 ===== ====== ======                =======  =====
.waitproc_find_run_queue        1        0.05 rnel/proc/dispatch.c    af580    a5c
.convert_tb                               1        0.05 ernel/proc/curtime.c   115000    2b8
.clock                                        1        0.05 /kernel/proc/clock.c   221580    594
sc_trace_end_fixup                   1        0.05 low.s                    4450      1
.tod_lock_read                          1        0.05 /kernel/proc/clock.c   2220e0    198
.shmfreews                               1        0.05 s/kernel/vmm/vmshm.c   806280    290
.v_insscb                                   1       0.05 rnel/vmm/v_scblist.c   34eca0    4d8
.xmgc_backend_free_heap        1       0.05 /kernel/alloc/xmgc.c   710c80    3d8
  Total Ticks For All Processes (HYPERVISOR) = 3
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Interpreting annotated listings

In the snippets above, you can see that the testant process consumes 1998 ticks in user-space, and main subroutine consumes 1995 ticks. The line numbers that are consuming these ticks can be identified in the generated annotation list.

95.16     7| 000024 bc       40800028   1     BF        CL.8,cr0,0x1/lt,

(95.16)     7|                              CL.7:

95.16% of the ticks are seen at the line number 7.

Summary

Generating annotated listing files can help you analyze performance issues by identifying specific instructions that consumes greater number of CPU ticks.