Introduction
In modern computing, performance plays a vital role. However, identifying and resolving performance-related issues are quite challenging. The IBM® AIX® operating system provides a handful of tools to gather performance data from the running processes. tprof is one such tool that provides instruction-level performance data of the processes. This tutorial helps users to identify the line numbers in the source files of the process which has performance issues.
Line numbers can be identified using tprof and annotation listing files. The -L
option in tprof is used to generate the annotated listing file (.alst). This does not require access to source files or the binary compiled with the -g
option.
Learning objectives
After completing this tutorial, you will be able to identify performance issues for processes that consume a large number of CPU ticks at a granular level by using annotation listing files in the tprof
command.
Prerequisites
The following prerequisites are required to identify the line numbers in the source files that are causing performance issues:
- Knowledge of using the
tprof
command - Knowledge of C source code compilation
- Knowledge of annotation listing files
Estimated time
It would approximately take about 30 minutes to identify the most CPU consuming instruction of the process and the line number of the instruction in the source file using annotation listing in tprof
.
Listing files
A listing file is a compiler output file (with a .lst suffix) that contains information about a particular compilation. As a debugging aid, a compiler listing is useful for determining what has gone wrong in a compilation. For example, any diagnostic messages reported during compilation are written to the listing.
Listing information is organized in different sections. A listing contains a header section and a combination of other sections, depending on other options in effect.
When the -qlist
option is used, the .lst file contains the object code generated by the compiler. This section is useful for diagnosing execution-time problems.
Steps
Generating a listing file
To generate a listing file, the source should be compiled with the -qlist
option.
The following sample program (testant.c) is used for illustration purposes in this tutorial.
int main() {
int x = 0;
x = 100 + 200 + 300;
while(1) {
for(int i = 0; i < x ; i++) {
x = x * 2;
}
float xr = xr * xr;
}
}
Use the following xlc command to compile the testant.c file and to generate the list files.
/usr/vac/bin/cc tmp.c -o testant -qlist
After successful compilation, testant.lst file along with the binary file is generated as shown in the snippet below:
listing file generated
IBM XL C for AIX, Version 13.1.0.2 tmp.c 07/12/18 05:31:23 (C)
>>>>> OPTIONS SECTION <<<<<
IBM XL C for AIX, Version 13.1.0.2
*** Command Line Invocation ***
*** Options In Effect ***
LIST NORO NOROCONST
LANGLVL=EXTENDED:NOUCS:NOTEXTAFTERENDIF
>>>>> FILE TABLE SECTION <<<<<
FILE CREATION FROM
FILE NO FILENAME DATE TIME FILE LINE
0 tmp.c 07/12/18 05:31:05
>>>>> COMPILATION EPILOGUE SECTION <<<<<
IBM XL C/C++ Compilers Summary of Diagnosed Conditions
TOTAL UNRECOVERABLE SEVERE ERROR WARNING INFORMATIONAL
(U) (S) (E) (W) (I)
0 0 0 0 0 0
Source records read......................................... 13
1501-008 Compilation successful for file tmp.c. Object file created.
>>>>> OBJECT SECTION, NO OPTIMIZATION <<<<<
GPR's set/used: -s-s s--- ---- ---- ---- ---- ---- ----
FPR's set/used: -s-- ---- ---- ---- ---- ---- ---- ----
CCR's set/used: s--- ----
| 000000 PDEF main
1| PROC
0| 000000 stwu 9421FFA0 1 ST4U gr1,#stack(gr1,-96)=gr1
2| 000004 addi 38600000 1 LI gr3=0
2| 000008 stw 90610040 1 ST4A x(gr1,64)=gr3ion is unloaded and reloaded again
3| 00000C addi 38600258 1 LI gr3=600
3| 000010 stw 90610040 1 ST4A x(gr1,64)=gr3
3| CL.1:
6| CL.4:
7| 000014 addi 38600000 1 LI gr3=0
7| 000018 stw 90610050 1 ST4A i(gr1,80)=gr3
7| 00001C lwz 80810040 1 L4A gr4=x(gr1,64)
7| 000020 cmpw 7C032000 2 C4 cr0=gr3,gr4
7| 000024 bc 40800028 1 BF CL.8,cr0,0x1/lt,
7| CL.7:
8| 000028 lwz 80610040 1 L4A gr3=x(gr1,64)
8| 00002C rlwinm 5463083C 2 SLL4 gr3=gr3,1
8| 000030 stw 90610040 1 ST4A x(gr1,64)=gr3
7| 000034 lwz 80610050 1 L4A gr3=i(gr1,80)
7| 000038 addi 38630001 2 AI gr3=gr3,1
7| 00003C stw 90610050 1 ST4A i(gr1,80)=gr3
7| 000040 lwz 80810040 1 L4A gr4=x(gr1,64)
7| 000044 cmpw 7C032000 2 C4 cr0=gr3,gr4
7| 000048 bc 4180FFE0 1 BT CL.7,cr0,0x1/lt,
7| CL.8:
10| 00004C lfs C021004C 1 LFS fp1=xr(gr1,76)
10| 000050 fmuls EC210072 1 MFS fp1=fp1,fp1,fcr
10| 000054 stfs D021004C 1 STFS xr(gr1,76)=fp1
6| 000058 b 4BFFFFBC 0 B CL.4
6| CL.5:
13| 00005C addi 38600000 1 LI gr3=0
13| 000060 b 48000004 0 B CL.9,-1
13| CL.2:
13| CL.9:
13| 000064 addi 38210060 1 AI gr1=gr1,96
13| 000068 bclr 4E800020 0 BA lr
| Tag Table
| 00006C 00000000 00002240 80000001 0000006C 00046D61 696E
| Instruction count 27
| Straight-line exec time 28
Generating an annotated listing file using tprof
To generate the annotated listings, use the -L <obj1,obj2>
option of the tprof
command.
tprof -ukesz -L a.lst -r testant
This command generates testant.a.lst.alst and testant.tprof files as shown in the snippet below:
Total % for main = 99.45
- | 000000 PDEF main
- 1| PROC
- 1| PROC
- 0| 000000 stwu 9421FFA0 1 ST4U gr1,#stack(gr1,-96)=gr1
- 2| 000004 addi 38600000 1 LI gr3=0
- 2| 000008 stw 90610040 1 ST4A x(gr1,64)=gr3
- 3| 00000C addi 38600258 1 LI gr3=600
- 3| 000010 stw 90610040 1 ST4A x(gr1,64)=gr3
- 3| CL.1:
- 6| CL.4:
- 7| 000014 addi 38600000 1 LI gr3=0
- 7| 000018 stw 90610050 1 ST4A i(gr1,80)=gr3
- 7| 00001C lwz 80810040 1 L4A gr4=x(gr1,64)
- 7| 000020 cmpw 7C032000 2 C4 cr0=gr3,gr4
95.16 7| 000024 bc 40800028 1 BF CL.8,cr0,0x1/lt,
(95.16) 7| CL.7:
- 8| 000028 lwz 80610040 1 L4A gr3=x(gr1,64)
- 8| 00002C rlwinm 5463083C 2 SLL4 gr3=gr3,1
- 8| 000030 stw 90610040 1 ST4A x(gr1,64)=gr3
- 7| 000034 lwz 80610050 1 L4A gr3=i(gr1,80)
- 7| 000038 addi 38630001 2 AI gr3=gr3,1
- 7| 00003C stw 90610050 1 ST4A i(gr1,80)=gr3
- 7| 000040 lwz 80810040 1 L4A gr4=x(gr1,64)
- 7| 000044 cmpw 7C032000 2 C4 cr0=gr3,gr4
- 7| 000048 bc 4180FFE0 1 BT CL.7,cr0,0x1/lt,
- 7| CL.8:
0.20 10| 00004C lfs C021004C 1 LFS fp1=xr(gr1,76)
- 10| 000050 fmuls EC210072 1 MFS fp1=fp1,fp1,fcr
- 10| 000054 stfs D021004C 1 STFS xr(gr1,76)=fp1
4.09 6| 000058 b 4BFFFFBC 0 B CL.4
( 4.29) 6| CL.5:
- 13| 00005C addi 38600000 1 LI gr3=0
- 13| 000060 b 48000004 0 B CL.9,-1
- 13| CL.2:
- 13| CL.9:
- 13| 000064 addi 38210060 1 AI gr1=gr1,96
- 13| 000068 bclr 4E800020 0 BA lr
Listing annotation option (-L) in tpcrof produces .alst files. Each annotated listing file contains complete compiler listings for each procedure and total CPU tick count or percentage (depending on the -z
option) for each procedure. Each line of the listing gives the tick count, line number, relative instruction offset, assembly mnemonic, and opcode. A snippet of the profile file generated by tprof is shown below:
Configuration information
=========================
System: AIX 7.2 Node: alp096-lp31 Machine: 00F967964C00
Tprof command was:
tprof -ukesz -L a.lst -r testant
Tprof command used to produce input files was:
tprof -E -ukes -ANI -r testant -x sleep 20
Trace command was:
/usr/bin/trace -ad -M -L 1073741312 -T 500000 -j 00A,001,002,003,38F,005,006,134,210,139,5A2,5A5,465,2FF,5D8, -o testant.trc
Total Samples = 2006
Traced Time = 20.02s (out of a total execution time of 20.02s)
Performance Monitor based reports:
Processor name: POWER8
Monitored event: Processor cycles
Sampling interval: 10ms
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Process FREQ Total Kernel User Shared Other
======= ==== ===== ====== ==== ====== =====
./testant 1 1998 3 1995 0 0
/usr/sbin/cron 1 2 2 0 0 0
wait 2 2 2 0 0 0
/usr/bin/sh 1 1 1 0 0 0
/usr/bin/tprof 1 1 1 0 0 0
/usr/sbin/dumpctrl 1 1 1 0 0 0
xmgc 1 1 1 0 0 0
======= ==== ===== ====== ==== ====== =====
Total 8 2006 11 1995 0 0
Process PID TID Total Kernel User Shared Other
======= === === ===== ====== ==== ====== =====
./testant 8847652 9961911 1998 3 1995 0 0
/usr/sbin/cron 8061200 18284881 2 2 0 0 0
/usr/bin/tprof 5374428 16253427 1 1 0 0 0
/usr/bin/sh 9961786 18809279 1 1 0 0 0
/usr/sbin/dumpctrl 8061200 18284881 1 1 0 0 0
xmgc 852254 1376559 1 1 0 0 0
wait 917792 1442097 1 1 0 0 0
wait 260 261 1 1 0 0 0
======= === === ===== ====== ==== ====== =====
Total 2006 11 1995 0 0
Total Ticks For All Processes (USER) = 1995
User Process Ticks % Address Bytes
============= ===== ====== ======= =====
./testant 1995 99.45 10000128 380
Profile: ./testant
Total Ticks For All Processes (./testant) = 1995
Subroutine Ticks % Source Address Bytes
========== ===== ====== ====== ======= =====
.main 1995 99.45 tmp.c 258 6c
Total Ticks For All Processes (KERNEL) = 8
Subroutine Ticks % Source Address Bytes
========== ===== ====== ====== ======= =====
.waitproc_find_run_queue 1 0.05 rnel/proc/dispatch.c af580 a5c
.convert_tb 1 0.05 ernel/proc/curtime.c 115000 2b8
.clock 1 0.05 /kernel/proc/clock.c 221580 594
sc_trace_end_fixup 1 0.05 low.s 4450 1
.tod_lock_read 1 0.05 /kernel/proc/clock.c 2220e0 198
.shmfreews 1 0.05 s/kernel/vmm/vmshm.c 806280 290
.v_insscb 1 0.05 rnel/vmm/v_scblist.c 34eca0 4d8
.xmgc_backend_free_heap 1 0.05 /kernel/alloc/xmgc.c 710c80 3d8
Total Ticks For All Processes (HYPERVISOR) = 3
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Interpreting annotated listings
In the snippets above, you can see that the testant process consumes 1998 ticks in user-space, and main subroutine consumes 1995 ticks. The line numbers that are consuming these ticks can be identified in the generated annotation list.
95.16 7| 000024 bc 40800028 1 BF CL.8,cr0,0x1/lt,
(95.16) 7| CL.7:
95.16% of the ticks are seen at the line number 7.
Summary
Generating annotated listing files can help you analyze performance issues by identifying specific instructions that consumes greater number of CPU ticks.