Taxonomy Icon

Linux

Introduction

This tutorial introduces you to the Linux® simple performance lock analysis tool (Linux splat) for IBM® Power Systems™. Linux splat provides a system-wide view of the different locks that are acquired, released, or actively held by different processes across a Linux system. The locks refer to the synchronization mechanisms used when multiple tasks or processes are running simultaneously. This tool can be very helpful in understanding and troubleshooting performance problems relevant to different types of locks being used across a Linux system.

When the IBM Power® server family introduced support for Linux, there was a lack of tools such as AIX curt and AIX splat on Linux distributions such as Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), Ubuntu, and so on.

We developed Linux splat and Linux CPU Utilization Reporting Tool (Linux curt), which we’ll cover in a tutorial coming soon, in response to requests from developers, customers, and administrators who were used to having performance tools such as AIX splat and AIX curt available for advanced troubleshooting of lock and CPU-related performance problems.

Linux splat and Linux curt are also useful when migrating workloads to Linux because they help with troubleshooting performance bottlenecks around CPU or locks, particularly for database loads, which tend to be CPU intensive, and locks to handle synchronization among database threads. Lack of such tools on Linux made troubleshooting harder in this area.

The current iteration of Linux splat generates reports on user-level mutexes but might soon include read/write locks and condition variable reports. It uses the capabilities of the Linux perf facility to record lock-related events and process the perf format trace file generated to produce a per-function, per-task, and system-wide lock statistics report (explained later in this tutorial with examples).

Note: Even though we developed Linux splat in response to specific IBM Power Systems user requests, it will run on any Linux architecture.

Prerequisites

Using Linux splat requires a basic understanding of the Linux perf utility and working experience of an operating system such as, UNIXB.. Elementary knowledge of Python and shell scripting might be helpful but not mandatory.

For example, on a Red Hat Enterprise Linux (RHEL) system, the following versions of software are needed:

  • RHEL 7.5
  • RPM packages
  • Linux perf utility
  • Python

You can install the latest versions using the yum installer:

yum install perf

yum install python

Installing Linux splat

The code is publicly available on GitHub at https://github.com/open-power-sdk/splat. You can download it directly as a ZIP file.

Then, extract the package using the following command:

gunzip splat.zip

Or, you can clone the repository using the following command as shown in Figure 1:

git clone https://github.com/open-power-sdk/splat.git

Setting up Linux splat

Perform the following steps to set up Linux splat:

  1. Enable user access

    1. Provide read/write access to the /sys/kernel/debug/tracing folder using the following command:

      sudo mount -o remount,mode=755 /sys/kernel/debug

    2. Enable any user to collect system-wide events using the following command:

      echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

      Or

      Run the mount.sh script to perform the steps above.

      sudo ./mount.sh

  2. Install the debuginfo package for glibc.

    For example, from the test setup used for this example, the rpm packages were installed (as root user) by running the following commands:

    yum localinstall glibc-debuginfo-2.17-222.el7.ppc64le.rpm

    yum localinstall glibc-debuginfo-common-2.17-222.el7.ppc64le.rpm

  3. Add probes.

    As a root user, enable the software-defined tracepoints on the system so that the perf utility can log them when the associated code paths are run during event recording.

    sudo ./addprobes.py

Running Linux splat

Linux splat can be run in many ways:

  • The easiest method is to record and report in a single step.

    ./splat.py --record all <command>

    where command is the command to be run including arguments. For example: ./splat.py –record all /usr/bin/ls -l /home

    The events recorded will be written to perf.data and also be processed to generate a detailed report.

  • Process an existing perf.data file.

    perf script -s ./splat.py

    Or, just run ./splat.py, which will launch the perf command internally.

  • Process a renamed perf trace file. By default, splat and perf tools look for a perf.data file.

    ./splat.py <file>

Linux splat usage examples

Linux splat uses the Linux perf command to record events into the perf.data file. This file is processed by splat to generate a report that shows:

  • A count of lock acquired attempts, acquisitions, failures, and releases
  • Average, minimum, and maximum wait and hold times of all locks

The following statistics are recorded per task and a system-wide summary across all tasks is reported:

  • Task : Task ID performing the lock operation
  • Function : Function in which the lock operation is taking place
  • Lock : Lock ID
  • Attempted : Count of lock acquisitions made
  • Failed : Count of failed attempts to lock acquisitions
  • Acquired : Count of successful lock acquisitions
  • Released : Count of locks released
  • Wait : The (minimum/maximum/average) wait time for a lock
  • Held : The (minimum/maximum/average) hold time for a lock

The following sections demonstrate two examples for using Linux splat. The simple test case shows its basic functionality, while the test case with multiple threads competing for locks shows lock acquisitions attempts (success and failures), wait/hold times, and releases per task as well as anoverall summary.

A simple test case

The splat command is run as shown below. This records all the events related to the probe for a duration of 20 seconds (as specified in the command argument: sleep 20). A perf format perf.data trace is generated and is processed by splat to create a report as shown in the following output. The list of statistics (as described above) is displayed for all the locks per task.

./splat.py --record all sleep 20

Sample splat command output

Task [    37569]
  Function "start_thread+0x104"
            lock    attempted       failed     acquired     released
        100200d0            5            2            3            0
                         wait     wait avg     wait min     wait max
                     0.009284     0.003095     0.000970     0.007344
                         held     held avg     held min     held max
                     0.000000     0.000000     0.000000     0.000000
  Function "critSection+0xe0"
            lock    attempted       failed     acquired     released
        100200d0            0            0            0            3
                         wait     wait avg     wait min     wait max
                     0.000000     0.000000     0.000000     0.000000
                         held     held avg     held min     held max
                     0.332536     0.110845     0.100884     0.122726
  Function "critSection+0x2c"
            lock    attempted       failed     acquired     released
        10020090            1            0            1            0
                         wait     wait avg     wait min     wait max
                     0.002226     0.002226     0.002226     0.002226
                         held     held avg     held min     held max
                     0.000000     0.000000     0.000000     0.000000
  Function "critSection+0x70"
            lock    attempted       failed     acquired     released
        10020090            0            0            0            1
                         wait     wait avg     wait min     wait max
                     0.000000     0.000000     0.000000     0.000000
                         held     held avg     held min     held max
                     0.006928     0.006928     0.006928     0.006928


Task [    37570]
  Function "start_thread+0x104"
            lock    attempted       failed     acquired     released
        100200d0         6479         6476            3            0
                         wait     wait avg     wait min     wait max
                     0.003192     0.001064     0.001048     0.001096
                         held     held avg     held min     held max
                     0.000000     0.000000     0.000000     0.000000
  Function "critSection+0xe0"
            lock    attempted       failed     acquired     released
        100200d0            0            0            0            3
                         wait     wait avg     wait min     wait max
                     0.000000     0.000000     0.000000     0.000000
                         held     held avg     held min     held max
                  1000.124214   333.374738     0.031022  1000.043584
  Function "critSection+0x2c"
            lock    attempted       failed     acquired     released
        10020090            1            0            1            0
                         wait     wait avg     wait min     wait max
                     0.002312     0.002312     0.002312     0.002312
                         held     held avg     held min     held max
                     0.000000     0.000000     0.000000     0.000000
  Function "critSection+0x70"
            lock    attempted       failed     acquired     released
        10020090            0            0            0            1
                         wait     wait avg     wait min     wait max
                     0.000000     0.000000     0.000000     0.000000
                         held     held avg     held min     held max
                     0.001238     0.001238     0.001238     0.001238

Task [      ALL]
            lock     acquired       failed
     10024c810c0            7            0
          waited      minWait      maxWait      avgWait
           0.009        0.001        0.002        0.001
            held      minHeld      maxHeld      avgHeld
           0.438        0.001        0.419        0.063

            lock     acquired       failed
        100200d0            6         6478
          waited      minWait      maxWait      avgWait
           0.012        0.001        0.007        0.002
            held      minHeld      maxHeld      avgHeld
        1000.457        0.031     1000.044      166.743

            lock     acquired       failed
    3fff800124f8            4            0
          waited      minWait      maxWait      avgWait
           0.005        0.001        0.002        0.001
            held      minHeld      maxHeld      avgHeld
           0.015        0.001        0.011        0.004

            lock     acquired       failed
    3fff80012818            4            0
          waited      minWait      maxWait      avgWait
           0.005        0.001        0.002        0.001
            held      minHeld      maxHeld      avgHeld
           0.011        0.001        0.006        0.003

            lock     acquired       failed
    3fffae030908            2            0
          waited      minWait      maxWait      avgWait
           0.003        0.001        0.001        0.001
            held      minHeld      maxHeld      avgHeld
           0.020        0.001        0.019        0.010

            lock     acquired       failed
        10020090            2            0
          waited      minWait      maxWait      avgWait
           0.005        0.002        0.002        0.002
            held      minHeld      maxHeld      avgHeld
           0.008        0.001        0.007        0.004

Test case with threads competing for locks

The command output also illustrates a slightly more complicated scenario because it has a few threads simultaneously looping and contending for a shared lock. In this scenario, the threads try to compete for lock acquisition and then get into a loop where they use condition variables to wait and signal the lock. The task procedure runs in a loop and the overall process is active for about 10 seconds before terminating.

In the command output, there are two threads (37569 and 37570) attempting, acquiring, and releasing a lock (100200d0) multiple times (as this happens in a loop). You can also see the failed attempts for the lock. The total hold and wait times for acquiring the lock is high, which also explains why there is a high value for the lock attempted statistics. These fields can be observed in the individual task reports and collation of minimum, maximum, and average values can be seen in the command output.

Summary

We believe that Linux splat is a very helpful tool for developers, administrators, and clients too allow them to effectively troubleshoot lock-related performance problems they might encounter when migrating applications from AIX to Linux systems.

Now that Linux splat is also available publicly on GitHub, we encourage you to explore and utilize it. We would be very grateful to hear your feedback, comments, suggestions, enhancement ideas, and any other issues you may find.

Linux curt, which we mentioned earlier, is another interesting performance tool that is available on GitHub. We plan to cover Linux curt in a future tutorial. Until then, review the README to learn more.