Linux interfaces for IBM POWER9 nest counters on IBM PowerVM
An introduction to Linux interfaces
System performance analysis is a requirement for understanding and tuning computer systems to perform efficiently. There is a tremendous need to address the difficult, but not uncommon, performance issues on customer systems. The 24×7 feature in IBM® POWER9™ processor-based servers provides the facility to continuously collect large numbers of hardware performance metrics efficiently and accurately. This tutorial introduces the nest hardware performance monitoring counters and their Linux perf interface in an IBM PowerVM® partition. The tutorial also introduces the Performance Co-Pilot (PCP) tool and shows how to use it to collect the POWER9 nest performance monitoring counter data through the hv_24x7 perf interface.
Socket-level resource information using POWER9 nest counters
IBM POWER9 processors implement nest Performance Monitoring Units (PMUs), which enable measurement of socket-level resource utilization. Each nest PMU has dedicated performance monitoring counters and hardware events. Unlike traditional processor PMU events, nest PMU events focus on data that goes off-core. IBM POWER9 processors also implement accumulation logic in hardware, which periodically writes event counter data from the nest units to memory.
Linux perf interface for nest counter data collection
IBM POWER9 hardware gathers the socket-level performance data and stores it in buffers that are allocated and managed by the hypervisor. The interface used to harvest this data is known as hv_24x7. The hv_24x7 interface collects the performance data from hypervisor memory through a hypervisor call (HCALL). Because the hv_24x7 interface is integrated with the Linux perf tool, a user or an application can obtain the nest counter data through perf commands.
To get the list of supported perf events, run the perf list command.
Figure 1. hv_24x7 events listed with the perf list command
Prerequisites for performance data collection
Unless a logical partition (LPAR) is configured to own all system resources, rights must be explicitly granted to collect system-wide data to prevent LPARs from obtaining data about other partitions running on a system.
To enable requests to collect information across all system resources, in the Hardware Management Console (HMC), click General Properties.
Figure 2. General Properties in the HMC menu
On the General page, click Advanced and select the Enable Performance Information Collection check box.
Figure 3. General page with the Enable Performance Information Collection option
Click Save to save the changes.
If the Enable Performance Information Collection check box is not selected in the HMC, the perf tool displays a not supported message instead of the performance counter data when you monitor the nest counters.
To monitor an event, run the perf stat command, for example:
perf stat -e hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=7/ -e hv_24x7/PM_PHB0_CYC,chip=7/ -C 0 -I 1000 sleep 100
Figure 4. Example perf stat output of the hv_24x7 events
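When several nest events have to be monitored from a script, the perf stat command line can be assembled programmatically. The following is a minimal sketch, assuming only the hv_24x7/EVENT,chip=N/ event syntax and the -e, -C, and -I options shown in the command above. The helper names hv_24x7_event and build_perf_stat_argv are hypothetical and are not part of perf.

```python
def hv_24x7_event(name, chip):
    """Format one event as hv_24x7/<EVENT>,chip=<N>/ (syntax taken from the command above)."""
    return "hv_24x7/%s,chip=%d/" % (name, chip)


def build_perf_stat_argv(events, chip, cpu=0, interval_ms=1000, duration_s=100):
    """Return the argument list for a perf stat run over the given hv_24x7 events.

    Hypothetical helper: it only builds the list and does not run perf.
    """
    argv = ["perf", "stat"]
    for name in events:
        argv += ["-e", hv_24x7_event(name, chip)]
    # -C selects the CPU, -I the print interval in milliseconds;
    # "sleep N" bounds the measurement to N seconds.
    argv += ["-C", str(cpu), "-I", str(interval_ms), "sleep", str(duration_s)]
    return argv


argv = build_perf_stat_argv(["PM_MCS01_128B_RD_DISP_PORT01", "PM_PHB0_CYC"], chip=7)
print(" ".join(argv))
```

The resulting list can be passed to subprocess.run on a partition where perf is installed and performance information collection is enabled.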
With a pre-scale of 256, the power bus frequency is 2.39 GHz. Nest counters are programmed with a pre-scale count because the nest PMU counters are narrow. The following table lists, for each unit, the pre-scale value by which the counter value must be multiplied to get the final count.
Table 1. Pre-scale values for nest units
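The pre-scale arithmetic can be sketched in a few lines. The example below assumes the pre-scale of 256 stated above for the power bus and uses the per-second PM_PB_CYC rate of 9.370E+06 that appears in the pmval example later in this tutorial. The function name apply_prescale is hypothetical.

```python
def apply_prescale(raw_count, prescale):
    """Scale a raw nest counter value (or rate) by the unit's pre-scale value."""
    return raw_count * prescale


PB_PRESCALE = 256       # pre-scale for the power bus, as stated in the text
raw_rate = 9.370e6      # PM_PB_CYC counts per second, from the pmval example

pb_hz = apply_prescale(raw_rate, PB_PRESCALE)
print("Power bus frequency: %.3f GHz" % (pb_hz / 1e9))
```

The scaled value is close to the 2.39 GHz power bus frequency quoted above, which is a useful sanity check that the right pre-scale was applied.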
Nest performance monitoring data consists of socket-level resource utilization metrics, and collecting it requires root or admin-level privilege. If you want some trusted users to gather the nest performance data without that privilege, PCP is a possible alternative.
PCP is a framework for system performance and analysis that:
- Uses a distributed architecture to collect, monitor, and analyze system metrics.
- Has live monitoring and statistical prediction features.
- Provides APIs for languages such as C, Python, and Perl.
PCP setup and usage
You can install PCP and other required tools using the pcp, libpcp-devel, pcp-pmda-perfevent, pcp-system-tools, and python3-pcp packages.
Performance Metrics Collector Daemon (pmcd)
The pmcd daemon collects performance metrics on a system. There must be an instance of pmcd running on a system to collect performance metrics.
In the pmcd configuration file (/etc/pcp/pmcd/pmcd.conf), you can perform the following tasks:
- Configure agents to collect specific events.
- Configure access control lists for hosts and users to allow or restrict actions such as store, fetch, and so on.
Run the following command to enable the pmcd service:
# systemctl enable pmcd
Run the following command to start the pmcd service:
# systemctl start pmcd
Run the following command to check the status of the pmcd service:
# systemctl status pmcd
When the pmcd daemon is not running, the pcp command fails, and the following error is displayed:
# pcp
pcp-summary: Cannot connect to PMCD on host "local:": Connection refused
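A monitoring script can check for this condition before it tries to read any metrics. The following is a small sketch that only inspects the text printed by the pcp command. It assumes that the error contains the phrase shown above, and the function name pmcd_unreachable is hypothetical.

```python
def pmcd_unreachable(pcp_output):
    """Return True if the pcp output reports that PMCD could not be reached."""
    return "Cannot connect to PMCD" in pcp_output


# Error text copied from the example above.
sample = 'pcp-summary: Cannot connect to PMCD on host "local:": Connection refused'

if pmcd_unreachable(sample):
    print("pmcd is not reachable; start it with: systemctl start pmcd")
```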
To collect the hv_24x7 events, you need the perfevent Performance Metrics Domain Agent (PMDA).
To install the perfevent PMDA, complete the following steps:
- Change the directory to /var/lib/pcp/pmdas/perfevent.
- Run the Install script:
# ./Install
PMCD should communicate with the perfevent daemon via a pipe or a socket? [pipe]
Updating the Performance Metrics Name Space (PMNS) ...
Terminate PMDA if already installed ...
Updating the PMCD control file, and notifying PMCD ...
Check perfevent metrics have appeared ... 1285 metrics and 1027 values
- Restart the pmcd service:
# systemctl restart pmcd
View the installed PMDAs using the pcp command:
# pcp
Performance Co-Pilot configuration on linux-l9tb:
 platform: Linux linux-l9tb 4.12.14-32-default #1 SMP Thu Feb 14 12:19:57 UTC 2019 (e1536c0) ppc64le
 hardware: 64 cpus, 2 disks, 2 nodes, 15111MB RAM
 timezone: EDT+4
 services: pmcd
 pmcd: Version 4.3.0-1, 8 agents
 pmda: root pmcd proc xfs linux mmv kvm jbd2 perfevent
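To confirm from a script that the perfevent PMDA is installed, the pmda line of the pcp summary can be parsed. The sketch below works on the summary text shown above and assumes that the agent list fits on a single "pmda:" line, which may not hold on systems with many agents. The function name installed_pmdas is hypothetical.

```python
def installed_pmdas(pcp_summary):
    """Return the agent names listed on the 'pmda:' line of a pcp summary."""
    for line in pcp_summary.splitlines():
        key, sep, rest = line.strip().partition(":")
        if sep and key == "pmda":
            return rest.split()
    return []


# Summary text copied from the example above.
SAMPLE_SUMMARY = """\
Performance Co-Pilot configuration on linux-l9tb:
 platform: Linux linux-l9tb 4.12.14-32-default #1 SMP Thu Feb 14 12:19:57 UTC 2019 (e1536c0) ppc64le
 hardware: 64 cpus, 2 disks, 2 nodes, 15111MB RAM
 timezone: EDT+4
 services: pmcd
 pmcd: Version 4.3.0-1, 8 agents
 pmda: root pmcd proc xfs linux mmv kvm jbd2 perfevent
"""

agents = installed_pmdas(SAMPLE_SUMMARY)
print("perfevent installed:", "perfevent" in agents)
```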
View information about the perf hv_24x7 events
To collect the perf-related data, you can run the pmprobe and pmval PCP commands.
The pmprobe command lists the performance metrics that are available through PCP facilities.
# pmprobe | grep perfevent
perfevent.version
perfevent.active
perfevent.hwcounters.*
perfevent.derived
The configuration file, /var/lib/pcp/pmdas/perfevent/perfevent.conf, contains details about the hv_24x7 events that are supported. The following hv_24x7 events are supported:
- hv_24x7.PM_PB_CYC chip:1
- hv_24x7.PM_MBA0_CLK_CYC chip:0
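Each supported event is exposed by the perfevent PMDA as a PCP metric. The following sketch derives the metric name from an event entry written in the form listed above. The naming pattern perfevent.hwcounters.EVENT.value is taken from the pmval commands used in this tutorial. The function name metric_name is hypothetical, and the sketch does not parse the perfevent.conf file itself.

```python
def metric_name(event_entry):
    """Map an entry such as 'hv_24x7.PM_PB_CYC chip:1' to its PCP metric name."""
    event = event_entry.split()[0]  # drop the trailing 'chip:N' qualifier
    return "perfevent.hwcounters.%s.value" % event


for entry in ("hv_24x7.PM_PB_CYC chip:1", "hv_24x7.PM_MBA0_CLK_CYC chip:0"):
    print(metric_name(entry))
```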
To see the values of the supported hv_24x7 events, run the pmval command. The output of the pmval command displays the values of the specified performance metric.
# pmval perfevent.hwcounters.hv_24x7.PM_PB_CYC.value
# pmval perfevent.hwcounters.hv_24x7.PM_MBA0_CLK_CYC.value
Refer to the following example output of the hv_24x7.PM_PB_CYC event:
# pmval perfevent.hwcounters.hv_24x7.PM_PB_CYC.value
metric:    perfevent.hwcounters.hv_24x7.PM_PB_CYC.value
host:      linux-l9tb
semantics: cumulative counter (converting to rate)
units:     count (converting to count / sec)
samples:   all
     cpu0       cpu1       cpu2       cpu3
9.370E+06  9.370E+06  9.370E+06  9.370E+06
Refer to the following example output of the hv_24x7.PM_MBA0_CLK_CYC event:
# pmval perfevent.hwcounters.hv_24x7.PM_MBA0_CLK_CYC.value
metric:    perfevent.hwcounters.hv_24x7.PM_MBA0_CLK_CYC.value
host:      linux-l9tb
semantics: cumulative counter (converting to rate)
units:     count (converting to count / sec)
samples:   all
cpu0  cpu1  cpu2  cpu3
 0.0   0.0   0.0   0.0
When you query an unsupported event, the message "No values available" is displayed in the output of the command.
# pmval perfevent.hwcounters.hv_24x7.CPM_CS_FROM_L2_L3_A_LDATA.value
metric:    perfevent.hwcounters.hv_24x7.CPM_CS_FROM_L2_L3_A_LDATA.value
host:      linux-l9tb
semantics: cumulative counter (converting to rate)
units:     count (converting to count / sec)
samples:   all
No values available
No values available
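A script that post-processes pmval output has to handle both the supported and the unsupported case. The sketch below assumes the layout of the examples above: "key: value" header lines, then one line of instance names and one line of values, or "No values available". Real pmval output can differ (for example, with more samples or wider columns), so treat this as an illustration only. The function name parse_pmval_sample is hypothetical.

```python
def parse_pmval_sample(text):
    """Return {instance: value} for the first pmval sample, or {} if none is available."""
    body = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines and header lines such as "metric: ..." or "host: ...".
        if not tokens or tokens[0].endswith(":"):
            continue
        body.append(line.strip())
    if len(body) < 2 or body[0] == "No values available":
        return {}
    names = body[0].split()
    values = [float(v) for v in body[1].split()]
    return dict(zip(names, values))


SUPPORTED = """
metric:    perfevent.hwcounters.hv_24x7.PM_PB_CYC.value
host:      linux-l9tb
semantics: cumulative counter (converting to rate)
units:     count (converting to count / sec)
samples:   all
     cpu0       cpu1       cpu2       cpu3
9.370E+06  9.370E+06  9.370E+06  9.370E+06
"""

UNSUPPORTED = """
metric:    perfevent.hwcounters.hv_24x7.CPM_CS_FROM_L2_L3_A_LDATA.value
host:      linux-l9tb
samples:   all
No values available
No values available
"""

print(parse_pmval_sample(SUPPORTED))
print(parse_pmval_sample(UNSUPPORTED))
```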
To obtain the hv_24x7 values by using PCP Python APIs, complete the following procedure:
Import the required PCP API classes:
from pcp import pmapi
import cpmapi as c_api
Create the fetchgroup object:
fg = pmapi.fetchgroup(c_api.PM_CONTEXT_HOST, 'local:')
Add the required event:
PM_PB_CYC = fg.extend_indom('perfevent.hwcounters.hv_24x7.PM_PB_CYC.value')
Get the samples:
fg.fetch()
Because the values of this event are cumulative, fetch again:
fg.fetch()
To print the values, multiply them by the pre-scale value. Refer to the pre-scale table.
for icode, iname, value in PM_PB_CYC():
    print(" %s: %7.2f" % (iname, value() * 256), end='')
print("\n")
Example output of the above script:
cpu0: 7806515.00 cpu1: 7806515.00 cpu2: 7806515.00 cpu3: 7806515.00
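The reason for fetching twice can be illustrated without PCP. A cumulative counter only becomes a rate once two samples are compared, which is what the "converting to rate" note in the pmval output refers to. The following is a plain-Python sketch of that calculation. The function name counter_rate and the sample numbers are hypothetical, and the sketch does not claim to reproduce how PCP computes rates internally.

```python
def counter_rate(prev_value, prev_time, curr_value, curr_time):
    """Return the per-second rate between two samples of a cumulative counter."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be taken at increasing times")
    return (curr_value - prev_value) / elapsed


# Hypothetical samples: the counter advances by 2000 counts in 2 seconds.
rate = counter_rate(prev_value=1000, prev_time=0.0, curr_value=3000, curr_time=2.0)
print("rate: %.1f counts/sec" % rate)
```

With a single sample there is no earlier value to subtract, so no rate can be reported.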