
Improve application workload performance with SMC-R on IBM AIX

Introduction

As network traffic grows, data on TCP/IP networks increasingly needs to be segregated by the criticality of the information it carries. Shared Memory Communications over RDMA (SMC-R) is a protocol that keeps the existing TCP/IP socket APIs but moves data by direct memory access. This gives performance benefits such as lower CPU usage and higher throughput. This tutorial describes how to configure SMC-R on AIX client and server systems running critical customer workloads, and it shows the measured performance improvements that are possible with SMC-R.

Prerequisites

This tutorial provides step-by-step instructions for configuring an application workload over SMC-R. To complete the steps, you need access to the following hardware and software:

  • Two AIX logical partitions on an IBM Power Systems server that have network adapters with RDMA capability, as discussed in the client/server model section below.

  • AIX Version 7.1 with Technology Level 02 (7100-02) or later.

  • A database application workload of your choice. This tutorial uses the Oracle Automated Stress Test (OAST) workload. The SMC-R configuration steps are the same for other workloads, and you can expect similar performance improvements. Note that the workload configuration itself differs by workload type and pattern.
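If you want to check the AIX level prerequisite before you start, the following POSIX shell sketch compares a level string against 7100-02. The helper name smcr_min_level_ok is hypothetical. The sketch assumes the argument is the string printed by `oslevel -s`, in the form RELEASE-TL-SP-BUILD (for example, 7100-02-01-1245).

```shell
#!/bin/sh
# Sketch: check the AIX level prerequisite (AIX 7.1 TL 02 or later).
# Assumption: $1 is the string printed by `oslevel -s`, in the form
# RELEASE-TL-SP-BUILD, for example 7100-02-01-1245.
# smcr_min_level_ok is a hypothetical helper name, not an AIX command.
smcr_min_level_ok() {
    release=$(printf '%s\n' "$1" | cut -d- -f1)   # e.g. 7100 or 7200
    tl=$(printf '%s\n' "$1" | cut -s -d- -f2)     # e.g. 02 (empty if no dash)
    # Strip leading zeros so the technology level compares as a decimal number.
    tl=$(printf '%s\n' "$tl" | sed 's/^0*//')
    [ -n "$tl" ] || tl=0
    # Any release newer than 7.1 (for example 7200) satisfies the requirement.
    if [ "$release" -gt 7100 ]; then
        return 0
    fi
    # On AIX 7.1 itself, the technology level must be 02 or later.
    [ "$release" -eq 7100 ] && [ "$tl" -ge 2 ]
}
```

On each partition, run the check against the current level, for example:

    # smcr_min_level_ok "$(oslevel -s)" && echo "AIX level meets the SMC-R prerequisite"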

Estimated time

End-to-end SMC-R configuration takes about 30 minutes. To observe the performance improvements, run the workload for at least 4 hours.

SMC-R configurations in the client/server model

The SMC-R protocol uses the existing sockets API to provide access to Remote Direct Memory Access (RDMA) transports. To use RDMA, you need RDMA-capable network adapters connected to Mellanox switches, for example, adapters from the Mellanox ConnectX family such as the 40 GbE RoCE and 100 GbE RoCE adapters. The SMC-R layer resides below the socket layer. It directs the data traffic of TCP connections between connected peers over the RDMA over Converged Ethernet (RoCE) fabric instead of over the TCP connection itself. The TCP/IP stack, with its requirements for fragmentation, packetization, and so on, is bypassed, and the application data is moved between peers using RDMA. An SMC-R link is a logical point-to-point link that uses reliably connected queue pairs between TCP/IP stack peers over a RoCE fabric, as illustrated in Figure 1.

Figure 1. SMC-R communication flow between client and server systems

Steps

Enable SMC-R between the client and server systems and configure the attributes

  1. Install the ofed.smcr fileset, which provides SMC-R support in AIX, and confirm the installation with the lslpp command. The IP addresses and port range used for SMC-R communication are specified in step 4.

    # lslpp -l '*smcr*'
    Fileset                      Level  State      Description
    ----------------------------------------------------------------------------
    Path: /usr/lib/objrepos
    ofed.smcr.rte             7.2.3.15  COMMITTED  SMCR
    
    Path: /etc/objrepos
    ofed.smcr.rte             7.2.3.15  COMMITTED  SMCR
    
  2. After installing the fileset, create the SMC-R device with the following command:

    # mkdev -c tcpip -t smcr
    smcr0 Available
    
  3. Confirm that the SMC-R device is in the Available state:

    # lsdev|grep smcr
    smcr0      Available             AIX SMCR Device Extension
    

    Display the default attributes of the SMC-R device with the lsattr command:

    # lsattr -El smcr0
    conns_per_lg    16          Number of connections per link group              True
    enabled         0           SMCR Enabled                                      True
    init_snd_pools  2           Number of send buffer pools to allocate quickly   True
    ip_addr_list                IP Address List                                   True
    max_memory      512         Max Memory in MB                                  True
    port_range      0           TCP Port Range                                    True
    rx_intr_packets 0           Number of packets to process in interrupt context True
    tx_intr_cnt     128         Tx Interrupt event coalesce counter               True
    tx_intr_time    10000       Tx Interrupt event coalesce timer (microseconds)  True
    
  4. Configure the SMC-R attributes: specify the IP addresses, enable SMC-R, and set the TCP port range used for communication.

    1. Specify the IP addresses to be used for SMC-R communication.

      # chdev -l smcr0 -a ip_addr_list=any
      smcr0 changed
      

      Note: You can specify multiple IP addresses (IPv4, IPv6, or both) as comma-separated values.

    2. Enable SMC-R over the device.

      # chdev -l smcr0 -a enabled=1
      smcr0 changed
      
    3. Specify the port range to be used for SMC-R communication.

      # chdev -l smcr0 -a port_range=0-50000
      smcr0 changed
      
  5. Repeat the preceding configuration steps on the peer node (the client) that takes part in the communication, using the appropriate IP address for that node.

    Verify the modified configuration with the lsattr command:

    # lsattr -El smcr0
    conns_per_lg    16          Number of connections per link group              True
    enabled         1           SMCR Enabled                                      True
    init_snd_pools  2           Number of send buffer pools to allocate quickly   True
    ip_addr_list    any         IP Address List                                   True
    max_memory      512         Max Memory in MB                                  True
    port_range      0-50000     TCP Port Range                                    True
    rx_intr_packets 0           Number of packets to process in interrupt context True
    tx_intr_cnt     128         Tx Interrupt event coalesce counter               True
    tx_intr_time    10000       Tx Interrupt event coalesce timer (microseconds)  True
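Steps 4 and 5 can be scripted so that both peers are configured the same way. The following POSIX shell sketch wraps the three chdev commands from step 4 and checks the resulting lsattr output. The helper names (smcr_apply, smcr_check_attr, smcr_verify) and the RUN dry-run variable are hypothetical. The sketch assumes it runs as root on an AIX partition where the smcr0 device from step 2 already exists.

```shell
#!/bin/sh
# Sketch of step 4 plus the lsattr check from step 5.
# smcr_apply, smcr_check_attr and smcr_verify are hypothetical helper names.

# smcr_apply IP_LIST PORT_RANGE
# Runs the three chdev commands from step 4, in the same order.
# Set RUN=echo to print the commands instead of running them (dry run).
smcr_apply() {
    ${RUN:-} chdev -l smcr0 -a ip_addr_list="$1" &&
        ${RUN:-} chdev -l smcr0 -a enabled=1 &&
        ${RUN:-} chdev -l smcr0 -a port_range="$2"
}

# smcr_check_attr NAME EXPECTED  (reads `lsattr -El smcr0` output on stdin)
# Succeeds if the line for NAME has EXPECTED in its value column.
# An empty value (such as the default ip_addr_list) shifts the description
# into the value column, so it never matches a non-empty EXPECTED.
smcr_check_attr() {
    awk -v name="$1" -v want="$2" '
        $1 == name { found = 1; ok = ($2 == want) }
        END { exit !(found && ok) }'
}

# smcr_verify IP_LIST PORT_RANGE  (reads `lsattr -El smcr0` output on stdin)
# Prints OK/FAIL for enabled, ip_addr_list and port_range; returns 1 on any FAIL.
smcr_verify() {
    want_ip=$1
    want_ports=$2
    out=$(cat)
    rc=0
    for attr in enabled ip_addr_list port_range; do
        case $attr in
            enabled)      want=1 ;;
            ip_addr_list) want=$want_ip ;;
            port_range)   want=$want_ports ;;
        esac
        if printf '%s\n' "$out" | smcr_check_attr "$attr" "$want"; then
            echo "OK   $attr=$want"
        else
            echo "FAIL $attr (expected $want)"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, set RUN=echo and call smcr_apply any 0-50000 to print the commands without changing anything. Then unset RUN, run it again, and pipe the output of lsattr -El smcr0 into smcr_verify any 0-50000. On the client, pass the IP address list that is appropriate for that node.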
    

Verify communication between the client and server systems over SMC-R

Verify that communication between the client and server systems takes place over SMC-R by using the following commands:

  1. Use the rping command to verify RDMA communication between the client and server systems:

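    1. Start rping in server mode with the -s option, binding it with -a to the IP address of the RoCE interface. In this example, -v displays the ping data, -S sets the ping message size, and -C sets the number of pings.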

      # rping -s -v -a 10.10.10.2 -S 28 -C 10
      
    2. Start rping on the client system with the -c option, specifying the same server IP address.

      # rping -c -v -a 10.10.10.2 -S 28 -C 10
      ping data: rdma-ping-0: ABCDEFGHIJKLMN
      ping data: rdma-ping-1: BCDEFGHIJKLMNO
      ping data: rdma-ping-2: CDEFGHIJKLMNOP
      ping data: rdma-ping-3: DEFGHIJKLMNOPQ
      ping data: rdma-ping-4: EFGHIJKLMNOPQR
      ping data: rdma-ping-5: FGHIJKLMNOPQRS
      ping data: rdma-ping-6: GHIJKLMNOPQRST
      ping data: rdma-ping-7: HIJKLMNOPQRSTU
      ping data: rdma-ping-8: IJKLMNOPQRSTUV
      ping data: rdma-ping-9: JKLMNOPQRSTUVW
      
    3. Check the server console for the following messages, which confirm that RDMA communication between the client and server systems is successful.

      server ping data: rdma-ping-0: ABCDEFGHIJKLMN
      server ping data: rdma-ping-1: BCDEFGHIJKLMNO
      server ping data: rdma-ping-2: CDEFGHIJKLMNOP
      server ping data: rdma-ping-3: DEFGHIJKLMNOPQ
      server ping data: rdma-ping-4: EFGHIJKLMNOPQR
      server ping data: rdma-ping-5: FGHIJKLMNOPQRS
      
  2. Check the statistics of the SMC-R pseudo-adapter with the entstat command.

    # entstat -d smcr0
    -------------------------------------------------------------
    ETHERNET STATISTICS (smcr0) :
    Device Type: IBM Shared Memory Channel (SMCR) Psuedo-Adapter
    Elapsed Time: 0 days 0 hours 0 minutes 0 seconds
    
    IBM Shared Memory Channel (SMCR)
    Psuedo-Adapter Specific Statistics:
    ------------------------------------------------
    
    Active Link Groups: 0
    
    TCP Payload Bytes Sent: 826
    TCP Payload Bytes Recv: 84
    
    CLC Messages Sent: 3
    CLC Messages Recv: 6
    
    LLC Messages Sent: 789628
    LLC Messages Recv: 789624
    
    TCP Failback Count: 0
    
    Active TCP connections using SMC-R: 0
    
    Total amount of memory allocated on send/TX side : 34865152 bytes
    

    Note: After TCP connections, such as FTP sessions, are established between the peers, the value of Active TCP connections using SMC-R increases.

  3. Verify the link group information with the kdb kernel debugger.

    1. Start kdb and run the smcr lg subcommand to list the link groups:

      # kdb
      
      START              END <name>
      0000000000001000 00000000059E0000 start+000FD8
      F00000002FF47600 F00000002FFE1000 __ublock+000000
      000000002FF22FF4 000000002FF22FF8 environ+000000
      000000002FF22FF8 000000002FF22FFC errno+000000
      F100100A00000000 F100100A10000000 pvproc+000000
      F100100A10000000 F100100A18000000 pvthread+000000
      read vscsi_scsi_ptrs OK, ptr = 0xF1000000C0151E90
      
      (0)> smcr lg
      +---------------------------------------------------------------------------+
      |  REMOTE PEER ID  |     VLAN ID      |       ROLE       |      LG ADDR     |
      |------------------+------------------+------------------+------------------|
      | 6DD5E41D2D279651 | 0000000000000000 |      SERVER      | F1000500155AD000 |
      |------------------+------------------+------------------+------------------|
      | 6DD5E41D2D279651 | 0000000000000000 |      SERVER      | F1000500155B4000 |
      |------------------+------------------+------------------+------------------|
      | 6DD5E41D2D279651 | 0000000000000000 |      CLIENT      | F10005001117F000 |
      +---------------------------------------------------------------------------+
      
    2. To display the links in a specific link group, pass its LG ADDR value to the smcr lg subcommand:

      # kdb
      
      START              END <name>
      0000000000001000 00000000059E0000 start+000FD8
      F00000002FF47600 F00000002FFE1000 __ublock+000000
      000000002FF22FF4 000000002FF22FF8 environ+000000
      000000002FF22FF8 000000002FF22FFC errno+000000
      F100100A00000000 F100100A10000000 pvproc+000000
      F100100A10000000 F100100A18000000 pvthread+000000
      read vscsi_scsi_ptrs OK, ptr = 0xF1000000C0151E90
      
      (0)> smcr lg F1000500155AD000
      +-------------------------------------+
      |     LINK ADDR    |     LINK ID      |
      |------------------+------------------|
      | F100050011176C00 |         1        |
      |------------------+------------------|
      | F100050010E76800 |         2        |
      +-------------------------------------+
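The checks in this section rely on reading command output by eye. As a hedged sketch, the following POSIX shell functions read the rping client output, the entstat -d smcr0 output, and the kdb smcr lg table from standard input. The names rping_ok, smcr_stat, and smcr_lg_addrs are hypothetical. The functions assume the output is formatted like the samples in this tutorial; other AIX levels may format these lines differently.

```shell
#!/bin/sh
# Sketch of output checks for this section. rping_ok, smcr_stat and
# smcr_lg_addrs are hypothetical helper names; each reads command output
# on stdin and assumes it is formatted like the samples in this tutorial.

# rping_ok COUNT  (reads rping client output)
# Succeeds if exactly COUNT "ping data: rdma-ping-N" lines were printed,
# for example COUNT=10 for a run started with -C 10.
rping_ok() {
    n=$(grep -c '^[[:space:]]*ping data: rdma-ping-')
    [ "$n" -eq "$1" ]
}

# smcr_stat LABEL  (reads `entstat -d smcr0` output)
# Prints the first word after "LABEL:", for example with
# LABEL="Active TCP connections using SMC-R" or "TCP Failback Count".
smcr_stat() {
    awk -v label="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # ignore leading indentation
            if (index(line, label ":") == 1) {  # literal prefix match
                value = substr(line, length(label) + 2)
                sub(/^[ \t]+/, "", value)
                split(value, parts, /[ \t]+/)
                print parts[1]
                found = 1
            }
        }
        END { exit !found }'
}

# smcr_lg_addrs  (reads the table printed by the kdb "smcr lg" subcommand)
# Prints the LG ADDR column of each SERVER or CLIENT row, one per line.
smcr_lg_addrs() {
    awk -F'|' '
        NF >= 6 {
            role = $4; gsub(/[ \t]/, "", role)
            if (role == "SERVER" || role == "CLIENT") {
                addr = $5; gsub(/[ \t]/, "", addr)
                print addr
            }
        }'
}
```

For example, entstat -d smcr0 | smcr_stat 'Active TCP connections using SMC-R' prints the current count. Because kdb is interactive, smcr_lg_addrs is meant for text captured from a kdb session. Each address it prints can then be passed to smcr lg, as in the second kdb example above.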
      

Measure workload performance

To demonstrate the performance benefits of SMC-R over traditional TCP/IP, we configured the Oracle Automated Stress Test (OAST) workload on an SMC-R capable logical partition (LPAR) using the steps described above. We then measured performance in transactions per second (TPS) with SMC-R enabled and with SMC-R disabled.

For the workload configured with User requested=150 and Runtime=4 hours, the graph in Figure 2 summarizes the key difference in performance, measured in average TPS.

Figure 2. Variation in average TPS with SMC-R enabled and disabled
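To express a comparison like the one in Figure 2 as a single number, you can compute the percentage change in average TPS: (TPS with SMC-R - TPS without SMC-R) / TPS without SMC-R x 100. The following sketch does this with awk. The helper name tps_gain_pct is hypothetical, and you supply your own measured averages, because this tutorial does not list the numbers behind the graph.

```shell
#!/bin/sh
# Sketch: percentage change in average TPS between two runs of the same workload.
# tps_gain_pct is a hypothetical helper; pass your own measured averages.
# Usage: tps_gain_pct TPS_WITHOUT_SMCR TPS_WITH_SMCR
tps_gain_pct() {
    awk -v base="$1" -v smcr="$2" 'BEGIN {
        if (base <= 0) exit 1     # the baseline must be a positive TPS value
        printf "%.1f\n", (smcr - base) / base * 100
    }'
}
```

The function prints the change with one decimal place. A negative result would mean that the run with SMC-R enabled had a lower average TPS than the baseline.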

Summary

This tutorial described how to configure SMC-R on AIX and showed the performance improvement that SMC-R delivered for the OAST workload. You can use the same setup as a baseline for improving the performance of your own critical AIX workloads running on IBM Power Systems servers with the latest RoCE adapters.

Sougata Sarkar
Sreevidhya Nair
Aparna Visweswaraiah