Introduction

Bonding is the process of creating a single bonded interface by combining two or more interfaces. Bonding provides a reliable connection between interfaces, which helps with fault tolerance and high availability. You can implement InfiniBand bonding on the host and guest systems using the physical and virtual functions of an SR-IOV-capable Mellanox InfiniBand adapter. Because of these benefits, SR-IOV bonding is useful in cloud environments. Note that InfiniBand bonding supports only the active-backup mode, which is the mode used throughout this tutorial.

Learning objectives

This tutorial explains how to create virtual functions from the physical functions of the InfiniBand adapter and how to pass the virtual functions through to the guests.

Figure 1 depicts the bonding setup on the host system using the physical functions, and the bonding setup in the guest system using the virtual functions after pass-through.

Figure 1. InfiniBand bonding using Mellanox dual-port adapter

You will learn about bonding with the SR-IOV Mellanox InfiniBand adapter and how to set up bonding on the host system and on KVM-based guest systems running the Red Hat Enterprise Linux (RHEL) operating system. SR-IOV bonding provides a reliable connection between interfaces, ensuring fault tolerance and high availability.

Prerequisites

Setting up bonding for SR-IOV involves the following prerequisites:

  • Host system: A system with two dual-port Mellanox InfiniBand CX5 adapters.
  • Hardware: IBM® POWER9™ processor-based server running Linux on Power.
  • Operating system: RHEL7.5alternate (for host and guest operating systems) with KVM virtualization packages.
  • SR-IOV adapter: Mellanox Technologies MT28800 Family [ConnectX-5 Ex] InfiniBand controller.
  • Software: Mellanox OFED software for RHEL 7.5alternate, to be installed on the host and guests. It is available at http://www.mellanox.com/page/products_dyn?product_family=26

    On the Download tab, select the current version, RHEL/CentOS, RHEL/CentOS 7.5alternate, ppc64le, and the .iso file.
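
    The commands below are a minimal installation sketch, assuming the downloaded image is the MLNX_OFED ISO for RHEL 7.5alternate on ppc64le; the exact file name depends on the release you select, and mlnxofedinstall is the installer script shipped at the root of the ISO.

      # Mount the downloaded ISO (file name is an assumption; use the one you downloaded)
      mount -o ro,loop MLNX_OFED_LINUX-*-rhel7.5alternate-ppc64le.iso /mnt
      # Run the Mellanox installer, then restart the InfiniBand stack (or reboot)
      /mnt/mlnxofedinstall
      /etc/init.d/openibd restart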

Estimated time

It will take approximately 1 to 2 hours to set up the environment and complete the bonding on the host and guest systems.

Steps

Identify the physical functions and set up bonding on the host system

  1. Run the lspci command to obtain information about the adapters on the host. In this example, the host system has two dual-port Mellanox adapters.

     # lspci -nn | grep "Mellanox"   
     0000:01:00.0 Infiniband controller [0207]: Mellanox Technologies MT28800 Family
     [ConnectX-5 Ex] [15b3:1019]
     0000:01:00.1 Infiniband controller [0207]: Mellanox Technologies MT28800 Family
     [ConnectX-5 Ex] [15b3:1019]
    
     0030:01:00.0 Infiniband controller [0207]: Mellanox Technologies MT28800 Family
     [ConnectX-5 Ex] [15b3:1019]
     0030:01:00.1 Infiniband controller [0207]: Mellanox Technologies MT28800 Family
     [ConnectX-5 Ex] [15b3:1019]
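
     Optionally, you can confirm that a physical function is SR-IOV capable and see how many virtual functions it supports before creating any. A quick check, assuming the mlx5_0 device name used later in this tutorial:

      # SR-IOV capability advertised by the first physical function
      lspci -vvv -s 0000:01:00.0 | grep -i -A 1 "SR-IOV"
      # Maximum number of virtual functions the device supports
      cat /sys/class/infiniband/mlx5_0/device/sriov_totalvfs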
    
  2. Create a bond on the host by using all four physical functions.

    1. Create a bond0 interface on the host using four physical functions: ib0, ib1, ib2, and ib3.

      # /etc/sysconfig/network-scripts/ifcfg-bond0 
      DEVICE=bond0
      IPADDR=100.1.1.20
      NETMASK=255.255.255.0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=no
      NM_CONTROLLED=yes
      BONDING_OPTS="mode=active-backup primary=ib0 miimon=100 updelay=100 downdelay=100"
      MTU=2044
      
    2. Update the interface configuration files on the host:

      # /etc/sysconfig/network-scripts/ifcfg-ib0       
      TYPE=InfiniBand
      DEVICE=ib0
      NAME=ib0
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond0
      SLAVE=yes
      PRIMARY=yes
      
      # /etc/sysconfig/network-scripts/ifcfg-ib1 
      TYPE=InfiniBand
      DEVICE=ib1
      NAME=ib1
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond0
      SLAVE=yes
      PRIMARY=no
      
       # /etc/sysconfig/network-scripts/ifcfg-ib2
       TYPE=InfiniBand
      DEVICE=ib2
      NAME=ib2
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond0
      SLAVE=yes
      PRIMARY=no
      
      # /etc/sysconfig/network-scripts/ifcfg-ib3
      TYPE=InfiniBand
      DEVICE=ib3
      NAME=ib3
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond0
      SLAVE=yes
      PRIMARY=no
      
    3. Create the bond.conf file on the host. Configure the file with the following information:

       # cat /etc/modprobe.d/bond.conf    
       alias bond0 bonding
       options bond0 max_bonds=2 miimon=100 mode=1
      

       [mode=1 indicates the active-backup mode]
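
       If you want to double-check what a numeric mode value maps to, the bonding module documents its parameters; a quick, optional check:

       # List the mode parameter description (includes the numeric-to-name mapping)
       modinfo bonding | grep "mode:"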

    4. Load the bonding driver module by running the following command:

      # modprobe bonding

    5. Restart the network on the host by running the following command:

      # service network restart
      Restarting network (via systemctl):         [  OK  ]
      
    6. Check if the bond is created on the host by running the ip link show and cat /proc/net/bonding/bond0 commands.

      # ip link show
      1: ib0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond0 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:05:34:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:c8 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      2: ib1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond0 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:0d:67:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:c9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      3: ib2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond0 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:07:8d:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a8 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      4: ib3: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond0 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:0f:96:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a9 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      14: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 2044 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/infiniband 00:00:05:34:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:c8 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      
      # cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: ib0 (primary_reselect always)
      Currently Active Slave: ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: ib0
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:04:d0:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:c8
      Slave queue ID: 0
      
      Slave Interface: ib1
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:0d:2d:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:c9
      Slave queue ID: 0
      
      Slave Interface: ib2
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:07:5a:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a8
      Slave queue ID: 0
      
      Slave Interface: ib3
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:0f:8d:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a9
      Slave queue ID: 0
      
    7. Validate whether the active-backup mode works correctly by running the following commands:

      # ifconfig ib0 down
      bond0: making interface ib1 the new active one
      
      # cat /proc/net/bonding/bond0
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: ib0 (primary_reselect always)
      Currently Active Slave: ib1  ------------------------------[ ib1 is the active slave]
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: ib0
      MII Status: down
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 1
      Permanent HW addr: 00:00:f8:7f:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a8
      Slave queue ID: 0
      
      Slave Interface: ib1
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:01:00:64:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a9
      Slave queue ID: 0
      
      Slave Interface: ib2
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:07:5a:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a8
      Slave queue ID: 0
      
      Slave Interface: ib3
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:0f:8d:fe:80:00:00:00:00:00:00:24:8a:07:03:00:a3:29:a9
      Slave queue ID: 0
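
      Optionally, bring the primary slave back up. Because the bond reports primary_reselect as "always", ib0 is expected to become the active slave again; a quick check:

      # Restore the primary slave and confirm the failback (allow a moment for the updelay to expire)
      ifconfig ib0 up
      grep "Currently Active Slave" /proc/net/bonding/bond0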
      

Create virtual functions of a single physical function and set up bonding in the first guest

This scenario is particularly useful when a virtual function is hot unplugged from the running guest: the bond interface remains active because the primary slave switches to another virtual function, as sketched below.
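
Once the bond described in the following steps is up inside the guest, the hot unplug can be simulated from the host with virsh. This is a hypothetical sketch: rhel7.5alt_vm is the guest used later in this tutorial, and vf2.xml is an assumed file containing the same <hostdev> element used to pass through the 0000:01:00.2 virtual function.

    # Hot-unplug one virtual function from the running guest (vf2.xml is hypothetical)
    virsh detach-device rhel7.5alt_vm vf2.xml --live
    # Inside the guest, the bond should stay up and fail over to another slave:
    cat /proc/net/bonding/bond1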

  1. Create three virtual functions on the first physical function of the dual-port adapter on the host.

    # echo 3 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs

    When you run this command, three virtual functions are created, as shown below:

     # lspci -nn | grep "Virtual Function"  
     0000:01:00.2 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
     0000:01:00.3 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
     0000:01:00.4 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
    

    Note: To pass through the virtual function with PCI ID: 0000:01:00.2 from the above output, use the following syntax in the guest XML:

    <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
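
     If you prefer to derive those values from a PCI ID instead of typing them by hand, a small shell sketch (the variable names are illustrative only):

      # Split a PCI ID such as 0000:01:00.2 into the fields expected by <address>
      pci=0000:01:00.2
      domain=${pci%%:*}                    # 0000
      rest=${pci#*:}                       # 01:00.2
      bus=${rest%%:*}                      # 01
      slot=${rest#*:}; slot=${slot%%.*}    # 00
      func=${pci##*.}                      # 2
      printf "<address domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n" \
             "$domain" "$bus" "$slot" "$func"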

  2. Pass through all the virtual functions of the single physical function to the first guest and create the bond using the active-backup mode.

    The active-backup mode is primarily used for fault tolerance. At any given time, only one slave is active in this mode; another slave takes over when the active slave fails.

To demonstrate the active-backup mode, follow these steps:

  1. Pass through all the virtual functions to the first guest system.

    1. Edit the guest XML file and add the following entries:

      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x3'/>
      </source>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0000' bus='0x01' slot='0x00' function='0x4'/>
      </source>
      </hostdev>
      
    2. Start the guest system by running the following command:

      # virsh start rhel7.5alt_vm   
      Domain rhel7.5alt_vm started
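
      Optionally, once the guest is up, confirm that the virtual functions are visible inside it (ibstat is available once the Mellanox OFED packages from the prerequisites are installed in the guest):

      # Run inside the guest
      lspci -nn | grep "Virtual Function"
      ibstat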
      
  2. After the guest system is active, create the bond inside the guest system by updating the configuration files.

    # /etc/sysconfig/network-scripts/ifcfg-bond1
    DEVICE=bond1
    IPADDR=100.1.1.21
    NETMASK=255.255.255.0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=no
    NM_CONTROLLED=yes
    BONDING_OPTS="mode=active-backup primary=ib0 miimon=100 updelay=100 downdelay=100"
    MTU=2044
    
    # /etc/sysconfig/network-scripts/ifcfg-ib0
    TYPE=InfiniBand
    DEVICE=ib0
    NAME=ib0
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=none
    MASTER=bond1
    SLAVE=yes
    PRIMARY=yes
    
    # /etc/sysconfig/network-scripts/ifcfg-ib1
    TYPE=InfiniBand
    DEVICE=ib1
    NAME=ib1
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=none
    MASTER=bond1
    SLAVE=yes
    PRIMARY=no
    
    # /etc/sysconfig/network-scripts/ifcfg-ib2
    TYPE=InfiniBand
    DEVICE=ib2
    NAME=ib2
    ONBOOT=yes
    NM_CONTROLLED=yes
    BOOTPROTO=none
    MASTER=bond1
    SLAVE=yes
    PRIMARY=no
    
  3. Create the bond.conf file on the guest system and configure the file as follows:

    # cat /etc/modprobe.d/bond.conf
    alias bond1 bonding
    options bond1 max_bonds=2 miimon=100 mode=1
    
    [mode=1 indicates the active-backup mode]
    
  4. Restart the network by running the following command:

    # service network restart
    Restarting network (via systemctl):   [  OK  ]
    
  5. Check if bonding is created on the guest system by running the following commands:

    # ip link show  
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000     link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000     link/ether 52:54:00:31:dd:da brd ff:ff:ff:ff:ff:ff
    3: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 2044 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/infiniband 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    4: ib0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond1 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    5: ib1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond1 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:0c:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:01:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
     6: ib2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond1 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:02:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:39:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    
    # cat /proc/net/bonding/bond1
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
    Primary Slave: ib0 (primary_reselect always)
    Currently Active Slave: ib0
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 100
    Down Delay (ms): 100
    
    Slave Interface: ib0
    MII Status: up
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:0d:00
    Slave queue ID: 0
    
    Slave Interface: ib1
    MII Status: up
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:00:05:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:5a:00
    Slave queue ID: 0
    
    Slave Interface: ib2
    MII Status: up
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:00:02:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:3d:00
    Slave queue ID: 0
    
  6. Conduct a ping test between the bond interface of the guest system and the host system. From the guest system to the host system, run the following command:

    # ping -I bond1 100.1.1.20 -c 5
    PING 100.1.1.20 (100.1.1.20) from 100.1.1.21 bond1: 56(84) bytes of data.
    64 bytes from 100.1.1.20: icmp_seq=1 ttl=64 time=0.099 ms
    64 bytes from 100.1.1.20: icmp_seq=2 ttl=64 time=0.076 ms
    64 bytes from 100.1.1.20: icmp_seq=3 ttl=64 time=0.068 ms
    64 bytes from 100.1.1.20: icmp_seq=4 ttl=64 time=0.066 ms
    64 bytes from 100.1.1.20: icmp_seq=5 ttl=64 time=0.068 ms
    
    --- 100.1.1.20 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 4149ms
    rtt min/avg/max/mdev = 0.066/0.075/0.099/0.014 ms
    
  7. Run the following commands to validate if the active-backup mode works on the guest system:

    # ifconfig ib0 down
    bond1: making interface ib1 the new active one
    
    # cat /proc/net/bonding/bond1
    Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
    
    Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
    Primary Slave: ib0 (primary_reselect always)
    Currently Active Slave: ib1
    MII Status: up
    MII Polling Interval (ms): 100
    Up Delay (ms): 100
    Down Delay (ms): 100
    
    Slave Interface: ib0
    MII Status: down
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 1
    Permanent HW addr: 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:0d:00
    Slave queue ID: 0
    
    Slave Interface: ib1
    MII Status: up
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:00:05:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:5a:00
    Slave queue ID: 0
    
    Slave Interface: ib2
    MII Status: up
    Speed: 100000 Mbps
    Duplex: full
    Link Failure Count: 0
    Permanent HW addr: 00:00:02:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:3d:00
    Slave queue ID: 0
    
  8. Conduct a ping test between the bond interface of the host system and the guest system, after bringing down the primary interface.

    From the host system to the guest system:

    # ping -I bond0 100.1.1.21 -c 5
    PING 100.1.1.21 (100.1.1.21) from 100.1.1.20 bond0: 56(84) bytes of data.
    64 bytes from 100.1.1.21: icmp_seq=1 ttl=64 time=0.091 ms
    64 bytes from 100.1.1.21: icmp_seq=2 ttl=64 time=0.074 ms
    64 bytes from 100.1.1.21: icmp_seq=3 ttl=64 time=0.063 ms
    64 bytes from 100.1.1.21: icmp_seq=4 ttl=64 time=0.061 ms
    64 bytes from 100.1.1.21: icmp_seq=5 ttl=64 time=0.066 ms
    --- 100.1.1.21 ping statistics ---
    5 packets transmitted, 5 received, 0% packet loss, time 4179ms
    rtt min/avg/max/mdev = 0.061/0.071/0.091/0.010 ms
    

     After the primary interface in the guest was brought down, no impact was observed because the primary slave switching succeeded. You can also watch the failover live, as sketched below.
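
     A simple sketch: keep a continuous ping running on the host while toggling the guest's primary slave; with miimon=100 and updelay/downdelay=100 in this setup, at most a brief blip is expected.

      # On the host: continuous ping to the guest's bond interface
      ping 100.1.1.21
      # In the guest, in another terminal: toggle the primary slave
      ifconfig ib0 down; sleep 10; ifconfig ib0 up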

Create virtual functions from different physical functions and set up bonding in the second guest

This scenario is useful in cases where a particular physical function goes down on the host. There is no impact on the bond inside the running guest because the primary slave switches to a virtual function created from a different physical function.

  1. Create one virtual function on each of the remaining physical functions on the host system.

    # echo 1 > /sys/class/infiniband/mlx5_1/device/sriov_numvfs
    # echo 1 > /sys/class/infiniband/mlx5_2/device/sriov_numvfs
    # echo 1 > /sys/class/infiniband/mlx5_3/device/sriov_numvfs
    

    The following virtual functions are newly created:

    0000:01:01.2 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
    0030:01:00.2 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
    0030:01:01.2 Infiniband controller [0207]: Mellanox Technologies MT28800 Family [ConnectX-5 Ex Virtual Function] [15b3:101a]
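
     To confirm which physical function owns each new virtual function, you can follow the physfn symlink that sysfs creates for every VF:

      # Each VF's physfn link points back at its parent physical function
      readlink /sys/bus/pci/devices/0000:01:01.2/physfn
      readlink /sys/bus/pci/devices/0030:01:00.2/physfn
      readlink /sys/bus/pci/devices/0030:01:01.2/physfn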
    
  2. Pass through the above virtual functions of different physical functions to the second guest system and create the bond.

    1. Edit the guest XML file and add the following entries:

      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0000' bus='0x01' slot='0x01' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0'/>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0030' bus='0x01' slot='0x00' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
      </hostdev>
      <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
      <address domain='0x0030' bus='0x01' slot='0x01' function='0x2'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x01' function='0x0'/>
      </hostdev>
      
    2. Start the guest system by running the following command:

      # virsh start rhel7.5alt_vm1
      Domain rhel7.5alt_vm1 started
      
    3. After the guest system is active, create the bond inside the guest system by updating the configuration files.

      # /etc/sysconfig/network-scripts/ifcfg-bond2
      DEVICE=bond2
      IPADDR=100.1.1.22
      NETMASK=255.255.255.0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=yes
      NM_CONTROLLED=yes
      BONDING_OPTS="mode=active-backup primary=ib0 miimon=100 updelay=100 downdelay=100"
      MTU=2044
      
      # cat /etc/sysconfig/network-scripts/ifcfg-ib0
      TYPE=InfiniBand
      DEVICE=ib0
      NAME=ib0
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond2
      SLAVE=yes
      PRIMARY=yes
      
      # cat /etc/sysconfig/network-scripts/ifcfg-ib1
      TYPE=InfiniBand
      DEVICE=ib1
      NAME=ib1
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond2
      SLAVE=yes
      PRIMARY=no
      
      # cat /etc/sysconfig/network-scripts/ifcfg-ib2
      TYPE=InfiniBand
      DEVICE=ib2
      NAME=ib2
      ONBOOT=yes
      NM_CONTROLLED=yes
      BOOTPROTO=none
      MASTER=bond2
      SLAVE=yes
      PRIMARY=no
      
    4. Create the bond.conf file on the guest system and configure the file as follows:

      # cat /etc/modprobe.d/bond.conf
      alias bond2 bonding
       options bond2 miimon=100 mode=1
      
    5. Restart the network.

      # service network restart
      Restarting network (via systemctl):  [  OK  ]
      
    6. Check if the bond is created on the guest system by running the following commands:

      # ip link show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000 link/ether 52:54:00:31:dd:da brd ff:ff:ff:ff:ff:ff
      3: bond2: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 2044 qdisc noqueue state UP mode DEFAULT group default qlen 1000     link/infiniband 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      4: ib0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond2 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      5: ib1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond2 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:0c:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:01:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      6: ib2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 2044 qdisc mq master bond2 state UP mode DEFAULT group default qlen 256 link/infiniband 00:00:02:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:39:00 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
      
      # cat /proc/net/bonding/bond2
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: ib0 (primary_reselect always)
      Currently Active Slave: ib0
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: ib0
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00
      Slave queue ID: 0
      
      Slave Interface: ib1
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:0c:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:23:00
      Slave queue ID: 0
      
      Slave Interface: ib2
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:05:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:63:00
      Slave queue ID: 0
      
    7. Conduct a ping test between the bond interfaces of the second guest system and the host system.

      From the guest system to the host system:

      # ping -I bond2 100.1.1.20 -c 5
      PING 100.1.1.20 (100.1.1.20) from 100.1.1.22 bond2: 56(84) bytes of data.
      64 bytes from 100.1.1.20: icmp_seq=1 ttl=64 time=0.588 ms
      64 bytes from 100.1.1.20: icmp_seq=2 ttl=64 time=0.061 ms
      64 bytes from 100.1.1.20: icmp_seq=3 ttl=64 time=0.058 ms
      64 bytes from 100.1.1.20: icmp_seq=4 ttl=64 time=0.049 ms
      64 bytes from 100.1.1.20: icmp_seq=5 ttl=64 time=0.051 ms
      --- 100.1.1.20 ping statistics  ---
      5 packets transmitted, 5 received, 0% packet loss, time 4143ms
      rtt min/avg/max/mdev = 0.049/0.161/0.588/0.213 ms
      

      From the host system to the guest system:

      # ping -I bond0 100.1.1.22 -c 5
      PING 100.1.1.22 (100.1.1.22) from 100.1.1.20 bond0: 56(84) bytes of data.
      64 bytes from 100.1.1.22: icmp_seq=1 ttl=64 time=0.130 ms
      64 bytes from 100.1.1.22: icmp_seq=2 ttl=64 time=0.066 ms
      64 bytes from 100.1.1.22: icmp_seq=3 ttl=64 time=0.056 ms
      64 bytes from 100.1.1.22: icmp_seq=4 ttl=64 time=0.056 ms
      64 bytes from 100.1.1.22: icmp_seq=5 ttl=64 time=0.053 ms
      
      --- 100.1.1.22 ping statistics ---
      5 packets transmitted, 5 received, 0% packet loss, time 4171ms
      rtt min/avg/max/mdev = 0.053/0.072/0.130/0.029 ms
      
    8. Run the following commands to validate if the active-backup mode works in the second guest system:

      # ifconfig ib0 down
      bond2: making interface ib1 the new active one
      
      # cat /proc/net/bonding/bond2
      Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
      
      Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
      Primary Slave: ib0 (primary_reselect always)
      Currently Active Slave: ib1
      MII Status: up
      MII Polling Interval (ms): 100
      Up Delay (ms): 100
      Down Delay (ms): 100
      
      Slave Interface: ib0
      MII Status: down
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 1
      Permanent HW addr: 00:00:04:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:10:00
      Slave queue ID: 0
      
      Slave Interface: ib1
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:0c:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:23:00
      Slave queue ID: 0
      
      Slave Interface: ib2
      MII Status: up
      Speed: 100000 Mbps
      Duplex: full
      Link Failure Count: 0
      Permanent HW addr: 00:00:05:a6:fe:80:00:00:00:00:00:00:11:22:33:44:55:66:63:00
      Slave queue ID: 0
      
    9. Conduct a ping test between the bond interface of the first guest and the second guest system, after bringing down the primary interface.

      From the second guest to the first guest:

      # ping -I bond2 100.1.1.21 -c 5
      PING 100.1.1.21 (100.1.1.21) from 100.1.1.22 bond2: 56(84) bytes of data.
      64 bytes from 100.1.1.21: icmp_seq=1 ttl=64 time=0.843 ms
      64 bytes from 100.1.1.21: icmp_seq=2 ttl=64 time=0.082 ms
      64 bytes from 100.1.1.21: icmp_seq=3 ttl=64 time=0.062 ms
      64 bytes from 100.1.1.21: icmp_seq=4 ttl=64 time=0.066 ms
      64 bytes from 100.1.1.21: icmp_seq=5 ttl=64 time=0.060 ms
      
      --- 100.1.1.21 ping statistics ---
      5 packets transmitted, 5 received, 0% packet loss, time 4138ms
      rtt min/avg/max/mdev = 0.060/0.222/0.843/0.310 ms
      

      From the first guest to the second guest:

      # ping -I bond1 100.1.1.22 -c 5
      PING 100.1.1.22 (100.1.1.22) from 100.1.1.21 bond1: 56(84) bytes of data.
      64 bytes from 100.1.1.22: icmp_seq=1 ttl=64 time=0.100 ms
      64 bytes from 100.1.1.22: icmp_seq=2 ttl=64 time=0.056 ms
      64 bytes from 100.1.1.22: icmp_seq=3 ttl=64 time=0.059 ms
      64 bytes from 100.1.1.22: icmp_seq=4 ttl=64 time=0.057 ms
      64 bytes from 100.1.1.22: icmp_seq=5 ttl=64 time=0.055 ms
      
      --- 100.1.1.22 ping statistics ---
      5 packets transmitted, 5 received, 0% packet loss, time 4185ms
      rtt min/avg/max/mdev = 0.055/0.065/0.100/0.018 ms
      

Notes:

  • The active-backup bonding mode is the only bonding mode supported for SR-IOV InfiniBand.
  • The test system that is used for reference covers only the RHEL operating system. The steps and output might vary for other distributions.
  • Mellanox OFED must be installed on the host and guest systems, as specified in the prerequisites.

Summary

The steps described in this tutorial can help you re-create various combinations of bonded interfaces on the host and guest systems. Get started on a host with an SR-IOV-capable adapter, identify the physical functions, and create the virtual functions on the host system. You can then create the bond on both the host and guest systems, leading to a fault-tolerant environment that is essential for high availability.