This document describes the compatibility and behavior of storage hardware with KVM-QEMU on IBM Power Systems LC921 and LC922 POWER9 servers. It shares information gathered during testing and development of these systems.

This document provides information about QEMU pass-through modes, including the available cache modes. It also lists the devices that were tested with Small Computer System Interface (SCSI) pass-through, and it describes a workaround for a known issue that some devices exhibit in SCSI pass-through mode.

Types of SCSI pass-through supported by KVM-QEMU

QEMU supports two main pass-through modes for SCSI storage. The traditional mode, described here as emulated pass-through, uses the disk device type: QEMU emulates the entire SCSI protocol internally, including the device capabilities, and uses the real storage device only as a block device to read and write data. This mode has been superseded by SCSI pass-through, which uses the logical unit number (LUN) device type. In this mode, QEMU relays SCSI commands and responses between the guest and the physical device, so the guest can use any advanced features that the real device implements.
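In libvirt domain XML, the two modes correspond to the device attribute of a <disk> element: device='disk' for the emulated mode and device='lun' for SCSI pass-through. The following minimal sketch builds such an element with Python's standard library. The helper name libvirt_disk_xml and the example paths (/dev/sdb on the host, sda in the guest) are hypothetical, and cache='none' is included only because it is the mode discussed in the next section.

```python
import xml.etree.ElementTree as ET


def libvirt_disk_xml(host_dev: str, target_dev: str, passthrough: bool) -> str:
    """Build a libvirt <disk> element for a host block device (hypothetical helper).

    device='lun'  -> SCSI pass-through: the guest talks SCSI to the real device.
    device='disk' -> emulated mode: QEMU implements the SCSI protocol itself.
    """
    disk = ET.Element("disk", type="block", device="lun" if passthrough else "disk")
    # Raw access to the host device with the host page cache bypassed.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache="none")
    ET.SubElement(disk, "source", dev=host_dev)
    # bus='scsi' attaches the disk to a SCSI controller in the guest.
    ET.SubElement(disk, "target", dev=target_dev, bus="scsi")
    return ET.tostring(disk, encoding="unicode")


if __name__ == "__main__":
    print(libvirt_disk_xml("/dev/sdb", "sda", passthrough=True))
```

The generated element still has to be placed inside a domain definition that also declares a SCSI controller.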

To assign a storage device in SCSI pass-through mode, you can do one of the following:

Cache modes

QEMU offers several cache modes. With the cache=none mode, the hypervisor's page cache is bypassed on the way to the storage, and all I/O caching is done inside the guest operating system. This mode also allows live migration of the guest to other hosts.
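QEMU's documentation describes each cache= mode as shorthand for three block-layer options: cache.writeback, cache.direct, and cache.no-flush. The sketch below encodes that table so that the effect of cache=none is explicit: cache.direct=on means the host page cache is bypassed, while cache.writeback=on means the guest sees a write cache it must flush itself. The names CACHE_MODES and bypasses_host_page_cache are illustrative, not part of any QEMU API.

```python
# Mapping of QEMU cache modes to (cache.writeback, cache.direct, cache.no-flush),
# as listed in QEMU's -drive documentation.
CACHE_MODES = {
    #               writeback  direct  no-flush
    "writeback":    (True,     False,  False),
    "none":         (True,     True,   False),
    "writethrough": (False,    False,  False),
    "directsync":   (False,    True,   False),
    "unsafe":       (True,     False,  True),
}


def bypasses_host_page_cache(mode: str) -> bool:
    """True when the mode sets cache.direct=on (O_DIRECT on the host)."""
    _writeback, direct, _no_flush = CACHE_MODES[mode]
    return direct


if __name__ == "__main__":
    for mode in CACHE_MODES:
        print(f"{mode:13} host page cache bypassed: {bypasses_host_page_cache(mode)}")
```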

List of supported devices with SCSI pass-through in KVM-QEMU and IBM POWER LC922/LC921

Adapter                    Description
AOC-QLE2692OP-IB001        QLogic 2-port 16 Gb Fibre Channel adapter
AOC-LPE16002B-M6-O         Emulex 2-port 16 Gb Fibre Channel adapter
QLE2742                    QLogic 2-port 32 Gb Fibre Channel adapter
AOC-K-9361-8IS-IB001       LSI 9361 8-port PCIe SAS-3 RAID controller (internal)
AOC-K-9361-8I2S-IB001      LSI 9361 8-port PCIe SAS-3 RAID controller with 2 GB cache (internal)
AOC-K-S3008L-L8iS-IB001    LSI 3008 8-port SAS RAID controller (internal)
AOC-SAS3-9300-8E           LSI 3008 8-port SAS controller (external)
AOC-SAS3-9305-16E          LSI 12 Gb SAS 9305-16E JBOD HBA
Planar SATA                Microsemi PM8069 SATA RAID controller

Note: The AOC-K-9361-8I2S-IB001 needs the SCSI max_sectors_kb workaround that is described in Troubleshooting SCSI sense error.

Troubleshooting SCSI sense error

Description

Some storage devices (such as the AOC-K-9361-8I2S-IB001) do not behave properly when used with SCSI pass-through. The issue is caused by a mismatch between the value of the max_sectors_kb parameter in the guest and host operating systems. When the guest operating system assumes a value that is greater than the value in the host operating system, write operations to the device fail with SCSI sense errors.

Cause

The max_sectors_kb parameter indicates the largest I/O size, in KB, that the driver sends to the block device. Sending an I/O request that is larger than what the device supports causes errors. When you use KVM-QEMU with SCSI pass-through, the maximum I/O size is limited by the max_sectors_kb value that is currently set in the hypervisor. In this example, assume that an sdb device in the host has a max_sectors_kb value of 1024 KB:

$ cat /sys/block/sdb/queue/max_sectors_kb
1024

If this device is used in a guest through KVM-QEMU SCSI pass-through, the guest operating system must set the max_sectors_kb parameter of the pass-through device to 1024 as well. A larger value would produce requests that the host cannot pass to the device.
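A minimal sketch of the comparison described above: read max_sectors_kb from sysfs on each side and check that the guest value does not exceed the host value. The function names are hypothetical, and the sysfs_root parameter exists only so that the code can be exercised against a scratch directory instead of the real /sys.

```python
from pathlib import Path


def read_max_sectors_kb(device: str, sysfs_root: str = "/sys") -> int:
    """Read /sys/block/<device>/queue/max_sectors_kb as an integer (KB)."""
    path = Path(sysfs_root) / "block" / device / "queue" / "max_sectors_kb"
    return int(path.read_text().strip())


def guest_value_is_safe(guest_kb: int, host_kb: int) -> bool:
    """The guest must never issue requests larger than the host allows."""
    return guest_kb <= host_kb
```

On the host in this example, read_max_sectors_kb("sdb") would return 1024; running the same call in the guest gives the value to compare against it.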

When running under Linux, QEMU provides the guest with the max_sectors_kb value to use. QEMU retrieves the device's current max_sectors_kb value on the host by using the BLKSECTGET ioctl.

After that, QEMU patches the retrieved value into the SCSI communication between the guest kernel and the physical device. It does this by intercepting and modifying the device's answer to a Block Limits vital product data (VPD) request. This page reports the hardware limits for data transfers. When the guest kernel queries this information for a SCSI pass-through device, QEMU changes the answer to match the max_sectors_kb value that it retrieved earlier from the host.

The following example walks through this process. To pass through a physical device sdb from the host operating system to a guest operating system, first retrieve the max_sectors_kb value of this device by running the following command:

[host]$ cat /sys/block/sdb/queue/max_sectors_kb
1024

The sg3_utils package provides commands that send SCSI messages from user space to the physical device, which interprets them as commands and reports its hardware capabilities. In this case, request the Block Limits VPD page from the host:

[host]$ sg_vpd --page=bl /dev/sdb
Block limits VPD page (SBC):
    Write same non-zero (WSNZ): 1
    Maximum compare and write length: 32 blocks
    Optimal transfer length granularity: 64 blocks
    Maximum transfer length: 2097152 blocks
    Optimal transfer length: 0 blocks
    Maximum prefetch length: 0 blocks
    Maximum unmap LBA count: 524288
    Maximum unmap block descriptor count: 1
    Optimal unmap granularity: 16
    Unmap granularity alignment valid: 1
    Unmap granularity alignment: 0
    Maximum write same length: 0x80000 blocks
    Maximum atomic transfer length: 0
    Atomic alignment: 0
    Atomic transfer length granularity: 0
    Maximum atomic transfer length with atomic boundary: 0
    Maximum atomic boundary size: 0

Of these properties, the Maximum transfer length and Optimal transfer length fields determine the maximum value that the kernel driver can assign to the max_sectors_kb parameter of the device.
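To compare host and guest answers programmatically, the two relevant fields can be pulled out of the sg_vpd text. This sketch assumes the output layout shown above ("Name: value blocks", one field per line); other sg3_utils versions may format the page differently. The function name is hypothetical.

```python
import re

# Matches lines such as "    Maximum transfer length: 2097152 blocks".
_TRANSFER_FIELD = re.compile(
    r"^\s*(Maximum transfer length|Optimal transfer length):\s*(\d+)\s+blocks\s*$"
)


def transfer_limits(sg_vpd_output: str) -> dict:
    """Return {field name: value in blocks} for the two transfer-length fields."""
    limits = {}
    for line in sg_vpd_output.splitlines():
        match = _TRANSFER_FIELD.match(line)
        if match:
            limits[match.group(1)] = int(match.group(2))
    return limits
```

Applied to the host output above, this yields 2097152 blocks for Maximum transfer length and 0 for Optimal transfer length.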

QEMU takes advantage of this kernel behavior to make the max_sectors_kb value in the guest operating system match the value in the host operating system. It retrieves the host limit by using the BLKSECTGET ioctl. In this case, the retrieved limit corresponds to a max_sectors_kb value of 1024, and QEMU stores it.

When the guest kernel boots, it requests the Block Limits VPD page from the physical device. The device's answer matches the output of the sg_vpd command shown previously. However, before the answer reaches the guest kernel, QEMU patches in the values of the Maximum transfer length and Optimal transfer length fields. As a result, the guest kernel is told that the maximum transfer size equals the max_sectors_kb value retrieved from the host.
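The interception can be modeled on a raw Block Limits page. This is an illustrative sketch of the behavior described above, not QEMU's source. The byte offsets (8 for Maximum transfer length, 12 for Optimal transfer length, both big-endian 32-bit counts of logical blocks) come from the SBC definition of VPD page 0xB0. Replacing a zero optimal length with the limit, while keeping a smaller non-zero one, is an assumption of this sketch. The function names are hypothetical.

```python
import struct

MAX_TRANSFER_OFFSET = 8   # Maximum transfer length, bytes 8-11
OPT_TRANSFER_OFFSET = 12  # Optimal transfer length, bytes 12-15


def host_limit_blocks(max_sectors_kb: int, logical_block_size: int = 512) -> int:
    """Convert the host max_sectors_kb value to logical blocks."""
    return max_sectors_kb * 1024 // logical_block_size


def patch_block_limits(page: bytearray, limit_blocks: int) -> bytearray:
    """Rewrite the transfer-length fields of a Block Limits VPD reply in place."""
    if len(page) < 16 or page[1] != 0xB0:
        raise ValueError("not a Block Limits VPD page")
    struct.pack_into(">I", page, MAX_TRANSFER_OFFSET, limit_blocks)
    (optimal,) = struct.unpack_from(">I", page, OPT_TRANSFER_OFFSET)
    # 0 means "not reported": use the limit; otherwise never exceed the limit.
    if optimal == 0 or optimal > limit_blocks:
        optimal = limit_blocks
    struct.pack_into(">I", page, OPT_TRANSFER_OFFSET, optimal)
    return page
```

With the host value of 1024 KB, host_limit_blocks gives 2048, and patching the host reply (2097152 and 0 blocks) leaves both fields at 2048, which is what the guest reports.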

The following output is displayed when you run the sg_vpd command in the guest operating system:

[guest]$ sg_vpd --page=bl /dev/sdb
Block limits VPD page (SBC):
    Write same non-zero (WSNZ): 1
    Maximum compare and write length: 32 blocks
    Optimal transfer length granularity: 64 blocks
    Maximum transfer length: 2048 blocks
    Optimal transfer length: 2048 blocks
    Maximum prefetch length: 0 blocks
    Maximum unmap LBA count: 524288
    Maximum unmap block descriptor count: 1
    Optimal unmap granularity: 16
    Unmap granularity alignment valid: 1
    Unmap granularity alignment: 0
    Maximum write same length: 0x80000 blocks
    Maximum atomic transfer length: 0
    Atomic alignment: 0
    Atomic transfer length granularity: 0
    Maximum atomic transfer length with atomic boundary: 0
    Maximum atomic boundary size: 0

The Maximum transfer length and Optimal transfer length fields now both report 2048 blocks. Each logical block on this device is 512 bytes, so 2048 blocks equal 1024 KB. Thus, the guest kernel sets the max_sectors_kb parameter of this device to 1024:

[guest]$ cat /sys/block/sdb/queue/max_sectors_kb
1024
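The conversion from the reported transfer length to the guest's max_sectors_kb value, assuming the 512-byte logical block size stated above, works out as follows:

```latex
2048\ \text{blocks} \times 512\ \frac{\text{bytes}}{\text{block}}
  = 1\,048\,576\ \text{bytes}
  = \frac{1\,048\,576}{1024}\ \text{KB}
  = 1024\ \text{KB}
```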

The issue arises when the physical device does not support the Block Limits VPD page. Support for this page is optional in the SCSI standard, so not all hardware implements it. In such cases, the output of the sg_vpd command is similar to the following:

[host]$ sg_vpd --page=bl  /dev/sdc
VPD page=0xb0
fetching VPD page failed
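A rough way to tell whether a host device is exposed to this issue is to look for the failure message shown above. This heuristic sketch assumes that sg_vpd prints the messages exactly as they appear in this document; wording may differ between sg3_utils versions, so treat a negative result as a prompt to inspect the output manually. The function names are hypothetical, and fetch_block_limits requires sg3_utils to be installed.

```python
import subprocess


def fetch_block_limits(dev_path: str) -> str:
    """Run `sg_vpd --page=bl <dev>` and return its stdout and stderr combined."""
    result = subprocess.run(
        ["sg_vpd", "--page=bl", dev_path], capture_output=True, text=True
    )
    return result.stdout + result.stderr


def block_limits_supported(sg_vpd_output: str) -> bool:
    """Heuristic: False when sg_vpd reports that page 0xB0 could not be fetched."""
    if "fetching VPD page failed" in sg_vpd_output:
        return False
    return "Block limits VPD page" in sg_vpd_output
```

For the devices in this example, block_limits_supported(fetch_block_limits("/dev/sdb")) would be expected to return True and the same check on /dev/sdc to return False.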

If you pass /dev/sdc through to a guest, the guest kernel does not issue the Block Limits VPD request during boot. Instead, it assumes a default max_sectors_kb value that might differ from the value used in the host operating system, which can cause SCSI sense errors in the guest device. For example, if the guest operating system uses a greater max_sectors_kb value than the host operating system, it issues SCSI requests that are larger than the host operating system can process.

This means that any SCSI device that does not support the Block Limits VPD page can be affected by this issue when used in pass-through mode in QEMU.

Solution

To work around the SCSI sense error, set the max_sectors_kb parameter in the guest operating system to match the value that the device has on the host operating system. You can perform one of the following actions to solve this issue:

  • Run the echo command:

    Set the value in the /sys/block//queue/max_sectors_kb file. For example, if the max_sectors_kb parameter in the host operating system is 256, set it to the same value in the guest operating system:

    $ echo 256 > /sys/block//queue/max_sectors_kb

    This setting does not persist across guest restarts. One way to reapply it at boot is to add the echo command to the /etc/rc.local file in the guest operating system. In this example, this is done for a /dev/sda SCSI device in the guest operating system:

    # cat /etc/rc.local
    (...)
    echo 256 > /sys/block/sda/queue/max_sectors_kb

    You must make /etc/rc.local executable by running the chmod +x /etc/rc.local command; the value is then echoed during boot.

    Note: You can also use udev rules to achieve this; adding the echo command to the /etc/rc.local file is only one option.

  • Set the value in libvirt:

    You can set the limit directly in the libvirt XML file, which forces every device on the SCSI controller to stay within the value you choose. The max_sectors attribute counts 512-byte sectors, so it is twice the corresponding max_sectors_kb value:

    <controller type='scsi' index='0' model='virtio-scsi'>
      <driver max_sectors='512'/>  <!-- "512" if you want to set max_sectors_kb to 256 -->
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </controller>

    Note: This change impacts all the SCSI devices that use this controller.

    You can use this approach when you want to install a new operating system on a SCSI pass-through disk that is affected by this issue. To use the echo method during a guest installation, you would have to open a terminal during the installation process and write the right value to the /sys/block//queue/max_sectors_kb file before the installer starts writing to the disk; setting the value in libvirt avoids this step. If the guest operating system is already installed, the first method is less restrictive because it does not affect other devices.
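Both workarounds come down to one number. The sketch below shows the guest-side write that the echo command performs and the conversion to libvirt's max_sectors value, which counts 512-byte sectors, consistent with the 512-for-256 pairing in the XML example. The function names are hypothetical; the sysfs_root parameter exists only so that the write can be tested outside /sys, and writing to the real sysfs file requires root.

```python
from pathlib import Path


def set_max_sectors_kb(device: str, kb: int, sysfs_root: str = "/sys") -> None:
    """Same effect as `echo <kb> > /sys/block/<device>/queue/max_sectors_kb`."""
    path = Path(sysfs_root) / "block" / device / "queue" / "max_sectors_kb"
    path.write_text(f"{kb}\n")


def libvirt_max_sectors(max_sectors_kb: int) -> int:
    """Value for <driver max_sectors='...'/>, which counts 512-byte sectors."""
    return max_sectors_kb * 1024 // 512
```

In the guest of this example, set_max_sectors_kb("sda", 256) corresponds to the rc.local line, and libvirt_max_sectors(256) gives the 512 used in the controller definition.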
