Tutorial

Integrated server-based input/output caching of SAN-based data

Streamline input/output operations with flash cache on IBM AIX 7.2

By

Narayan Sethi,

Vamshi Thatikonda

Introduction

IBM AIX 7.2 enables server-side caching of storage data. Cache devices may include server-attached flash storage, such as built-in solid-state drives (SSDs), flash devices directly attached through serial-attached SCSI (SAS) controllers, flash resources in the storage area network (SAN), non-volatile memory express (NVMe) devices, or virtual persistent memory volumes on AIX 7.3.

You can enable caching dynamically while the workload is running, meaning the workload does not need to be brought to a quiescent state to start caching. The caching process is completely transparent to the workload. After a target device is cached, all read requests are routed to the caching software. If the requested block is in the flash cache, the input/output (I/O) request is served from the cache device. If the block is not in the cache, or if it is a write request, the request passes through to the original storage. This tutorial shows you how to implement and optimize server-side caching in IBM AIX 7.2, using SSD or flash storage to enhance performance.
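
Later in this tutorial, caching is enabled and stopped with the cache_mgt command. Once caching is running, a quick way to confirm which target devices are being served from the cache is the following check (assuming the cache list subcommand is available at your AIX level):

     # cache_mgt cache list

The output should show each cached target device along with the cache partition assigned to it.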

Terminology

Here are the key terms used in this tutorial:

  • Cache device: An SSD or flash storage used for caching.

  • Cache pool: A group of cache devices utilized exclusively for storage caching.

  • Cache partition: A logical cache device created from the cache pool.

  • Target device: A storage device that is being cached. One or more target devices can be cached using a single cache partition.

Implementation details

The caching software comprises two main components:

  • Cache management: cache_mgt is a command available on AIX and Virtual I/O Server (VIOS) to create a cache pool, partition it into cache partitions, and assign them to workloads or AIX logical partitions (LPARs).

  • Cache engine: The core of the caching software, which decides which blocks in storage should be cached and retrieves data from the cache instead of from the primary storage.

The cache management software includes the cache_mgt command, which can be run on an AIX partition or on the VIOS. The command can create a cache pool, partition the pool, assign a cache partition directly to a target device, or provision a partition to an AIX LPAR as a virtual SCSI (VSCSI) or virtual flash device. The cache_mgt command is also used to start and stop caching on the AIX partition.
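
Before creating a pool, it can be useful to confirm which devices the caching software recognizes as usable cache devices. A minimal check, assuming the device list subcommand is available at your AIX or VIOS level:

     # cache_mgt device list -l

The output should show the flash or SSD devices that are eligible to be placed in a cache pool.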

The caching algorithm uses a populate-on-read mechanism to aggressively fill the cache with data that has spatial locality (that is, blocks near other recently read blocks), particularly when the cache is cold. All blocks in the cache are monitored for their frequency of access, and a heat map is generated, considering both frequency and recency of access. After the cache is fully populated, new entries are added only if the new block is hotter than the coldest block in the cache. The coldest block is evicted, and the new entry is added. This aggressive population strategy ensures minimal warm-up time, making the cache effective immediately upon activation. The heat-map-based eviction policy ensures that caching remains dynamic and adapts to changing workload patterns over time.
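
Because the cache is populated aggressively, it is worth watching the hit statistics shortly after caching is started. A minimal sketch, assuming the monitor start subcommand is available at your AIX level (the monitor get command appears again in the step-by-step sections later in this tutorial):

     # cache_mgt monitor start
     # cache_mgt monitor get -h -s

The reported hit ratio should rise quickly as the populate-on-read mechanism fills the cache with recently read blocks.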

Initial setup and configuration

AIX server-side flash caching supports several configurations that differ in how the cache device is provisioned to the AIX LPAR. The primary modes are dedicated, virtual, and N_Port ID Virtualization (NPIV).

Dedicated mode

In the dedicated mode, the cache device is directly provisioned to the AIX LPAR. A cache pool is created on this device, allowing the creation of only one cache partition. This cache partition can be used to cache any number of target devices on the LPAR. Because the cache device is dedicated to this LPAR, the LPAR will not be mobile. If you need to migrate the LPAR to another server, you must manually stop caching and unconfigure the cache device before migration.
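
A minimal sketch of that shutdown sequence, using the device names from the sample configuration that follows and assuming the partition unassign subcommand is available at your AIX level (repeat the unassign step for each cached target; the exact unconfigure steps depend on how the cache devices are attached):

     # cache_mgt cache stop -t all
     # cache_mgt partition unassign -t hdisk4

After all target devices are unassigned, the cache devices can be unconfigured and the LPAR migrated; caching can then be set up again if a cache device is available on the destination server.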

Sample setup for dedicated mode configuration:

The following figure shows a sample dedicated mode configuration, where hdisk1, hdisk2, and hdisk3 serve as the cache devices; hdisk4, hdisk5, hdisk6, and hdisk7 serve as the target devices.

Figure 1. Dedicated mode configuration

Steps

  1. Create a cache pool and a partition of 10 GB on the SSD storage using the following commands:
    # cache_mgt pool create -d hdisk1,hdisk2,hdisk3 -p pool1
    # cache_mgt partition create -p pool1 -s 10G -P part1
    
  2. Assign the partition to the target disks to be cached using the following commands:
     # cache_mgt partition assign -t hdisk4 -P part1
     # cache_mgt partition assign -t hdisk5 -P part1
     # cache_mgt partition assign -t hdisk6 -P part1
     # cache_mgt partition assign -t hdisk7 -P part1
    
  3. Start caching for the target devices using the following command:
     # cache_mgt cache start -t all
    
  4. Monitor statistics on cache hits using the following command:
     # cache_mgt monitor get -h -s
    

Virtual mode

In the virtual mode, the cache device is assigned to the VIOS. The cache pool is created on the VIOS and split into several cache partitions. Each partition is assigned to a virtual host adapter. After the AIX LPAR discovers the device, the partition can be used to cache a target device. Because the cache device is virtual, the LPAR can be migrated to another server. Before migration, caching is automatically stopped on the source system. During migration, a cache partition of the same size is created dynamically and made available on the target VIOS.
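
The device discovery on the client LPAR uses standard AIX commands. A minimal check (the cachedisk0 name used in the client-side steps below is an example; the name on your system may differ):

     # cfgmgr
     # lsdev | grep -i cachedisk

The discovered device name is the value passed to the -P flag when assigning target devices on the AIX LPAR, as shown in the steps that follow.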

If the target VIOS has the caching software installed and a cache pool available, caching is automatically restarted on the destination. The cache starts in an empty, unpopulated state after migration.

Sample setup for virtual mode configuration:

The following figure shows a sample virtual mode configuration, where hdisk1, hdisk2, and hdisk3 are cache devices on the VIOS; hdisk4 and hdisk5 are target devices on AIX LPAR1; and hdisk6 and hdisk7 are target devices on AIX LPAR2.

Figure 2. Virtual mode configuration

Steps

Perform the following steps on the VIOS LPAR to set up a virtual mode configuration.

  1. Create a cache pool and a partition of 10 GB on the SSD storage using the following commands:
     # cache_mgt pool create -d hdisk1,hdisk2,hdisk3 -p pool1
     # cache_mgt partition create -p pool1 -s 10G -P part1
    
  2. Assign the partition to a virtual host adapter using the following command:
     # cache_mgt partition assign -P part1 -v vhost0
    

Perform the following steps on the AIX LPAR to set up a virtual mode configuration.

  1. Assign the cache partition to the target devices of LPAR1 using the following commands:
     # cache_mgt partition assign -t hdisk4 -P cachedisk0
     # cache_mgt partition assign -t hdisk5 -P cachedisk0
    
  2. Start caching for the target devices using the following command:
     # cache_mgt cache start -t all
    
  3. Monitor the statistics on cache hits using the following command:
     # cache_mgt monitor get -h -s
    

NPIV mode

In the NPIV mode, the cache device is available as a virtual Fibre Channel device on the AIX LPAR. A cache pool is created on the AIX LPAR, allowing for the creation of only one cache partition. This partition can cache multiple target devices on the LPAR. Because the cache device is available from the SAN, the LPAR can be migrated to another server. The cache device must be visible on the destination system; caching then continues through the migration process, and the cache remains populated after migration.

Figure 3. NPIV mode configuration
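
Because the cache device in NPIV mode is visible directly on the AIX LPAR, the setup largely mirrors the dedicated mode commands. A minimal sketch, assuming hdisk1 is the virtual Fibre Channel cache device and hdisk4 is a target device on the LPAR:

     # cache_mgt pool create -d hdisk1 -p pool1
     # cache_mgt partition create -p pool1 -s 10G -P part1
     # cache_mgt partition assign -t hdisk4 -P part1
     # cache_mgt cache start -t all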

Lifecycle management of cache devices

As needs evolve, the cache configuration may need adjustments. You can extend the cache pool with additional cache devices, create new cache partitions on existing pools, or resize existing partitions.
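
Before changing the layout, it can help to review the existing pools and partitions. A quick check, assuming the list subcommands are available at your AIX level:

     # cache_mgt pool list -l
     # cache_mgt partition list -l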

The following steps outline how to perform these adjustments.

  1. Add a new cache device to the pool (overriding existing usage on hdisk4 if necessary) using the following command:

     # cache_mgt pool extend -p pool1 -d hdisk4 -f
    

    Note: The -f parameter overrides any existing usage of the disk.

  2. Create a new 100 MB partition for a workload using the following command:

     # cache_mgt partition create -p pool1 -s 100M -P part2
    
  3. Resize an existing partition to 120 MB using the following command:
     # cache_mgt partition extend -P part1 -s 120M
    

High availability considerations

In a high availability cluster, if target devices are part of a resource group, proper failover orchestration is required. Caching must be activated on only one node at a time. Before initiating a failover, caching should be disabled on the original system. After failover to the alternate system is complete, caching can be manually enabled on the new system.
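
The steps below show the commands that must be run manually. In an actual cluster, these calls are typically wrapped in the resource group's start and stop scripts; the following is a minimal sketch (the script structure and the hdisk2 device name are illustrative, not part of any specific cluster product):

     #!/bin/ksh
     # Illustrative failover hook: stop caching before releasing the cached
     # target, and start it again after the resource group is online here.
     ACTION=$1      # "release" or "acquire"
     TARGET=hdisk2  # cached target disk in the resource group (example name)

     case "$ACTION" in
       release) cache_mgt cache stop  -t "$TARGET" ;;
       acquire) cache_mgt cache start -t "$TARGET" ;;
     esac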

Perform the following steps to manage caching during a failover in a high availability cluster.

  1. Stop caching on the original system using the following command:
     # cache_mgt cache stop -t hdisk2
    
  2. Start caching on the new system after recovery using the following command:
     # cache_mgt cache start -t hdisk2
    

Benefits of server-side flash caching

Including a server-side flash cache offers several benefits. The following charts, derived from an internal benchmark, demonstrate how a server-side cache can increase virtualization density when the storage subsystem is the bottleneck.

  • Reduced query response time: SSD storage's lower latencies can significantly reduce transactional workload query response times. In an internal benchmark, transactional latency was reduced by more than half.

Average transaction latency

  • Improved throughput: SSD storage provides better throughput, resulting in higher transaction rates for OLTP workloads.

Workload transaction throughput

  • Offloaded SAN traffic: In congested SAN environments, flash cache can offload a significant percentage of read traffic, improving SAN write throughput and allowing more clients to be served.

LPAR disk throughput

  • Memory footprint optimization: Workloads may perform adequately with a reduced memory footprint if a flash cache is used, as the performance improvement from the cache can offset the effects of reduced memory.

Limitations

The following limitations apply when using the caching software.

  • The caching software operates as a read-only cache, meaning it only serves read requests from the flash SSD. All write requests are directed to the original storage device and are not automatically populated in the cache. If a write occurs to a block already in the cache, the cache data is invalidated, and the block will only be re-cached if its frequency and recency of access justify it.
  • The caching software manages metadata for each block read, requiring additional memory on every AIX LPAR. A minimum of 4 GB of memory is necessary for any LPAR with caching enabled.
  • Data is loaded into the cache based solely on local read patterns, and cache entries are invalidated locally. This limitation means that target devices cannot be shared by more than one LPAR concurrently. Therefore, target devices cannot be part of any clustered storage, such as Oracle Real Application Clusters (RAC), IBM Db2 pureScale, or IBM General Parallel File System (GPFS). Target devices within a high-availability cluster may only be cached if access is restricted to one host at a time, and caching is enabled only on the active node.
  • The cache disk can only be provisioned to one LPAR or one VIOS, with no possibility of sharing cache devices.
  • The caching software opens target devices to intercept I/O operations. If a workload attempts to open the device exclusively after caching has started, the exclusive open may fail. In such cases, caching may need to be stopped and restarted after the workload begins.

Summary

This tutorial outlined the process of implementing and optimizing server-side caching in IBM AIX 7.2. It covered how to set up, configure, and manage SSD or flash storage caching, including dedicated, virtual, and NPIV modes. The tutorial also addressed high availability considerations and provided practical steps for enhancing storage performance.

References

For more information, refer to the Storage data caching concept topic in the IBM documentation.