A tiered storage system lowers the total cost of ownership for large volumes of data by storing each piece of data on the most appropriate storage tier (flash, disk, or tape). Independent studies have shown that the total cost of ownership of combined disk and tape solutions is 3 to 7 times lower than that of disk-only solutions [source]. In addition, the failure rate of tape is much lower than that of disk because tape cartridges contain no continuously spinning mechanical parts.
While tape storage offers a better total cost of ownership for storing large volumes of data over long periods of time, access to data on tape takes significantly longer than access to data on disk. This can cause a negative user experience, which is amplified by standard file system protocols such as NFS and SMB that are not tape-aware.
Implementing tape awareness in standard file system protocols is not easy because these protocols are so widely deployed and accepted in existing IT solutions. However, newer data interfaces such as object storage APIs can easily be adapted to the characteristics of tape [source]. Consequently, integrating a tape-aware object storage API for data access into a tiered object storage system has the potential to significantly improve the user experience while lowering the total cost of ownership with tape. This will eventually drive the adoption of the tape tier for cloud storage.
A tiered storage system provides disk and tape storage within a global file system namespace and transparently moves data from disk to tape and vice versa. Users and applications access the file system namespace through a file system protocol such as NFS, SMB, or POSIX. The automatic movement of data is based on policies that take into account retention times, sizes, data types, and other data attributes. Access to data through the file system protocol is transparent to the user and the application, regardless of whether the data is stored on disk or on tape.
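To make the idea of policy-based movement concrete, here is a minimal sketch of a placement rule in Python. It is not taken from any particular product: the `FileInfo` fields, the `choose_tier` function, the thresholds, and the suffix list are all hypothetical and only illustrate how attributes such as age, size, and data type could feed a migration decision.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    days_since_access: int  # hypothetical attribute a policy engine might track


def choose_tier(f, min_age_days=30, min_size_bytes=1_000_000,
                keep_on_disk_suffixes=(".idx",)):
    """Return "tape" for cold, large files; everything else stays on "disk".

    All thresholds and the suffix list are illustrative assumptions.
    """
    if f.path.endswith(keep_on_disk_suffixes):
        return "disk"  # data types that should always stay quickly accessible
    if f.days_since_access >= min_age_days and f.size_bytes >= min_size_bytes:
        return "tape"
    return "disk"


print(choose_tier(FileInfo("run42/reads.bam", 5_000_000_000, 90)))
print(choose_tier(FileInfo("run42/reads.idx", 5_000_000, 90)))
```

A real policy engine would evaluate rules like this periodically over the whole namespace and hand the resulting file list to the migration component.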
A tiered storage system, especially one that combines disk and tape, provides a lower total cost of ownership for large volumes of data by moving the data to the most appropriate storage technology [source]. The cost savings of tape storage in a tiered storage system come with a downside, however. The higher latency of data access on tape, combined with file systems that are not tape-aware, causes a negative user experience, as described below. This challenge can be addressed with tape-aware data interfaces provided by tiered storage systems.
File systems are a blessing and a curse for tiered storage
Tiered storage systems provide access to files via standard file system protocols such as POSIX, NFS, and SMB. However, the file system is both a blessing and a curse. The blessing is that the user sees all files regardless of the storage tier on which they are stored. This blessing is also the curse, because the user cannot easily tell whether a file has been migrated to tape. A user who opens a migrated file has to wait a couple of minutes until the tape is loaded and positioned and the file has been recalled. Unfortunately, the user does not know that the file open operation will take this long because the file system has no way to say so. Once the wait exceeds the human acceptance factor (~20 seconds), the user may get impatient and attempt to cancel the operation or reboot the system.
It gets even worse if the user simultaneously opens several migrated files that happen to be stored on different tapes. This causes even longer wait times because tapes are mounted and dismounted in random order to serve individual files. Standard file systems cannot take advantage of tape-optimized recalls, in which files are sorted by tape ID and by their location on tape, and the recall is then carried out in that order. This technique dramatically reduces access and transfer time because each tape is loaded once and read sequentially.
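The ordering step behind a tape-optimized recall can be sketched in a few lines of Python. This is a simplified illustration, not the implementation of any specific product: the `TapeFile` fields, `plan_recalls`, and `count_mounts` are hypothetical names, and the model assumes a single tape drive and ignores seek and transfer times.

```python
from collections import defaultdict
from typing import NamedTuple


class TapeFile(NamedTuple):
    name: str
    tape_id: str   # hypothetical identifier of the cartridge holding the file
    position: int  # hypothetical start position of the file on that tape


def plan_recalls(files):
    """Group recall requests by tape and order each group by position on tape."""
    by_tape = defaultdict(list)
    for f in files:
        by_tape[f.tape_id].append(f)
    # Each tape appears once in the plan, so it only needs to be mounted once,
    # and its files are read front to back.
    return [
        (tape_id, sorted(group, key=lambda f: f.position))
        for tape_id, group in sorted(by_tape.items())
    ]


def count_mounts(files):
    """Count tape mounts when files are served in the given order on one drive."""
    mounts, loaded = 0, None
    for f in files:
        if f.tape_id != loaded:
            mounts += 1
            loaded = f.tape_id
    return mounts


requests = [
    TapeFile("a.dat", "T2", 900),
    TapeFile("b.dat", "T1", 500),
    TapeFile("c.dat", "T2", 100),
    TapeFile("d.dat", "T1", 50),
]

ordered = [f for _, group in plan_recalls(requests) for f in group]
print(count_mounts(requests), "mounts in arrival order")
print(count_mounts(ordered), "mounts in tape-optimized order")
```

In this toy example, serving the four requests in arrival order alternates between the two tapes, while the sorted plan mounts each tape only once.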
The negative user experience described above is amplified by standard file systems, which make the user think that all files in a tiered storage system are instantly available when, in reality, access to files on tape takes time. Making standard file system protocols tape-aware is not feasible because it would require changes to hundreds of heterogeneous applications, file system implementations, and operating systems that use standard file systems according to the current specifications.
Tiered object storage improves the user experience
Object storage provides object APIs for accessing data as objects. Two widely accepted object storage APIs are OpenStack Swift and Amazon S3. Unlike standard file system protocols, object storage APIs can easily be made tape-aware. The OpenStack Swift High Latency Media (SwiftHLM) middleware [source] does this by providing tape-specific API functions that address the challenges associated with tape storage.
The SwiftHLM middleware is an OpenStack Swift associated project [source] and is useful for running OpenStack Swift with high latency media (HLM), such as tape storage. SwiftHLM can be added to OpenStack Swift and allows explicit control over the location of Swift objects and containers by providing tape-specific functions for migration, recall, and status. With the "status" call, users can easily determine whether objects have been migrated to tape. With the "migrate" call, they can initiate the migration of an object or container. Users who initiate a recall of a migrated object via the "recall" call expect it to take a while because they already know that the object has been migrated. The HLM backend can also implement tape-optimized recalls when one or more users recall multiple objects at the same time.
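The difference that these three calls make for a client can be illustrated with a small in-memory model in Python. This sketch deliberately does not use the SwiftHLM API: `ToyTieredStore`, `State`, and `read_object` are hypothetical stand-ins, and the recall completes instantly here, whereas a real backend would queue the request and mount a tape.

```python
from enum import Enum


class State(Enum):
    RESIDENT = "resident"  # object data is on disk
    MIGRATED = "migrated"  # object data is only on tape


class ToyTieredStore:
    """In-memory stand-in for a tape-aware object store (not the SwiftHLM API)."""

    def __init__(self):
        self._state = {}

    def put(self, name):
        self._state[name] = State.RESIDENT

    def status(self, name):
        return self._state[name]

    def migrate(self, name):
        self._state[name] = State.MIGRATED

    def recall(self, name):
        # A real backend would queue the request and mount a tape here.
        self._state[name] = State.RESIDENT

    def get(self, name):
        if self._state[name] is State.MIGRATED:
            raise RuntimeError(f"{name} is on tape; recall it first")
        return f"<data of {name}>"


def read_object(store, name):
    """Check status first so the caller knows up front that a recall is needed."""
    needs_recall = store.status(name) is State.MIGRATED
    if needs_recall:
        store.recall(name)
    return store.get(name), needs_recall


store = ToyTieredStore()
store.put("photos/2015.tar")
store.migrate("photos/2015.tar")
data, was_recalled = read_object(store, "photos/2015.tar")
print(data, "(recalled from tape)" if was_recalled else "(served from disk)")
```

The point of the pattern is that the client learns from the status check that a wait is coming and can tell the user, instead of blocking silently in a file open call.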
You can easily integrate OpenStack Swift with SwiftHLM into a tiered object storage system composed of different storage tiers, including flash, disk, and tape [source]. The object API allows data to be managed across the storage tiers, while the HLM backend adapts to the specific characteristics and functions of tape. Files on tape behave like objects in that they are always written and read as a whole, which matches the characteristics of tiered object storage well.
Such a tiered object storage system is well suited for cloud deployments because it provides the cost advantages of a tiered storage system with a much improved user experience. It combines standardized APIs such as OpenStack Swift with automated data management across the various storage tiers. Furthermore, the object storage architecture is designed for storing billions of objects for many users and applications while providing continuous availability and extensive scalability.