The Swift High Latency Media project seeks to create a high-latency storage back end that makes it easier for users to perform bulk operations of data tiering within a Swift data ring.
Data is produced at significantly higher rates today than a decade ago, and the storage and data management solutions of the past can no longer keep up with current demands. The policies and structures that decide how data is used, discarded, or retained determine how efficiently that data is exploited. The need for intelligent data management and storage is more critical now than ever before.
Traditional management approaches hide cost-effective, high-latency media (HLM) storage, such as tape or optical disc archive back ends, underneath a traditional file system. The lack of HLM-aware file system interfaces and software makes it hard for users to understand and control data access on HLM storage. This, coupled with data-access latency, makes for a poor user experience.
The Swift HLM project addresses this challenge. Running OpenStack Swift on top of HLM storage allows you to cheaply store and efficiently access large amounts of infrequently used object data. Data stored on tape storage can be easily adapted to an object storage data interface.
SwiftHLM can be added to OpenStack Swift (without modifying Swift itself) to extend Swift’s interface. This allows users to explicitly control and query the state (on disk or on HLM) of Swift object data, including efficient bulk pre-fetch of objects from HLM to disk when those objects need to be accessed. This capability, previously missing in Swift, is similar to what Amazon Glacier provides through the Glacier API or the Amazon S3 Lifecycle Management API.
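The control-and-query model described above can be pictured with a small in-memory sketch. It is only an illustration under stated assumptions: the class `TieredContainer`, its `migrate`, `recall`, and `status` methods, and the two-value `Tier` enum are hypothetical names chosen here, not the SwiftHLM interface, and no real storage is involved.

```python
from enum import Enum


class Tier(Enum):
    """Where an object's data currently resides (hypothetical two-state model)."""
    DISK = "disk"
    HLM = "hlm"


class TieredContainer:
    """Toy in-memory model of a container whose objects live on disk or on HLM."""

    def __init__(self, object_names):
        # Assumption: every object starts out on the low-latency disk tier.
        self._tier = {name: Tier.DISK for name in object_names}

    def migrate(self, names):
        """Bulk request: move the named objects from disk to HLM."""
        for name in names:
            self._tier[name] = Tier.HLM

    def recall(self, names):
        """Bulk request: pre-fetch the named objects from HLM back to disk."""
        for name in names:
            self._tier[name] = Tier.DISK

    def status(self, names=None):
        """Report the tier of the named objects (all objects if names is None)."""
        names = self._tier if names is None else names
        return {name: self._tier[name].value for name in names}


container = TieredContainer(["a.dat", "b.dat", "c.dat"])
container.migrate(["a.dat", "b.dat"])   # explicitly push two objects to HLM
container.recall(["a.dat"])             # pre-fetch one of them before access
print(container.status())
# {'a.dat': 'disk', 'b.dat': 'hlm', 'c.dat': 'disk'}
```

The point of the sketch is that the tier of each object is visible and under the user's control, rather than hidden beneath a file system.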
BDT Tape Library Connector (open source) and IBM Spectrum Archive are examples of HLM back ends that provide important and complex functions to manage HLM resources (tape mounts/unmounts to drives, serialization of requests for tape media and tape drive resources). They can use SwiftHLM functions for a proper integration with Swift.
Access to data stored on HLM could be done transparently, without using SwiftHLM, but that does not work well in practice for many important use cases. SwiftHLM functions are orthogonal and complementary to Swift ring-to-ring tiering.
What should I contribute?
By using SwiftHLM, reporting issues, and contributing code changes, you can help make it key infrastructure for this important area of data management.
An important objective is to create a developer community that provides better tools for accessing and using high-latency data storage devices with OpenStack Swift. We can achieve more together than as individuals.
What technology problem will I help solve?
By contributing to the project, you will help create an efficient high-latency storage back end that provides a better experience for users who are performing bulk operations of data tiering – assigning different categories of data to different types of storage with the aim of reducing cost – within a Swift data ring. The SwiftHLM functions are orthogonal and complementary to ring-to-ring data tiering as described in the Swift data-tiering specification. See the SwiftHLM design discussion.
The problem with high-latency media is that it does not work well when serving many independent requests, which is exactly the workload that the Swift proxy layer sends to Swift storage nodes.
Similar problems have been addressed in file systems, and the typical solution is to:
Use a low-capacity, low-latency storage tier (e.g. disk) on top of a large-capacity, high-latency (but typically cheap) storage tier (e.g. tape or optical disc).
Provide a function for explicit bulk operations that move data between the tiers. Bulk operations allow the order of requests and the use of resources (e.g. tape mounting) to be optimized, which is crucial for making such a system usable. It is worth mentioning that Amazon Glacier also provides a bulk operation interface for pre-fetching objects before they can be accessed.
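To illustrate why bulk operations matter, here is a minimal sketch of request reordering. It assumes each recall request carries a tape identifier and a block offset, and that a single drive serves all tapes. The function names `plan_bulk_recall` and `count_mounts` and the sample data are hypothetical; this is not how SwiftHLM or any particular back end schedules its work.

```python
from collections import defaultdict


def plan_bulk_recall(requests):
    """Group recall requests by tape and order each group by position on tape.

    `requests` is an iterable of (object_name, tape_id, block_offset) tuples.
    Returns a list of (tape_id, [object_name, ...]) pairs, one per tape mount.
    """
    by_tape = defaultdict(list)
    for name, tape_id, offset in requests:
        by_tape[tape_id].append((offset, name))
    plan = []
    for tape_id in sorted(by_tape):
        # Reading in offset order avoids seeking back and forth on the tape.
        ordered = [name for _, name in sorted(by_tape[tape_id])]
        plan.append((tape_id, ordered))
    return plan


def count_mounts(tape_sequence):
    """Count mounts for a single drive serving tapes in the given order."""
    mounts = 0
    current = None
    for tape_id in tape_sequence:
        if tape_id != current:
            mounts += 1
            current = tape_id
    return mounts


# Hypothetical requests arriving in an arbitrary, interleaved order.
requests = [
    ("obj1", "T2", 500),
    ("obj2", "T1", 900),
    ("obj3", "T2", 100),
    ("obj4", "T1", 200),
    ("obj5", "T2", 300),
]

# Serving each request independently, in arrival order, remounts repeatedly.
naive_mounts = count_mounts(tape for _, tape, _ in requests)

# Serving them as one bulk operation mounts each tape once.
plan = plan_bulk_recall(requests)
planned_mounts = count_mounts(tape for tape, _ in plan)

print(naive_mounts, planned_mounts)  # 5 2
print(plan)
# [('T1', ['obj4', 'obj2']), ('T2', ['obj3', 'obj5', 'obj1'])]
```

With only five requests the bulk plan already cuts five mounts to two. Because a tape mount is far slower than a disk seek, the gap widens as the number of independent requests grows.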
How can HLM help my business?
The technical problem of streamlining data tiering to higher latency storage devices translates into direct savings in resources (time and money) and provides a better user experience.
These types of solutions are becoming increasingly important as the volume of data continues to grow rapidly. The ability to perform analytics on live and historical data is a sought-after feature that SwiftHLM delivers efficiently.