The Mendacious Magic of Cloud Tiering
In the words of the great Arthur C. Clarke, “Any sufficiently advanced technology is indistinguishable from magic.” This is a brilliant maxim, but it doesn’t mean that a solution that appears to be magical must be backed by sufficiently advanced technology. Often the opposite is true: companies develop simple, appealing, even magical taglines to mask the underlying flaws and complexity of their technology.
July 26, 2022
Cloud tiering, if not the worst offender, is certainly high on the list. All the traditional storage vendors offer some variation of cloud tiering. Do not mistake them for cloud innovators, though. They are trying to remain relevant by bolting cloud features onto old-fashioned hardware. The approach is actually rather old and can be traced back several decades to the equally unworkable promise of tiering in Information Lifecycle Management (ILM).
The truth about cloud tiering is that it’s merely a clever marketing term, one that hides a fundamentally broken approach to managing capacity. What organizations are now beginning to understand is that a cloud solution that relies on tiering instead of syncing data will actually become less efficient at scale.
Syncing, Tiering & Cloud
Tiering is not a new concept in the storage world. This sort of planned, organized movement of data from the front end to the back end has been going on for decades. Today, that back-end target is the cloud. But the cloud itself has little in common with old-fashioned storage media. The technology we use to leverage unlimited, on-demand, low-cost capacity should be designed and optimized for this revolutionary new storage medium. Cloud demands a new approach, not a modified variation of one developed decades ago. Data should be synced to the cloud, not tiered.
The difference between tiering and syncing data to the cloud is the difference between a finite design and an infinite one, between something you have to babysit and a solution that just does the right thing for you. It’s the difference between an approach that is struggling to stay relevant in the era of cloud and one that was architected for the unlimited object store.
The goal of both approaches is the same. Tiering and syncing are both trying to overcome the capacity limitations of local hardware by leveraging cloud storage. The volumes on standard NAS devices can only grow to a certain size. Tiering to the cloud allows you to thin out that volume and utilize the unlimited, cost-effective capacity of the cloud. But it’s not an easy operation, nor is it an incremental or granular one.
The Operational Hazards of Cloud Tiering
There is too much overhead involved with building a tier and making sure everything actually gets to the back end, so tiering happens in large, bulky sets, such as a directory structure or a large volume of files. This is a major weakness from a data protection standpoint: if the front-end device melts before you tier, any data that has not yet been moved is probably gone.
What happens as you start to accumulate large datasets in the cloud? NetApp will tell you that you can stitch together all these different back-end volumes behind one volume. Cloud tiering and namespace aggregation sound beautiful in a marketing solution brief, but they are a huge burden on whoever is responsible for managing the system. You end up fabricating volumes, filling them up, then deciding whether to tier or create a new volume and aggregate on the back end.
Retrieving from the tier is very disruptive for applications, too. Because some applications are sensitive to that disruption, you might need to maintain different volumes for snapshots, servers, VMs and more. Ultimately, complexity increases with capacity, and when complexity scales with growth, that is the definition of bad technology.
The Simplicity of Syncing to the Cloud
Syncing to the cloud reverses this approach. Everything is always streamed to and maintained in the back — the cloud. The source of truth is the cloud and the system keeps the front hot through automated caching. Think of tiering as moving data from the front to the back with very large trucks. No delivery truck departs until it is full. A sync-based file storage system, on the other hand, is more like a high-speed conveyor belt. Data is continuously flowing to the cloud backend — to one unlimited cloud volume, not an ever-growing number of loosely aggregated datasets.
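To make the truck-versus-conveyor-belt contrast concrete, here is a minimal Python sketch. It is a toy model, not any vendor’s implementation: the class names, the `batch_size` parameter and the file counts are all hypothetical, and the sync device is simplified so that every write is committed to the cloud before it returns. The sketch only shows how much data sits unprotected on the front end when the device fails.

```python
from dataclasses import dataclass, field


@dataclass
class TieringDevice:
    """Toy front-end device that ships data to the cloud only in full batches."""
    batch_size: int  # hypothetical: how many files fill one "truck"
    local_only: list = field(default_factory=list)  # not yet in the cloud
    cloud: list = field(default_factory=list)

    def write(self, name: str) -> None:
        self.local_only.append(name)
        # The truck departs only when it is full.
        if len(self.local_only) >= self.batch_size:
            self.cloud.extend(self.local_only)
            self.local_only.clear()

    def lost_on_failure(self) -> list:
        # Anything that never left the device is gone if the device dies.
        return list(self.local_only)


@dataclass
class SyncDevice:
    """Toy front-end device that streams every write to the cloud."""
    cloud: list = field(default_factory=list)

    def write(self, name: str) -> None:
        # Simplification: the write is committed to the cloud immediately.
        self.cloud.append(name)

    def lost_on_failure(self) -> list:
        # The cloud is the source of truth, so nothing exists only locally.
        return []


if __name__ == "__main__":
    tiering = TieringDevice(batch_size=100)
    syncing = SyncDevice()
    for i in range(250):
        tiering.write(f"file_{i}")
        syncing.write(f"file_{i}")

    print("tiering, files at risk:", len(tiering.lost_on_failure()))  # 50
    print("syncing, files at risk:", len(syncing.lost_on_failure()))  # 0
```

In this toy run, two full batches reach the cloud and the remaining 50 files are waiting for a truck that has not left yet. A real sync system still has a short in-flight window, but the exposure is bounded by the stream rather than by the size of the next batch.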
In the sync model, if data is needed at the edge, it flows back in real time, immediately after the end user clicks on that file. This is a continuous, granular process that applies to everything from a large CAD model to a sub-file-level asset, and it’s all happening automatically. You don’t have to think about what you’re going to move to the tier.
Evicting data from the edge device is simple and very fast because the data has already been committed to the source of truth. So, as an operator, you are allowed to run an infinite volume with the expectation that you’ll get uniform performance everywhere. You don’t have to decide what’s on the edge and what’s on the back end. The system does it for you. From an operational standpoint alone, this is a massive improvement.
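The caching behavior described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any product: `EdgeCache` is a hypothetical name, a plain dictionary stands in for the cloud object store, and least-recently-used eviction is one assumed policy among many. The point is the ordering: a write is committed to the cloud first, so dropping data from the edge never requires moving anything.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy edge device: the cloud is the source of truth, local is a cache."""

    def __init__(self, cloud: dict, capacity: int):
        self.cloud = cloud            # stand-in for an object store
        self.capacity = capacity      # hypothetical: max files kept locally
        self.local = OrderedDict()    # ordered from coldest to hottest

    def write(self, name: str, data: bytes) -> None:
        self.cloud[name] = data       # commit to the source of truth first
        self._keep_hot(name, data)

    def read(self, name: str) -> bytes:
        if name in self.local:        # cache hit: serve from the edge
            self.local.move_to_end(name)
            return self.local[name]
        data = self.cloud[name]       # cache miss: pull back on demand
        self._keep_hot(name, data)
        return data

    def _keep_hot(self, name: str, data: bytes) -> None:
        self.local[name] = data
        self.local.move_to_end(name)
        while len(self.local) > self.capacity:
            # Eviction is just a local delete; the cloud already has the data.
            self.local.popitem(last=False)


if __name__ == "__main__":
    cloud = {}
    edge = EdgeCache(cloud, capacity=2)
    for name in ("a", "b", "c"):
        edge.write(name, name.upper().encode())

    print(list(edge.local))   # ['b', 'c']  ('a' was evicted)
    print(edge.read("a"))     # b'A'        (fetched back from the cloud)
    print(list(edge.local))   # ['c', 'a']  ('b' was evicted to make room)
    print(len(cloud))         # 3           (nothing was lost)
```

Nothing in this sketch asks the operator to choose what lives on the edge. The capacity limit and the access pattern decide, and the full dataset stays in the cloud throughout.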
The traditional storage vendors might call their tiering technology cloud, but it’s not. NetApp has done a great job spinning up its controllers in the cloud, but its systems are not architected for unlimited scale. They were designed and optimized to run on local machines. A system that is truly designed for the cloud has to be able to scale to infinity without increasing complexity or forcing you to build new volumes or orchestrate and babysit regular migrations of data from one tier to another. A true cloud system should do all of this for you, automatically, so you don’t have to worry about capacity or data protection at all.
The problems of files are problems of scale. If scale causes an increase in complexity, that complexity will eventually take the system down. Cloud-native file systems designed to sync data are engineered to scale without complexity and operate as efficiently with a single site as they do across a global enterprise.
The effect on companies might seem magical, but I can assure you that it’s merely advanced technology.