The Emerging Objects Versus Files Debate
A few weeks ago I spent an hour with an analyst at TechTarget who is new to the storage space, and one of the topics we covered was object-based versus file-based storage systems. Wikipedia has a workable definition of object-based storage:
An Object-based Storage Device (OSD) is a computer storage device, similar to disk storage but working at a higher level. Instead of providing a block-oriented interface that reads and writes fixed sized blocks of data, an OSD organizes data into flexible-sized data containers, called objects.
But wait a minute. If we replace “object” with “file” in that definition, don’t we end up with a workable definition of file-based storage? What’s going on here?
A few things have been going on in the storage industry for more than a decade. First, new storage vendors continue to come along and look to differentiate their offerings with new terminology and positioning. You hear statements to the effect that objects are “better” because they’re encapsulated and self-contained, and that an object represents not only the data but the metadata too, giving it much more potential power. But files also have metadata: anything from the file name to permissions, access control lists, and extended attributes. With metadata on files you can implement all the things that are claimed to make objects unique.
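To make the point about file metadata concrete, here is a minimal Python sketch that gathers what the file system already records for an ordinary file. The function name `file_metadata` and the dictionary keys are my own illustrative choices, and extended attributes are treated as optional because `os.listxattr` is only available on Linux and some file systems do not support it.

```python
import os
import stat
import tempfile


def file_metadata(path):
    """Collect metadata the file system already keeps for a file (illustrative helper)."""
    info = os.stat(path)
    meta = {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        "permissions": stat.filemode(info.st_mode),  # e.g. a string like '-rw-r--r--'
        "modified": info.st_mtime,
        "extended": {},
    }
    # Extended attributes are platform dependent: os.listxattr exists only on
    # Linux, and some file systems reject the call, so they are optional here.
    if hasattr(os, "listxattr"):
        try:
            for attr in os.listxattr(path):
                meta["extended"][attr] = os.getxattr(path, attr)
        except OSError:
            pass
    return meta


def demo():
    """Write a small file into a temporary directory and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        return file_metadata(path)
```

Nothing here is specific to any vendor. It only shows that a plain file already carries a name, a size, permissions, timestamps and, where supported, arbitrary key/value attributes.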
The next challenge the storage industry has been facing is that file-based interfaces put a lot of stress on many different systems. File systems are inherently organized into directories, and clients and users then manipulate those directories and files. This all works well when there are a limited number of files and directories. But what happens when you have petabytes of data or millions of files in a single directory? The clients crash or slow to a crawl. The truth is that applications, and the way they use file systems, have not evolved as quickly as the data has grown. Where it used to be acceptable for an application to put all its working files in a single directory, the same application today may be causing a lot of stress. The other parts of the file interface that have caused stress are the network attached storage (NAS) protocols like NFS and SMB. While improvements have been made in both, they continue to lag behind the explosive growth in data volumes and file sizes.
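One common way applications relieve the single-directory problem is to spread files across subdirectories derived from a hash of the file name. The sketch below is an assumption-laden illustration, not something any particular product does: `sharded_path`, `levels` and `width` are hypothetical names, and SHA-256 is simply a convenient, evenly distributed hash.

```python
import hashlib
import os


def sharded_path(root, filename, levels=2, width=2):
    """Map a file name to a nested directory path based on its hash.

    Each level uses `width` hex characters of the SHA-256 digest, so the
    defaults spread files across 256 * 256 = 65,536 leaf directories.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)
```

With the defaults, a million files would average roughly 15 per leaf directory instead of a million in one. The mapping is deterministic, so the application can always recompute where a given file lives.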
These stresses have led vendors to use different interfaces for object-based storage -- anything from the insanely complex proprietary APIs of the EMC Centera to the simple REST-based APIs of many of today’s public cloud storage providers. With these APIs can come relief -- object naming can be performed for you (Atmos), or it can be irrelevant (Amazon S3), the protocol can be more efficient (HTTPS vs SMB), and some finicky requirements like multiple-client consistency can be ignored. The problem is that the users and clients that have been using the standard protocols for years can no longer access the storage, so this approach only has a shot if customers build custom integrations or the vendor (EMC) runs a stellar ISV program. Don’t get us wrong, we don’t think Amazon S3 should have an SMB interface, but we are saying that putting an HTTPS interface on a system instead of an SMB interface doesn’t by itself make this a matter of objects versus files.
The industry has already proven that self-described object-based storage systems like EMC’s Centera and the Archivas/Hitachi HCA are capable of having file-based interfaces like NFS and SMB in front of them. It has also proven that file-based storage systems like those from NetApp and Overland are capable of having HTTPS interfaces in front of them. Traditional file-based storage vendors like NetApp have also proven to be adept at adding higher-level concepts (like retention for compliance purposes) on top of their existing storage systems. Their greatest challenge here is not whether files can hold the metadata needed to implement those concepts, but the interfaces used to access those files.
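As a rough illustration of how thin the line between the two can be, the following Python sketch puts an object-style put/get interface in front of ordinary files. Everything in it is hypothetical: the class name `FileBackedObjectStore`, the JSON sidecar file for metadata, and the `retention_days` key in the usage example are my own assumptions and do not describe how any vendor's product works.

```python
import json
import os
import tempfile
import uuid


class FileBackedObjectStore:
    """Toy object-style interface layered on plain files (illustrative only)."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _data_path(self, object_id):
        return os.path.join(self.root, object_id + ".data")

    def _meta_path(self, object_id):
        return os.path.join(self.root, object_id + ".meta.json")

    def put(self, data, metadata=None):
        """Store bytes plus metadata and return a store-generated object ID."""
        object_id = uuid.uuid4().hex  # the store names the object, not the caller
        with open(self._data_path(object_id), "wb") as handle:
            handle.write(data)
        with open(self._meta_path(object_id), "w", encoding="utf-8") as handle:
            json.dump(metadata or {}, handle)
        return object_id

    def get(self, object_id):
        """Return (data, metadata) for a previously stored object."""
        with open(self._data_path(object_id), "rb") as handle:
            data = handle.read()
        with open(self._meta_path(object_id), "r", encoding="utf-8") as handle:
            metadata = json.load(handle)
        return data, metadata


def demo():
    """Round-trip one object through a store rooted in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        store = FileBackedObjectStore(tmp)
        object_id = store.put(b"quarterly report", {"retention_days": 30})
        return object_id, store.get(object_id)
```

The caller never sees a directory or chooses a file name, yet underneath it is all files and file metadata. A real system would add an HTTPS front end, access control and durability, none of which is attempted here.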
So my advice to the analyst was not to get caught up in the whole files versus objects debate, as it’s more marketing and positioning than anything substantial. What the industry should be focusing on are the capabilities of the storage systems and how they fit customers’ intended use cases.
