By Rob Mason on March 17, 2011
A while back we took a hard look at cloud storage pricing and all the gory details. One of the questions we got back from that post was whether we would also take a real look at cloud storage gateways and tell our customers what their limitations are. That’s a great request (keep them coming!) and something we’re going to tackle in this post. Despite what you hear from the marketing efforts of many companies, products and solutions always have limitations. While the Apple iPhone is an amazing device that has changed my life in many ways, it’s not a great phone (sorry, I’m still on AT&T). So what are the limitations of cloud storage gateways? Here’s the list; then we’ll drill into the specifics:
A Cloud Storage Gateway is not truly a SaaS solution
The cloud storage gateway vendors (us included) tout cloud storage as storage-as-a-service, but the reality is that, by definition, the gateway vendors provide a gateway or device (virtual or physical) that needs to be installed at your site. These devices need to be installed and maintained, consume local compute and storage resources, and require bandwidth to send data to the cloud. There are alternatives.
You could use the cloud directly without a gateway, but then you’ll face greatly reduced performance (5x-10x slower than the gateway devices), poor security, no compression/de-duplication, a lack of WAN optimization techniques, etc. Another option is a gateway hosted in the cloud – that is, a NAS server running in the cloud, storing data in the cloud. These effectively stretch the NAS connection (CIFS, NFS, WebDAV) from the cloud to the client. These protocols were not meant to be used over long distances with a lot of latency. There are companies providing “accelerators” in front of these kinds of solutions, but this is like putting lipstick on a pig.
Beyond the poor technical results you’ll face in this case, these solutions also have a terrible security model. By definition your data is in the cloud and accessible via a standard protocol like CIFS or NFS. Neither is particularly secure, and security experts cringe when you tell them you’re going to host a CIFS or NFS server on the public Internet. You can reduce this risk with site-to-site VPN solutions and the like – at a performance penalty and cost. So we believe that while a gateway is not a true SaaS solution, the pros outweigh the cons IF customers care about performance, cost and security. If anywhere-access or a lack of local infrastructure is paramount, then the consumer/prosumer cloud storage solutions are likely a better fit.
In summary, you need to have gear at your site with cloud storage gateways. They’re not pure SaaS.
Limited bandwidth

Many cloud storage gateway providers (us included again) assume that you have unlimited bandwidth. This obviously isn’t true, and our customers have a very wide range of bandwidth available in their businesses. While we have some customers that have sent multiple terabytes to the cloud in a single week, many of our customers have relatively slow network connections. Take, for example, a local law firm here in the Boston area. They have two T1 lines for their entire company, running at about 50% utilization during business hours. A T1 can carry 1.536 Mbps (192,000 bytes per second) of payload, but there is always protocol and application overhead that further limits it. For simplicity, let’s assume this law firm dedicates one T1 to the cloud storage gateway, has an initial data set of 3TB to load into the cloud, and grows at 10% per year. Doing some quick math tells us that it will take more than a year to load the 3TB of data to the cloud: even at the raw line rate, running flat out 24x7, the transfer takes about six months, and real-world overhead and duty cycle can easily double that. After that, their available bandwidth is sufficient to keep up with the ongoing data growth.
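To make the math above concrete, here’s a quick back-of-the-envelope sketch in Python. The 3TB data set, 10% annual growth and T1 payload rate come from the example; the 50% effective-throughput factor is an assumption standing in for protocol overhead and duty cycle:

```python
# Back-of-the-envelope upload-time estimate for one dedicated T1.

T1_BYTES_PER_SEC = 192_000       # 1.536 Mbps payload rate, no overhead
DATA_SET_BYTES   = 3 * 10**12    # 3 TB initial data set
GROWTH_PER_YEAR  = 0.10          # 10% annual data growth
SECONDS_PER_DAY  = 86_400

# Ideal case: the line runs flat out, 24x7, with zero overhead.
ideal_days = DATA_SET_BYTES / T1_BYTES_PER_SEC / SECONDS_PER_DAY
print(f"ideal, no overhead: {ideal_days:.0f} days")          # ~181 days

# Assumed realistic case: ~50% effective throughput after protocol
# overhead and duty cycle -- which pushes the load time past a year.
realistic_days = ideal_days / 0.5
print(f"at 50% effective throughput: {realistic_days:.0f} days")

# Ongoing growth after the initial load: 300 GB/year needs only a
# small fraction of the line's annual capacity, so the T1 keeps up.
growth_days = (DATA_SET_BYTES * GROWTH_PER_YEAR) / T1_BYTES_PER_SEC / SECONDS_PER_DAY
print(f"annual growth transfer time: {growth_days:.1f} days/year")
```

The raw numbers show why the gateways’ compression and de-duplication matter so much: anything that shrinks the bytes on the wire directly shrinks a months-long initial load.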
So if you want to get to the cloud to reap the benefits of cloud storage and storage-as-a-service, you could end up waiting a very long time with these different providers. This math and dynamic are pure physics. The gateways do some things that help – compression, de-duplication, WAN optimization and the like – and while the time can be reduced, sometimes dramatically, in the end the data still has to be sent to the cloud, and you’ll often be facing the limitations of your Internet connection.
So this is a real limitation. This is also something that we’re helping our customers with through a bulk data migration service which allows us to load the customer’s data into the cloud without using the customer’s bandwidth. Doing that safely, securely and quickly is a topic for another post.
Caching vs. Mirroring
The advanced (true) cloud storage gateways cache data. That means the entire data set you’ve stored in the cloud is not resident locally. This is one of the true promises of cloud storage: infinite capacity provided by someone else without needing to also grow your local data footprint. But caching comes at a price. First and foremost, there’s the chance that the data you’re requesting isn’t in cache at the time you request it. For the most part, this just means you’ll have to wait a little longer for the data to be retrieved from the cloud. Worst case, you could get some errors and timeouts (for a further discussion on this please see my recent post on Blocks vs. Files). So there are times when you will not get local performance, by definition. Caches are not perfect and sometimes you will get misses. How often depends a lot on things like your usage pattern, cache size, etc. But some of it also depends on how good the caching algorithms are, as well as the level at which the caching algorithms can make decisions (blocks vs. files, etc.).
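As a hypothetical sketch (not any vendor’s actual implementation), here’s the read path of a caching gateway reduced to a tiny LRU cache: a hit is served at local speed, while a miss stalls on the slow fetch from the cloud, and a cache that’s small relative to the working set misses often:

```python
from collections import OrderedDict

class ReadCache:
    """Minimal LRU read cache, illustrating hits vs. misses.
    Purely a sketch -- real gateways cache at the block or file level
    with far more sophisticated eviction policies."""

    def __init__(self, capacity, fetch_from_cloud):
        self.capacity = capacity        # objects held locally
        self.fetch = fetch_from_cloud   # slow path: pull from the cloud
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def read(self, key):
        if key in self.store:               # local-speed hit
            self.hits += 1
            self.store.move_to_end(key)
            return self.store[key]
        self.misses += 1                    # miss: wait on the cloud
        data = self.fetch(key)
        self.store[key] = data
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least-recently-used
        return data

cache = ReadCache(capacity=2, fetch_from_cloud=lambda k: f"data:{k}")
for k in ["a", "b", "a", "c", "b"]:   # "b" gets evicted when "c" arrives
    cache.read(k)
print(cache.hits, cache.misses)       # 1 hit, 4 misses
```

Even this toy shows the dependence on usage pattern: re-reading “a” immediately hits, while a working set larger than the cache (“a”, “b”, “c”) forces round trips to the cloud.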
The second issue with caching versus mirroring is also related to cache misses, but one that is rarely discussed: a cache miss on writes. This essentially means the space you need to write to in the cache is temporarily full and has not been freed up by offloading data to the cloud. Regardless of what vendors say, this can and will happen to you. You can postpone this situation by having a very large cache and making sure your Internet connection’s performance and duty cycle allow your appliance to catch up during off hours. But the reality is that your ability to feed data to the appliance is most likely significantly faster than your Internet connection’s bandwidth, so cache-full errors can occur from this impedance mismatch. This is exacerbated, and often encountered, during the bulk data migrations I covered earlier, where a large amount of data is being loaded as fast as possible over an extended period. Many tools like rsync will break on the first “disk full” error.
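The impedance mismatch is easy to see in a simulation. In this sketch the cache size matches the 64GB migration example below, while the local ingest rate is an illustrative assumption (roughly LAN-speed writes against a single T1 uplink):

```python
# Sketch of the write-side impedance mismatch: clients feed the
# appliance far faster than the uplink can drain it, so a finite
# cache eventually fills. Ingest rate is an assumed figure.

CACHE_BYTES = 64 * 10**9      # 64 GB cache
INGEST_BPS  = 50 * 10**6      # ~50 MB/s from local clients (assumed)
UPLINK_BPS  = 192_000         # one dedicated T1's payload rate

cache_used = 0
seconds_until_full = None
for t in range(10**6):
    cache_used += INGEST_BPS                    # data written by clients
    cache_used -= min(cache_used, UPLINK_BPS)   # data drained to the cloud
    if cache_used > CACHE_BYTES:
        seconds_until_full = t + 1
        break

print(seconds_until_full)   # the cache fills in roughly 21 minutes
```

With sustained full-speed writes, even a 64GB cache buys only about twenty minutes of headroom; this is why a migration tool has to throttle itself to the uplink rather than assume the cache can absorb everything.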
So how can you avoid this issue? Size your cache correctly. Be cognizant of your available bandwidth and duty cycle. Use an appliance that has WAN optimization features like compression, de-duplication, etc. Use tools like Nasuni’s data migration service that are cache-savvy and can handle the cache-full situation. Our migration service has successfully migrated many terabytes of data to the cloud through a 64GB cache in a single run. Sure, you could grow your cache, but if it’s for the one-time need of data migration, why waste the space?
Inability to access data without the gateway
Business-class cloud storage gateways transform the data written to them as it’s sent to the cloud. The data is WAN optimized (compressed, de-duplicated), secured (encrypted at rest as well as on the wire), chunked for parallelism and efficiency, etc. The same attributes that enable fast snapshots, high security and fast access also mean that the data is not stored in the cloud in the same format in which you wrote it to the appliance. This is one of those cases where you can’t have your cake and eat it too. If you want good security, including encryption such that the vendors can’t read your data, and you want advanced WAN optimization, then your data format will be changed.
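A minimal sketch of that transformation pipeline shows why the cloud-side bytes are useless without the gateway. Compression uses the standard zlib library; the “encryption” here is a keyed-keystream XOR stand-in, purely to keep the sketch self-contained (a real gateway would use a proper cipher such as AES), and the tiny chunk size is just for the demo:

```python
import hashlib
import zlib

KEY = b"customer-held key (illustrative)"
CHUNK = 4                      # tiny chunk size, demo only

def _keystream_xor(data, key):
    # Stand-in for real encryption: XOR with a SHA-256-derived
    # keystream. Symmetric, so the same call also decrypts.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

def to_cloud(data):
    """Compress, encrypt, then chunk: what lands in the object
    store no longer resembles the original file."""
    compressed = zlib.compress(data)
    encrypted = _keystream_xor(compressed, KEY)
    return [encrypted[i:i + CHUNK] for i in range(0, len(encrypted), CHUNK)]

def from_cloud(chunks):
    """Only the holder of the key (the gateway) can reverse it."""
    return zlib.decompress(_keystream_xor(b"".join(chunks), KEY))

original = b"quarterly report, quarterly report, quarterly report"
chunks = to_cloud(original)
assert b"".join(chunks) != original     # cloud-side bytes are opaque
assert from_cloud(chunks) == original   # round-trips with the key
```

Each stage earns its keep (compression shrinks WAN traffic, encryption keeps the provider blind, chunking enables parallel transfer), but each also moves the cloud copy further from anything you could read directly.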
If you want direct online access to your data at the CSP, and homogeneous access whether or not you use the gateway, then you want a gateway that does not modify or alter your data as it is sent to the cloud. That means your data will sit unencrypted in the cloud, exposed to prying eyes, accidental sharing, etc. Some providers try to work around this by providing an online access gateway, which is the equivalent of running a gateway in the cloud to access data in the cloud. In this case the data is still not in its native format; you’ve just moved the appliance that transforms it back to its normal format closer to the data. The issue with this approach is that if the data can be decrypted by something running in the cloud, it’s naturally less secure than something running under your control in your own data center. So this is a limitation of cloud storage gateways: you can’t get at your data without the appliance, because the appliance adds a level of security.
It’s a multi-vendor solution
Most gateways are provided by vendors that are not also cloud storage providers. There are exceptions to this rule, such as Jungledisk, provided by Rackspace, and the gateway provided by Egnyte. The CSP-provided gateways usually support only the CSP that provides them (Jungledisk is the exception, supporting Amazon S3 and Rackspace Cloud Files) and are more limited in functionality. The functionality limitation is mostly because the gateway is not the core focus of the company, and they have not invested in the kind of technology needed to create a first-class gateway that meets customers’ needs. They often require higher-than-average technical skills, appeal to a more technical audience, and have limited functionality, weak documentation and unclear support. The advantage of getting a gateway and CSP from the same company is clear: one throat to choke when things go wrong and (usually) a single invoice.
If you get your storage and gateway from different companies you’re often faced with a charge for the gateway from one vendor and a charge for the cloud storage from another and confusion when things go wrong. Nasuni has taken a unique approach in this area in providing a single invoice for both our gateway and whatever cloud storage vendors you may choose to use with our product. We’re the single point of contact and we take responsibility for the entire solution. That’s unusual in this space, but even with that the fact is there are multiple vendors involved in the solution and at times things happen to the CSPs that are outside our control. When a CSP has an outage the cache can help hide this from your users, but larger events can happen. CSPs can go out of business. When that happens we can help migrate you from one vendor to another with our migration service built into the product or with professional services. We can’t affect the CSPs but we can protect our customers.
I hope this helps you understand the limitations of cloud storage gateways. The reason we’re not hiding gateway weaknesses is that we want our customers to take a real look at the benefits and limitations of cloud storage gateways and make an educated decision. We obviously believe the benefits far outweigh the limitations, but that’s for you to decide. Interested in learning more about Nasuni? Schedule an online demo today!
Rob Mason has more than 20 years of operational, management and software development experience, all of it in storage. A meticulous builder and obsessive tester, with an eye for talented engineers, Rob produces rock-solid software, and, through his own example of hard work and ingenuity, inspires his teams to outdo themselves. His determination for thoroughness extends to financial and operational matters, and at Nasuni, he is a powerhouse behind the scenes, managing the company’s operations, in addition to its engineering team. As the VP of Engineering at Archivas from 2004 to acquisition, Rob oversaw all development and quality assurance. After the Hitachi acquisition, he continued in his role, as VP of HCAP Engineering, managing the integration of his team with Hitachi’s and supporting the rollout of HCAP. Before joining Archivas, he was a senior manager at storage giant EMC, where he was responsible for the API, support applications and partner development for EMC’s content-addressed storage product, Centera. In a previous stint at EMC, he was Manager and Principal Design Engineer for the elite Symmetrix Group, where he improved the speed and reliability of EMC’s flagship enterprise storage disk array. Between Centera and Symmetrix, Rob was the co-founder and VP of engineering at I/O Integrity, a storage-based startup developing a high-performance caching appliance. He has a bachelor of science from Rensselaer Polytechnic Institute and a master’s in business administration with honors from Rutgers University. Rob holds upwards of 30 patents.