What happens when the cloud goes down?
If you’ve been following Nasuni for any length of time, you may have seen this question before. It is the title of an excellent blog post by one of our senior engineers. In it, Jesse Noller discusses what happens when Amazon (one of our cloud storage backends) goes “down.” Specifically, he discusses the components of the Amazon architecture and how Nasuni uses them. Given the recent outage of Elastic Block Store (EBS) in Amazon’s Northern Virginia location on October 22nd, 2012, this discussion bears repeating.
What are the different aspects of Amazon?
Amazon’s cloud consists of many different infrastructure services, but three are the most common and most frequently discussed.
Elastic Compute Cloud (EC2)
Amazon’s EC2 system provides the compute nodes (virtual machines) used for running applications and processes. If you want to host a webserver, run an online application, or offer an API server, this is the compute layer you will most likely use.
Elastic Block Store (EBS)
EBS is the block storage used by EC2 nodes. It gives the virtual machines persistent volumes, so that data they write can be retrieved and used later.
Simple Storage Service (S3)
S3 is the core storage offering of Amazon. It was the very first service Amazon offered publicly and has been running the longest. It is the only component of Amazon Web Services (AWS) used by Nasuni. According to public records (and our own internal testing) it is also the most stable of all of the Amazon solutions. When S3 is unavailable, it tends to be for mere seconds or minutes at a time, and only from certain locations and geographies, not from everywhere at once. Want to know more? We’ve done a lot of testing on this component, and you can find out more in our annual reports at Nasuni.com.
We should note here that EBS itself depends on S3 (EBS snapshots, for example, are stored in S3). Why does S3’s core consistency and availability not transfer to EBS? S3 is the most basic component of the cloud. It was the first system built, it is the most reliable of the three, and it is the closest to a raw component: disk drives. EC2 and EBS are a lot of software built on top of storage and compute systems to offer additional services. All of this additional software added to raw components increases complexity and increases the likelihood of failure. While certainly useful, EBS has a higher likelihood of failure, and any system leveraging it should take that risk into account and be designed to mitigate it.
How does Nasuni use these services?
Nasuni uses only S3. We do not use EC2 or EBS because they have a public history of reliability issues. At Nasuni, we use the cloud as a component of a storage solution. For this reason, we use the parts of the cloud that are closest to core infrastructure and offer the fewest “add-on” services.
S3 of course has its own outages as well. Over a three-year test period, Nasuni found that S3 experienced approximately 1.4 outages per month on average. These intermittent lapses in availability last from a few seconds to a few minutes and often affect only a subset of customers. For this reason they are not publicized like the EC2 and EBS outages. Nasuni knows about them because we test Amazon every five minutes from our servers and, on top of that, Nasuni Filers report back what they experience from 30 countries around the world.
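To make the arithmetic behind a figure like "outages per month" concrete, here is a minimal Python sketch of how periodic probe results could be turned into an outage count and a monthly rate. It is not Nasuni's monitoring code: the function names, the sample format (a timestamp plus a success flag), and the 30-day month are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def count_outages(samples):
    """Group consecutive failed probes into outages.

    samples: list of (datetime, bool) sorted by time, where True means
    the probe succeeded. (Hypothetical format, for illustration only.)
    Returns a list of (first_failed_probe, last_failed_probe) pairs.
    """
    outages = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts          # first failed probe of a new outage
            last_fail = ts
        elif start is not None:
            outages.append((start, last_fail))  # a success closes the outage
            start = None
    if start is not None:           # still failing at the end of the data
        outages.append((start, last_fail))
    return outages

def outages_per_month(samples, days_per_month=30.0):
    """Average number of outages per month over the sampled period."""
    if len(samples) < 2:
        return 0.0
    span_days = (samples[-1][0] - samples[0][0]).total_seconds() / 86400
    if span_days <= 0:
        return 0.0
    return len(count_outages(samples)) / (span_days / days_per_month)
```

Note that a probe every five minutes can only bound an outage's duration to within the probe interval, and an outage shorter than five minutes can fall between probes entirely. Reports from many independent vantage points help fill in that gap.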
Nasuni customers did not experience these outages – not one. Thanks to effective caching and smart algorithms, our customers were working from local storage during the few times when these outages occurred. It’s this type of dual redundancy that allows us to offer our 100% SLA. (If you want to know more about how our on-premises hardware protects customers from these outages, check out the ‘How it Works’ section of Nasuni.com.)
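The general idea of a local cache masking a short cloud outage can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the Nasuni Filer's actual design: `CachedStore`, `FlakyBackend`, and `CloudUnavailable` are hypothetical names, and a real appliance also has to handle cache eviction, durability, and data that was never cached.

```python
class CloudUnavailable(Exception):
    """Raised by a backend when the cloud store cannot be reached."""

class CachedStore:
    """Serve reads from a local cache and queue writes for later upload."""

    def __init__(self, backend):
        self.backend = backend   # any object with get(key) and put(key, data)
        self.cache = {}          # local copies of objects
        self.pending = []        # keys written locally but not yet uploaded

    def read(self, key):
        if key in self.cache:
            return self.cache[key]        # no cloud round trip needed
        data = self.backend.get(key)      # may raise CloudUnavailable
        self.cache[key] = data
        return data

    def write(self, key, data):
        self.cache[key] = data            # the write succeeds locally first
        if key not in self.pending:
            self.pending.append(key)
        self.flush()

    def flush(self):
        """Upload queued writes; stop quietly if the cloud is unreachable."""
        while self.pending:
            key = self.pending[0]
            try:
                self.backend.put(key, self.cache[key])
            except CloudUnavailable:
                return False              # retry on a later flush
            self.pending.pop(0)
        return True

class FlakyBackend:
    """In-memory stand-in for a cloud store that can be switched off."""

    def __init__(self):
        self.objects = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CloudUnavailable(key)
        return self.objects[key]

    def put(self, key, data):
        if not self.up:
            raise CloudUnavailable(key)
        self.objects[key] = data
```

With the backend switched off, reads of cached keys and all writes still succeed locally, and a later `flush()` uploads whatever queued up during the outage. A read of a key that is not in the cache would still fail in this sketch, which is why the effectiveness of the cache matters.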
Which brings us back to the original question: What happens when the cloud goes down? But now that we know more about the aspects of Amazon Web Services and how Nasuni uses them, we can ask a better question:
What happens to Nasuni customers when Amazon has a public outage?
When Amazon has a publicized outage, it is usually because EC2 or EBS is experiencing a visible and painful failure. Since Nasuni doesn’t use either of these services, neither Nasuni nor our customers are impacted in any way. As for S3, Amazon has infrequent, intermittent downtime that Nasuni’s local appliance manages automatically for our customers, so that they experience no change in service.
Short answer? Nothing. We spent years building the system so we could say that.