Testing the Cloud Pt. 1: Reliability & Availability - Nasuni

Testing the Cloud Part I: Reliability and Availability

The Nasuni Filer delivers local-like speed through high-performance caching algorithms, chunking, compression, and several other techniques.  Yet the Filer’s performance also depends in part on the availability and reliability of our cloud storage partners.  If a provider goes down, and your Filer attempts to retrieve data stored in that cloud, your users may experience a delay.

Because we depend on the clouds, we developed ways to monitor them.  Using our Cloud APIs, we built two tools that check cloud availability, reliability, and performance.  Cloudbench covers performance, and we will explain how it works in a forthcoming post.  Here, we will discuss Cloudping, which covers availability and reliability.

Cloudping runs from two different locations to simulate a distributed customer base, and it monitors every vendor we support, including several that we have not yet publicly announced.  Every few minutes, the system at each location writes a small piece of data to each cloud vendor, reads it back, compares it with what was written, and deletes it.  This confirms that each cloud is operational, but the tests are more than simple network pings.  Comparing the data on read-back checks its integrity, and completing the full cycle verifies that the API and our credentials work end to end.
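The write, read-back, compare, and delete cycle can be sketched roughly as follows.  This is not Cloudping's actual code: the `ObjectStore` interface, the `InMemoryStore` stand-in, and the `probe` function are hypothetical names invented for illustration, and a real probe would wrap each vendor's own API behind the same three calls.

```python
import hashlib
import os
import time
import uuid
from dataclasses import dataclass
from typing import Optional, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal interface; a real adapter would wrap a vendor API."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def delete(self, key: str) -> None: ...


class InMemoryStore:
    """Stand-in for a cloud vendor so the sketch runs without network access."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def delete(self, key: str) -> None:
        del self._objects[key]


@dataclass
class ProbeResult:
    ok: bool
    latency_ms: float
    error: Optional[str] = None


def probe(store: ObjectStore, size: int = 1024) -> ProbeResult:
    """Write a small random object, read it back, compare, then delete it."""
    key = f"probe-{uuid.uuid4().hex}"
    payload = os.urandom(size)
    expected = hashlib.sha256(payload).hexdigest()
    start = time.perf_counter()
    try:
        store.put(key, payload)
        returned = store.get(key)
        store.delete(key)
    except Exception as exc:  # any API, credential, or network failure
        elapsed = (time.perf_counter() - start) * 1000
        return ProbeResult(False, elapsed, f"{type(exc).__name__}: {exc}")
    elapsed = (time.perf_counter() - start) * 1000
    # Integrity check: the bytes read back must hash to what was written.
    if hashlib.sha256(returned).hexdigest() != expected:
        return ProbeResult(False, elapsed, "integrity mismatch")
    return ProbeResult(True, elapsed)
```

Calling `probe(InMemoryStore())` returns a result with `ok` set to `True`.  A store that raises on any call, or that returns altered bytes, produces a failed result with the reason in `error`.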

If a test fails to connect to a cloud, the system makes several more attempts.  If those also fail, we treat it as a real outage rather than a transient glitch, and we inform our support team and the provider.  Running this test from two different locations against multiple providers lets us narrow down the source of the problem.  If both locations see the same problem, or the issue affects only one vendor, then the fault most likely lies with that vendor, not Nasuni.
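As a rough illustration of that reasoning, the sketch below retries a failing check before declaring an outage, then compares results across locations and vendors.  The function names, the default of three attempts, and the verdict labels are assumptions made for the example, not details of Cloudping itself.

```python
import time
from typing import Callable, Dict, Tuple


def confirmed_down(check: Callable[[], bool], attempts: int = 3,
                   delay_s: float = 0.0) -> bool:
    """Return True only if every one of `attempts` checks fails."""
    for attempt in range(attempts):
        if check():
            return False  # one success means it was a transient glitch
        if delay_s and attempt < attempts - 1:
            time.sleep(delay_s)
    return True


def classify(down: Dict[Tuple[str, str], bool]) -> Dict[str, str]:
    """Map each vendor to 'ok', 'vendor' (likely vendor-side), or 'local'.

    `down` maps (location, vendor) to True when that vendor is confirmed
    down from that location.
    """
    locations = sorted({loc for loc, _ in down})
    vendors = sorted({vendor for _, vendor in down})
    verdicts = {}
    for vendor in vendors:
        failing = [loc for loc in locations if down.get((loc, vendor), False)]
        if not failing:
            verdicts[vendor] = "ok"
        elif len(failing) == len(locations):
            # Every location sees the same failure. Simplification: this
            # does not special-case every vendor failing everywhere.
            verdicts[vendor] = "vendor"
        else:
            # Partial failure: if the failing location cannot reach any
            # vendor at all, suspect that location rather than the vendor.
            local = all(
                all(down.get((loc, v), False) for v in vendors)
                for loc in failing
            )
            verdicts[vendor] = "local" if local else "vendor"
    return verdicts
```

For example, if vendor A fails from both locations while vendor B is healthy, `classify` labels A as `"vendor"` and B as `"ok"`.  If one location cannot reach any vendor while the other location reaches them all, every vendor is labeled `"local"`.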

Naturally, we also keep track of all this data.  When a cloud has an outage (and they all do), we record how long it lasts and track the average downtime.  We also report back to vendors on their average response time and how it compares with that of their competitors.
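One simple way to derive those averages from a series of probe results is sketched below.  The sample layout (timestamp in seconds, success flag, latency in milliseconds) and the rule that an outage runs from the first failed probe to the next successful one are assumptions for the example, not a description of how Nasuni stores or computes its data.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

# Hypothetical sample layout: (timestamp_s, ok, latency_ms)
Sample = Tuple[float, bool, float]


def outage_durations(samples: List[Sample]) -> List[float]:
    """Seconds from the first failed probe to the next successful one.

    An outage still open at the end of the series is not counted.
    """
    durations = []
    started: Optional[float] = None
    for timestamp, ok, _ in samples:
        if not ok and started is None:
            started = timestamp          # outage begins
        elif ok and started is not None:
            durations.append(timestamp - started)  # outage ends
            started = None
    return durations


def summarize(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Outage count, mean outage length, and mean latency of successes."""
    durations = outage_durations(samples)
    latencies = [latency for _, ok, latency in samples if ok]
    return {
        "outages": len(durations),
        "mean_outage_s": mean(durations) if durations else 0.0,
        "mean_latency_ms": mean(latencies) if latencies else None,
    }
```

Running `summarize` on each vendor's samples gives per-vendor figures that can be sorted by `mean_latency_ms` to compare one vendor's response time with the others.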

 

[Figure: Cloud Ping Data]

On the whole, our cloud partners have been responsive to what these tests uncover.  For example, we recently found that average ping times from one of our locations to one of our vendors had jumped from 20 ms to 200 ms.  After looking into the problem, the vendor discovered a routing issue between one of our sites and one of theirs.  And they fixed it.

Nasuni is not a cloud monitoring company.  Our business is the Nasuni Filer.  Yet we also want to ensure that our customers get the best possible experience.  As a result, we built tools like Cloudping and Cloudbench so that we can be proactive and identify glitches before our customers experience any lags or outages.  This way, we make sure problems are fixed before our customers even know they exist.
