On Friday night Azure’s Blob Storage became unavailable for thousands of customers. Service was largely restored within 12 hours, but there was quite a lot of noise on Twitter and blogs from frustrated users. Isn’t cloud storage supposed to be available? Isn’t uptime part of why you pay for the cloud?
Cloud storage outages occur from time to time, and no single cloud storage provider (CSP) is immune. Amazon has had several public outages over the last few years. These outages affect anyone using the services that go down, unless the cloud storage sits behind a local caching system. Nasuni’s service combines an on-premises storage controller with cloud storage, with the on-premises storage acting as a cache. Work on cached data continues uninterrupted even if a CSP has a temporary outage.
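To make the caching idea concrete, here is a minimal read-through cache sketch in Python. It is an illustration only, not Nasuni's implementation: the names `CachingStore`, `BackendUnavailable`, and `fetch_remote` are hypothetical, and a real controller would also handle writes, eviction, and consistency.

```python
class BackendUnavailable(Exception):
    """Raised by the (hypothetical) remote fetcher when the cloud store cannot be reached."""


class CachingStore:
    """Read-through cache in front of a remote object store (illustrative sketch)."""

    def __init__(self, fetch_remote):
        # fetch_remote: callable(key) -> bytes; assumed to raise
        # BackendUnavailable when the provider is down or unreachable.
        self._fetch_remote = fetch_remote
        self._cache = {}

    def read(self, key):
        # Cached data is served locally, so a provider outage is invisible here.
        if key in self._cache:
            return self._cache[key]
        # Cache miss: only now does the request depend on the provider.
        data = self._fetch_remote(key)
        self._cache[key] = data
        return data
```

In this sketch, reads of cached keys keep working while the provider is unreachable; only a cache miss surfaces the outage to the caller.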
Friday’s Microsoft outage was unusual in that it was not an “outage” per se. What actually happened was that Microsoft let their Azure SSL certificate expire. As a result, clients that validate the server certificate could no longer connect to Azure via HTTPS. Most applications and companies using cloud storage access it via HTTPS for the added security of an encrypted tunnel.
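Certificate expiry is easy to monitor ahead of time. The following is a minimal sketch using only Python's standard library; the helper names `fetch_not_after` and `days_until_expiry` are our own, and the host you pass in is whatever endpoint you want to watch. Note that with default verification an already expired certificate makes the handshake itself fail, so this kind of check is useful as an early warning rather than a post-mortem tool.

```python
import socket
import ssl
import time


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the 'notAfter' string of the certificate presented by host.

    Raises ssl.SSLCertVerificationError if the certificate does not
    validate (for example, because it has already expired).
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter time (negative if past)."""
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400
```

A monitoring job could call `days_until_expiry(fetch_not_after(host))` on a schedule and raise an alert when the result drops below a chosen threshold, such as 30 days.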
Nasuni noticed the problem almost immediately: our appliances in the field quickly began reporting an error back to our NOC. Our engineers did a little digging, found the expired certificate, and alerted Azure as soon as we found it. Knowing the fix was going to take some time (not something Microsoft could do in minutes or a couple of hours), Nasuni engineers took the proactive step of developing a workaround for our production customers.
This workaround did not impact the encryption standards used by Nasuni and was only necessary in the rare case that a customer was trying to access data not already stored in the local cache. Not all of our customers used this temporary fix, but combined with local caching it ensured that everyone using Azure could continue to go about business as usual.
Nasuni’s intelligent caching technology significantly mitigates the risk of downtime, and because we manage the end-to-end solution, we can proactively address any issues that arise, no matter how unusual.