By Rob Mason on February 21, 2012
We’re never happy when one of our customers goes through a disaster or work disruption of any kind. Several months ago, though, we heard of another incident in which our Filer’s built-in backup and disaster recovery saved a customer’s data.
A storage administrator at one of our enterprise accounts was working on one of their VMware ESX virtualization platforms and made a potentially catastrophic mistake. One of the great advantages of virtualization is the ability to create new machines on demand. IT often does this for testing new technologies, staging new deployments, experimentation, quality assurance testing, development testing and a number of other purposes. Before long they end up with what I’ll call “virtual machine bloat.” One of the big drivers of the success of EqualLogic (with which we share an investor) was the growth of virtualization and the demand for storage that comes with it. iSCSI and the simplicity of EqualLogic made it easier to meet all these machines’ demand for capacity. Still, capacity isn’t cheap, and IT constantly has to monitor the virtual machines they’re running and clean up machines and data stores to maintain available space.
In this case the administrator was removing unused machines that were used in prior testing and then removing the underlying data stores. In the physical world, it’s much more difficult to make a mistake along these lines – you’re moving physical hardware, changing cables, jumping between major interfaces, etc. But in the virtual world you can delete very large amounts of data and whole machines with a single click.
In this case the administrator thought the data store on his ESX platform was no longer in use and right-clicked on the data store and chose “delete.” Like all good applications it popped up a warning asking him if he was sure. We’re all so used to these pop-ups that we’re trained to automatically click the “yes, I’m sure” button. This is the time when you wish the vendor didn’t trust your answer and popped up some really nasty warning and asked you to type “I want to delete all the data on this data store” or something along those lines. In other words, get your attention during a busy day – you’re about to do something potentially very dangerous.
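The kind of guard wished for above can be sketched in a few lines of Python. This is a hypothetical illustration only, not how VMware ESX or the Nasuni Filer behaves; the function names, the `delete_fn` callback and the exact confirmation wording are all assumptions made for the example.

```python
def confirmation_phrase(datastore_name: str) -> str:
    """Build the exact sentence the operator must type (hypothetical wording)."""
    return f"I want to delete all the data on {datastore_name}"


def is_confirmed(datastore_name: str, typed: str) -> bool:
    """Return True only if the typed text matches the required phrase.

    Surrounding whitespace is ignored, but the wording and case must match,
    so a reflexive "yes" or "y" is never enough.
    """
    return typed.strip() == confirmation_phrase(datastore_name)


def delete_datastore(datastore_name: str, typed: str, delete_fn) -> bool:
    """Call delete_fn(datastore_name) only after a correct typed confirmation.

    delete_fn stands in for whatever actually removes the data store.
    Returns True if the deletion was carried out, False if it was refused.
    """
    if not is_confirmed(datastore_name, typed):
        return False
    delete_fn(datastore_name)
    return True
```

Because the phrase includes the name of the data store, the operator has to read what is about to be deleted before the delete can go through, which is the point of the extra friction.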
Anyway, in this case there was no extra pop-up, so he clicked “yes, I’m sure” and the data store was deleted permanently. Unfortunately, the Nasuni Filer was running on that data store at the time and immediately began to log errors about its disks being unavailable. We have a lot of great technology in the Filer, but in the end it’s an appliance and needs resources (albeit virtual) to run on.
Once he realized what he had done, and after a long walk to calm down and think, he recalled Nasuni’s disaster recovery capabilities. The Nasuni Filer still had its default setting of taking a snapshot to the cloud every hour, which meant he could perform a disaster recovery (DR) and lose at most the last hour’s worth of work.
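To make the "within the last hour" arithmetic concrete, here is a minimal sketch of how the data-loss window follows from the snapshot interval. It assumes snapshots land on a fixed schedule and complete instantly, which is a simplification; the function names and the times in the usage note are hypothetical and do not come from the Filer.

```python
from datetime import datetime, timedelta


def last_snapshot_before(failure: datetime,
                         first_snapshot: datetime,
                         interval: timedelta) -> datetime:
    """Return the time of the most recent scheduled snapshot at or before failure."""
    if failure < first_snapshot:
        raise ValueError("no snapshot exists before the failure")
    # timedelta // timedelta gives the number of whole intervals elapsed.
    completed = (failure - first_snapshot) // interval
    return first_snapshot + completed * interval


def data_loss_window(failure: datetime,
                     first_snapshot: datetime,
                     interval: timedelta) -> timedelta:
    """Return how much work is lost: the time since the last snapshot."""
    return failure - last_snapshot_before(failure, first_snapshot, interval)
```

With hourly snapshots starting at 09:00 and a deletion at 14:40 the same day, `last_snapshot_before` returns 14:00 and `data_loss_window` returns 40 minutes. Under these assumptions the window can never reach a full interval, which is why an hourly schedule bounds the loss at one hour.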
He shut down the broken Filer and downloaded a fresh copy of the Nasuni Filer software. He then created a new data store to host it. Next he began the installation/disaster recovery process. This takes about 15 minutes and he was back up and running quickly with all of his data up to his last hourly snapshot available to his users. The cache performed its function well and made it appear as if all the data was available to the users immediately.
His users noticed the outage only for the period between the deletion of the data store and the completion of the recovery, which was less than an hour in total. Otherwise they had no idea that all their data had been deleted from their office. A few complained of missing files that had been created after the last snapshot, but that was far better than losing everything.
DR and backup aren’t just for when your building falls down or your machines catch fire. There are other times when they’re critical to your success in managing your data center. We’re glad the Filer can help in these cases, and we’d love to hear more stories about how our automatic backup or disaster recovery has saved you or helped you do your job. We’ll even keep your name out if the story is embarrassing 🙂
Rob Mason has more than 20 years of operational, management and software development experience, all of it in storage. A meticulous builder and obsessive tester, with an eye for talented engineers, Rob produces rock-solid software, and, through his own example of hard work and ingenuity, inspires his teams to outdo themselves. His determination for thoroughness extends to financial and operational matters, and at Nasuni, he is a powerhouse behind the scenes, managing the company’s operations, in addition to its engineering team. As the VP of Engineering at Archivas from 2004 to acquisition, Rob oversaw all development and quality assurance. After the Hitachi acquisition, he continued in his role, as VP of HCAP Engineering, managing the integration of his team with Hitachi’s and supporting the rollout of HCAP. Before joining Archivas, he was a senior manager at storage giant EMC, where he was responsible for the API, support applications and partner development for EMC’s content-addressed storage product, Centera. In a previous stint at EMC, he was Manager and Principal Design Engineer for the elite Symmetrix Group, where he improved the speed and reliability of EMC’s flagship enterprise storage disk array. Between Centera and Symmetrix, Rob was the co-founder and VP of engineering at I/O Integrity, a storage-based startup developing a high-performance caching appliance. He has a bachelor of science from Rensselaer Polytechnic Institute and a master’s in business administration with honors from Rutgers University. Rob holds upwards of 30 patents.