High Performance Cloud Storage Through Caching
UniFS™ maintains local-speed performance through its substantial cache, high-performance algorithms, chunking, and file compression techniques. We use a write-back model for caching: all writes are instantiated locally. Users never read directly from the cloud and never write directly to it.
LRU Algorithm
Cache Manager uses an enhanced version of the popular least recently used (LRU) algorithm. The Filer keeps a list of all the files recently accessed by clients. The most recently used files remain in the cache. Objects written once and never used again will eventually be evicted from the cache, and deleted from the local disk, to make room for new material. If one of these files is later read, UniFS™ goes to the cloud, retrieves the data, and reinstantiates it in the local file system.
This is a standard LRU approach, but we have also added several refinements to the algorithm to improve its cache hit rate.
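As an illustration only, the basic LRU behavior described above can be sketched in a few lines of Python. This is not the Filer's implementation: the class name `LRUFileCache`, the `dirty` set, and the use of a plain dictionary as a stand-in for the cloud are all assumptions made for the example, and none of the enhancements to the algorithm are modeled.

```python
from collections import OrderedDict


class LRUFileCache:
    """Toy write-back LRU cache; `cloud` is a dict standing in for cloud storage."""

    def __init__(self, capacity, cloud):
        self.capacity = capacity      # maximum number of files kept locally
        self.cloud = cloud            # hypothetical backing store
        self.local = OrderedDict()    # least recently used first, most recent last
        self.dirty = set()            # files written locally but not yet in the cloud

    def write(self, name, data):
        # Write-back model: the write is instantiated locally first.
        self.local[name] = data
        self.local.move_to_end(name)
        self.dirty.add(name)
        self._evict()

    def read(self, name):
        if name not in self.local:
            # Cache miss: fetch from the cloud and reinstantiate locally.
            self.local[name] = self.cloud[name]
        self.local.move_to_end(name)
        data = self.local[name]
        self._evict()
        return data

    def _evict(self):
        while len(self.local) > self.capacity:
            name, data = self.local.popitem(last=False)  # least recently used entry
            if name in self.dirty:
                # Simplification: push unsaved data before dropping the local copy.
                self.cloud[name] = data
                self.dirty.discard(name)
```

With a capacity of two files, writing `a` and `b`, reading `a`, and then writing `c` evicts `b`, because `a` was touched more recently. A later read of `b` pulls it back from the stand-in cloud.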
Caching Time Periods
Because UniFS™ is a versioned file system, customers can look at both the present and the past. To ensure that more recent data maintains priority, we have adjusted the algorithm so that it evicts cached data from past versions more aggressively than cached data from the current version.
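One way to picture this bias is to make data from past versions look older than it really is when choosing what to evict. The sketch below is an assumption about how such a weighting could work, not a description of the actual algorithm; the function `pick_victim` and the `past_penalty` multiplier are hypothetical.

```python
def pick_victim(entries, now, past_penalty=4.0):
    """Choose the cache entry to evict.

    entries maps a name to (last_access_time, is_current_version).
    Entries from past versions have their idle time multiplied by
    past_penalty (an assumed tuning knob), so they are evicted sooner.
    """
    def effective_age(item):
        _, (last_access, is_current) = item
        age = now - last_access
        return age if is_current else age * past_penalty

    # The entry with the largest effective age is evicted first.
    return max(entries.items(), key=effective_age)[0]
```

With the default penalty, a past version touched 50 seconds ago is evicted before a current file touched 100 seconds ago. Setting `past_penalty=1.0` reduces the function to plain LRU.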
Caching Metadata
UniFS™ is effectively an infinite file system with a finite cache. As a result, it is conceivable that a customer could get to the point where the cache would not even be able to fit all the metadata, let alone data. So, UniFS™ caches metadata, too. This enables us to represent a petabyte-scale system with a tiny cache and quickly respond to requests for data that has not been accessed in a while. For example, a 5 MB music file that has not been used in a week might be sent to the cloud, but its metadata will remain in the cache.
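The music-file example can be sketched as a cache that tracks metadata separately from file contents. The names `FileMetadata`, `MetadataAwareCache`, and `cloud_key` are hypothetical, and for brevity the sketch never evicts metadata, whereas the text above implies that metadata is itself subject to caching.

```python
from dataclasses import dataclass


@dataclass
class FileMetadata:
    name: str
    size_bytes: int
    cloud_key: str   # hypothetical handle used to find the data in the cloud


class MetadataAwareCache:
    """Keeps metadata and file contents in separate maps."""

    def __init__(self):
        self.metadata = {}   # name -> FileMetadata (small)
        self.data = {}       # name -> bytes (large, evictable)

    def store(self, meta, payload):
        self.metadata[meta.name] = meta
        self.data[meta.name] = payload

    def evict_data(self, name):
        # Drop the contents only; the metadata stays so lookups remain fast.
        self.data.pop(name, None)

    def stat(self, name):
        return self.metadata[name]

    def is_local(self, name):
        return name in self.data
```

After `evict_data("song.mp3")`, a call to `stat("song.mp3")` still answers from the cache, even though the contents would have to be fetched from the cloud before they could be read.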
Chunking and Compression
Before heading to the cloud, that 5 MB file will be chunked into pieces. Each of those will be compressed and encrypted. Chunking and compression accelerate the retrieval process. If a user calls up a large file and File Manager cannot find the file in the cache, it initially forwards a request to Cache Manager for a specific piece of that file. Requests for the remaining pieces follow sequentially. This way, the latency experienced by the user is not the time it takes to pull all of the data associated with that file back from the cloud and reinstantiate every byte in the cache. The user waits only for the first few bytes of data, which arrive quickly. UniFS™ starts streaming the first chunk immediately, then reads the rest. Once all of the file’s chunks have been retrieved and reinstantiated in the cache, the user is reading and writing to the file at local speeds.
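The effect of chunking on perceived latency can be illustrated with a short sketch. It uses Python's standard `zlib` module for compression and omits encryption entirely. The 1 MB chunk size and the function names are assumptions for the example, not the Filer's actual parameters.

```python
import zlib

CHUNK_SIZE = 1024 * 1024  # assumed chunk size; the real value is not stated


def chunk_and_compress(data, chunk_size=CHUNK_SIZE):
    """Split data into fixed-size pieces and compress each piece independently."""
    return [
        zlib.compress(data[i:i + chunk_size])
        for i in range(0, len(data), chunk_size)
    ]


def stream_file(chunks):
    """Yield decompressed chunks in order, one at a time.

    The caller can start using the first chunk before the rest
    have been fetched or decompressed.
    """
    for blob in chunks:
        yield zlib.decompress(blob)
```

Because `stream_file` is a generator, a reader that calls `next()` on it receives the first chunk after only that chunk has been processed. The remaining chunks are handled sequentially as the reader asks for them.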
Intelligent Predictions
The system monitors access patterns so that it can pre-fetch data and metadata. In other words, it looks for patterns that suggest what users might want next and brings that data into the cache proactively, so it is there when needed and does not have to be called back from the cloud. At the same time, the system measures its success with these operations and adjusts the algorithms on the fly so that it is constantly improving the cache hit rate.
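A very simple form of this idea is to remember which file tends to be opened after which, predict the most common successor, and keep score of how often the prediction was right. The `Prefetcher` class below is a hypothetical sketch of that approach. It only measures its hit rate; it does not adjust itself the way the system described above does.

```python
from collections import Counter, defaultdict


class Prefetcher:
    """Predicts the next file from observed 'A is followed by B' counts."""

    def __init__(self):
        self.successors = defaultdict(Counter)  # file -> counts of the file read next
        self.last = None         # most recently accessed file
        self.prediction = None   # what we expected to be accessed next
        self.hits = 0
        self.total = 0

    def record_access(self, name):
        # Score the previous prediction, if one was made.
        if self.prediction is not None:
            self.total += 1
            self.hits += int(self.prediction == name)
        # Learn the transition from the previous file to this one.
        if self.last is not None:
            self.successors[self.last][name] += 1
        self.last = name
        # Predict the most frequent successor of the current file.
        following = self.successors.get(name)
        self.prediction = following.most_common(1)[0][0] if following else None
        return self.prediction   # candidate to pre-fetch, or None

    def hit_rate(self):
        return self.hits / self.total if self.total else 0.0
```

The value returned by `record_access` is the file a cache would fetch ahead of time, and `hit_rate` provides the feedback signal that a self-tuning version could act on.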
Optimization
Though we have done extensive research to determine the optimal settings for our users, certain aspects of the appliance will be adjustable so that customers can tune the Filer to their particular needs.
Push Periods
The push period, the interval at which the Filer sends new or changed data to the cloud, is user-defined. IT Administrators can change it to suit their particular needs.
Heavy Traffic
If a customer’s users are prone to hogging bandwidth at a given time, IT Administrators can delay the push during that period to ensure normal network speeds. The Filer can also be set so that it does not push at all during the day, but moves all new or changed data only at night, when network usage is low.
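A schedule like this amounts to checking the current time against a blackout window. The function below is an illustrative sketch only: `push_allowed` and its default window of 8:00 to 18:00 are assumptions, not the Filer's actual settings or interface.

```python
from datetime import time


def push_allowed(now, blackout_start=time(8, 0), blackout_end=time(18, 0)):
    """Return True if a push may run at clock time `now`.

    Pushes are blocked from blackout_start (inclusive) to blackout_end
    (exclusive). The window may wrap past midnight.
    """
    if blackout_start <= blackout_end:
        in_blackout = blackout_start <= now < blackout_end
    else:
        # Window wraps around midnight, e.g. 22:00 to 06:00.
        in_blackout = now >= blackout_start or now < blackout_end
    return not in_blackout
```

With the default window, `push_allowed(time(12, 0))` is false and `push_allowed(time(23, 0))` is true, which corresponds to holding all pushes until the evening.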
Bandwidth Usage
IT Administrators can also throttle back the Filer's allowed bandwidth. So, instead of letting the Filer dominate the network while it pushes new data, Administrators can cap usage at, say, 50 kb/s. This way, data will be slowly pushed to the cloud, with no impact on network speeds.
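A common way to enforce a cap of this kind is a token bucket, shown here as a sketch. Nothing in this paper says the Filer uses a token bucket; the technique is swapped in purely for illustration. The class and its parameters are hypothetical, and the rate is expressed in bytes per second as an assumption.

```python
class TokenBucket:
    """Allows at most rate_bytes_per_s on average, with bursts up to burst_bytes."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes   # start with a full bucket
        self.last = 0.0             # time of the previous call, in seconds

    def try_send(self, nbytes, now):
        """Return True and consume tokens if nbytes may be sent at time `now`."""
        # Refill in proportion to the time elapsed, up to the bucket capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False   # caller should wait and retry later
```

The clock is passed in as `now` so that the behavior is easy to reason about. A real sender would supply a monotonic clock reading and sleep briefly whenever `try_send` returns false.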
Feedback and Alerts
Though Nasuni is removed from the I/O path, the appliance does call home to our servers nightly with anonymized performance data. Nasuni will aggregate data (what types of files are being written, their sizes, etc.) from customers across our entire user base. We will then use this data to optimize future versions of the Filer.
The appliance will also send alerts, via email or by posting them on the GUI, if users are writing data faster than the cache can push it to the cloud, or if the cache is constantly filling up. IT Administrators can easily increase the allotted space through the VMware configuration tool.
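The two alert conditions can be expressed as simple checks. The sketch below evaluates a single sample rather than a trend over time, and the function name, the message strings, and the 90 percent threshold are all assumptions made for illustration.

```python
def cache_alerts(write_rate, push_rate, used_bytes, capacity_bytes,
                 full_threshold=0.9):
    """Return a list of alert messages for one sample of cache statistics.

    write_rate and push_rate are in the same unit (e.g. bytes per second).
    full_threshold is an assumed fraction of capacity that counts as 'filling up'.
    """
    alerts = []
    if write_rate > push_rate:
        alerts.append("data is being written faster than it can be pushed to the cloud")
    if used_bytes >= full_threshold * capacity_bytes:
        alerts.append("cache is nearly full")
    return alerts
```

An empty list means neither condition holds for that sample. A monitoring loop would call this periodically and alert only when a condition persists.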
Any errors in the system will spur the appliance to call home and provide Nasuni with the relevant information. Nasuni can then suggest a fix or explain what went wrong.