Scale: Transforming Unstructured Data into Big Data with the Nasuni Analytics Connector

Someone recently asked me: if I had to sum up everything we do at Nasuni in one word, what would it be?

Scale.

Everything we do is about scale. It is scale that makes it hard to store a file system composed of hundreds of terabytes of files, and it is scale that makes that same file system hard to back up. It is the scale of an organization distributed across tens or hundreds of locations that turns access or simple management operations into an ordeal. And it is scale in terms of the sheer number of files that makes any sort of analysis of file system data very slow and very difficult.

The cloud was built to absorb scale. For years, our clients have benefited from the fact that UniFS, our Global File System, is cloud-native and can scale to any number of files, any capacity, and span the globe across any number of locations. In doing so, though, we also ended up creating a file system that logically centralizes all unstructured data in the cloud.

This logical centralization of file data has many advantages. It allows companies to reduce their costs, shrink their local storage hardware footprint, share files between locations, and so much more. But there is one benefit that we have not truly explored.

Until now.

Earlier this year, I wrote about how organizations are clamoring for increased access to data from more cloud services. They want to be able to apply artificial intelligence and machine learning tools to gain greater insights into their data. By that I mean all their data – including the files so often siloed on traditional NAS storage hardware spread around the globe. By consolidating and centralizing all of this file data in cloud storage, Nasuni makes this a real possibility. And with our new Analytics Connector, which will be available in Version 8.7, we’re transforming it into reality.

We are turning unstructured data into big data.

Applying Business Intelligence to File Data

The millions and even billions of images, videos, documents, and other files generated by users and applications across large enterprises are hiding potentially valuable information. Enterprises have long been eager to use the cloud-native Business Intelligence and Analytics tools provided by AWS, Microsoft Azure, and Google Cloud on this file data. They want to meta-tag their image repositories to make them easier to search. Engineering firms want to be able to identify patterns that can help prevent budget overruns on projects. The wish lists go on for pages.

The tools exist. They are available in the cloud. They’re scalable and powerful. But they are not going to work if your file data is distributed across multiple NAS devices and data centers around the world. Nasuni centralizes enterprise file data, and the Analytics Connector makes this data available to a range of cloud services and tools. It effectively opens the world of Enterprise NAS to Big Data analytics. Our clients will now be able to take advantage of:

  • Search: Media indexing and search services will be able to analyze files exported by the Analytics Connector and automatically identify objects, people, text, scenes, and activities in images and video. Content managers will be able to search on the people, locations, and other information stored by Nasuni in cloud storage. One customer has already used a prototype to help their design teams search millions of images for creative inspiration on new projects.
  • Analytics: Query services such as Amazon Athena and Azure Data Lake will be able to analyze data in object stores using standard SQL without first loading the data into a database. Business analysts will be able to run queries against CSV or IoT data stored by Nasuni. This will no longer be dark data — suddenly businesses will be able to mine it for valuable insights.
  • Compliance: Amazon Macie uses machine learning to automatically discover, classify, and protect sensitive data. With the Analytics Connector, compliance officers will be able to use a service like this to identify Personally Identifiable Information (PII) and Intellectual Property (IP) in their existing Nasuni-managed file data. This will be tremendously valuable to companies addressing GDPR needs, among other uses.

How It Works

These cloud services can’t interface with standard Nasuni volumes in the object store because Nasuni encrypts and obfuscates files as it stores them as objects in the cloud. This is a critical piece of our security model, and it reduces the costs of storing files in the cloud, too. So we weren’t about to change the way that works. Instead, the Analytics Connector creates a temporary copy of file data in native object format, then writes this to a separate cloud storage account.

A few of the advantages of our implementation include:

  • Flexibility: Customers will be able to use any analytics service from AWS or Azure, regardless of which cloud currently stores their Nasuni volume. Optionality is critical.
  • Speed: Since file data has already been centralized in cloud storage, the process is fast, capable of exporting 14–16 TB of data per hour.
  • Control: Customers will be able to specify file types, specific paths, and more to refine the selection of data for analysis.
  • Cost: The Analytics Connector will be a free feature of the Nasuni platform as of release 8.7. Nasuni will provide a cost estimator tool to help organizations project the cloud costs of storing the selected data sets in native object format in a separate cloud storage account.
  • Security: Of course, security is also paramount, as it always has been here at Nasuni. The Analytics Connector runs entirely in the chosen cloud storage account, using securely stored customer keys.
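
To make the Control and Speed points above a little more concrete, here is a minimal Python sketch. It is not the Analytics Connector's actual interface: the rule names, the example paths, and the 100 TB data set are all hypothetical, chosen only for illustration. The only figure taken from this post is the 14–16 TB per hour export rate.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical selection rules, invented for this sketch.
# They do not reflect the Analytics Connector's real options.
INCLUDE_EXTENSIONS = {".csv", ".jpg", ".png"}
INCLUDE_PATH_PATTERNS = ["/projects/*", "/marketing/images/*"]


def is_selected(path: str) -> bool:
    """Return True if a file path passes both the file-type and path rules."""
    extension = PurePosixPath(path).suffix.lower()
    if extension not in INCLUDE_EXTENSIONS:
        return False
    return any(fnmatch(path, pattern) for pattern in INCLUDE_PATH_PATTERNS)


def estimated_export_hours(total_tb: float, tb_per_hour: float) -> float:
    """Rough export time for a data set at a given throughput."""
    if tb_per_hour <= 0:
        raise ValueError("tb_per_hour must be positive")
    return total_tb / tb_per_hour


# Hypothetical file listing.
files = [
    "/projects/bridge/budget.csv",
    "/projects/bridge/notes.docx",
    "/marketing/images/hero.JPG",
    "/hr/payroll.csv",
]
selected = [f for f in files if is_selected(f)]

# Hypothetical 100 TB selection at the quoted 14-16 TB per hour range.
slowest_hours = estimated_export_hours(100, 14)
fastest_hours = estimated_export_hours(100, 16)
```

In this sketch, only the budget spreadsheet and the marketing image are selected, and a 100 TB selection would take roughly six to seven hours to export at the quoted rate. Narrowing the selection up front reduces both the export time and the cost of the temporary copy.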

The Analytics Connector reflects the design philosophy at Nasuni. First of all, it is built for scale. It can perform lightning-fast scans of massive UniFS instances. Secondly, it is an open architecture. Rather than trying to bundle analytics into the Nasuni offering, we are enabling our clients to connect – hence the name – into the rich ecosystem that is already in place at the premier cloud providers: AWS and Azure. The files belong to our clients and we want them to be able to leverage best-in-class modern tools to gain greater insights from their data. We want to do everything we can here at Nasuni to help organizations do more with their file data.
